[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-4730:
---
Resolution: Fixed
Fix Version/s: 0.12.0
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks, Navis!

Join on more than 2^31 records on single reducer failed (wrong results)
---
Key: HIVE-4730
URL: https://issues.apache.org/jira/browse/HIVE-4730
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
Reporter: Gabi Kazav
Assignee: Navis
Priority: Blocker
Fix For: 0.12.0
Attachments: HIVE-4730.D11283.1.patch, HIVE-4730.D11283.2.patch

A join on more than 2^31 rows leads to wrong results. For example:

Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';

Load 1 row into small_table (the value 1).
Load 2149580800 rows into big_table, all with the same value (1 in this case).

create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
select count(*) from output;

This returns only 1 row, although the join should produce 2149580800 rows.

The reducer syslog:
...
2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 rows: used memory = 32925960
2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 rows: used memory = 12815184
2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 rows: used memory = 26684552
-- looks like a wrong value...
...
2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
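The 2^31 threshold in the title matches the range of a Java `int`, whose maximum value is `Integer.MAX_VALUE` = 2^31 - 1 = 2147483647. The sketch below assumes, without the thread confirming the exact mechanism, that the wrong result comes from a row count held in a 32-bit counter. It shows what the reproduction's 2149580800 rows become when narrowed to an `int`. The class name `RowCountOverflow` is hypothetical and is not part of Hive.

```java
public class RowCountOverflow {
    // Row count loaded into big_table in the reproduction above.
    static final long BIG_TABLE_ROWS = 2149580800L;

    /** Returns the value a 32-bit counter would hold after counting {@code rows} rows. */
    static int asIntCounter(long rows) {
        // A narrowing long -> int cast keeps only the low 32 bits, so values
        // above Integer.MAX_VALUE (2^31 - 1) wrap around to negative numbers.
        return (int) rows;
    }

    public static void main(String[] args) {
        System.out.println("2^31 - 1          = " + Integer.MAX_VALUE);
        System.out.println("rows (long)       = " + BIG_TABLE_ROWS);
        System.out.println("rows (int)        = " + asIntCounter(BIG_TABLE_ROWS));
        System.out.println("exceeds int range = " + (BIG_TABLE_ROWS > Integer.MAX_VALUE));
    }
}
```

Running `main` prints the 64-bit count next to its wrapped 32-bit value, which is negative. Any per-reducer counter that must hold more than 2^31 - 1 rows therefore needs to be a `long`.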
[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-4730:
---
Status: Open (was: Patch Available)

skewjoin.q test still fails for me. Canceling patch, trying to trigger pre-commit test-patch.
[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-4730:
---
Status: Open (was: Patch Available)

Test {{TestCliDriver_skewjoin.q}} failed. All others passed.
[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4730:
--
Attachment: HIVE-4730.D11283.2.patch

navis updated the revision "HIVE-4730 [jira] Join on more than 2^31 records on single reducer failed (wrong results)".

Fixed test failure

Reviewers: ashutoshc, JIRA

REVISION DETAIL
https://reviews.facebook.net/D11283

CHANGE SINCE LAST DIFF
https://reviews.facebook.net/D11283?vs=34707id=35805#toc

BRANCH
HIVE-4730

ARCANIST PROJECT
hive

AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractRowContainer.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java

To: JIRA, ashutoshc, navis
Cc: brock
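The revision touches `AbstractRowContainer`, `RowContainer` and `MapJoinRowContainer`. The diff is not quoted in this thread, so the following is only a sketch under an assumption: that the wrong result comes from a row-count field declared as `int` rather than `long`. The classes `IntSizedContainer` and `LongSizedContainer` and the helper `rowsVisited` are hypothetical illustrations, not Hive code. They count rows without storing them, and they start just below the `int` limit so the demo does not need two billion iterations.

```java
/**
 * Hypothetical sketch (not Hive's actual RowContainer): containers that only
 * count rows, comparing a 32-bit and a 64-bit size field.
 */
public class RowContainerSizeDemo {

    /** Size kept in an int: wraps to a negative value after 2^31 - 1 rows. */
    static final class IntSizedContainer {
        private int size;
        IntSizedContainer(int initialSize) { this.size = initialSize; }
        void addRow() { size++; }
        int rowCount() { return size; }
    }

    /** Size kept in a long: does not wrap at 2^31 rows. */
    static final class LongSizedContainer {
        private long size;
        LongSizedContainer(long initialSize) { this.size = initialSize; }
        void addRow() { size++; }
        long rowCount() { return size; }
    }

    /** Number of iterations a consumer loop "for (i = 0; i < rowCount; i++)" would run. */
    static long rowsVisited(long rowCount) {
        // A negative (wrapped) rowCount means the loop body never runs.
        return Math.max(0L, rowCount);
    }

    public static void main(String[] args) {
        // Start just below the int limit so two addRow() calls cross 2^31 - 1.
        IntSizedContainer intBox = new IntSizedContainer(Integer.MAX_VALUE - 1);
        LongSizedContainer longBox = new LongSizedContainer(Integer.MAX_VALUE - 1L);
        for (int i = 0; i < 2; i++) {
            intBox.addRow();
            longBox.addRow();
        }
        System.out.println("int size  = " + intBox.rowCount());  // wrapped to negative
        System.out.println("long size = " + longBox.rowCount());
        System.out.println("rows visited via int size  = " + rowsVisited(intBox.rowCount()));
        System.out.println("rows visited via long size = " + rowsVisited(longBox.rowCount()));
    }
}
```

The design point is simply that a per-key row count on a single reducer is not bounded by 2^31, so widening the field to `long` removes the wrap-around; whether the committed patch does exactly this should be checked against the diff at D11283.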
[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-4730:

Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-4730:
--
Priority: Blocker (was: Critical)
[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-4730:

Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4730:
--
Attachment: HIVE-4730.D11283.1.patch

navis requested code review of "HIVE-4730 [jira] Join on more than 2^31 records on single reducer failed (wrong results)".

Reviewers: JIRA

TEST PLAN
EMPTY

REVISION DETAIL
https://reviews.facebook.net/D11283

AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractRowContainer.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java

To: JIRA, navis