[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-07-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4730:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 Join on more than 2^31 records on single reducer failed (wrong results)
 ---

 Key: HIVE-4730
 URL: https://issues.apache.org/jira/browse/HIVE-4730
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
Reporter: Gabi Kazav
Assignee: Navis
Priority: Blocker
 Fix For: 0.12.0

 Attachments: HIVE-4730.D11283.1.patch, HIVE-4730.D11283.2.patch


 A join on more than 2^31 rows produces wrong results. For example:
 Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED 
 BY  '\n';
 Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED 
 BY  '\n';
 Load 1 row into small_table (the value 1).
 Load 2149580800 rows into big_table, all with the same value (1 in this case).
 create table output as select a.p1 from  big_table a join small_table b on 
 (a.p1=b.p1);
 select count(*) from output; returns 1, i.e. output contains only 1 row...
 the reducer syslog:
 ...
 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 
 rows: used memory = 32925960
 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 
 rows: used memory = 12815184
 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 
 rows: used memory = 26684552   -- looks like wrong value..
 ...
 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 
 rows: used memory = 17715896
 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 finished. closing...
 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarded 1 rows
 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 SKEWJOINFOLLOWUPJOBS:0
 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
 finished. closing...
 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
 forwarded 1 rows
 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
 6 finished. closing...
 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
 6 forwarded 0 rows
 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
 TABLE_ID_1_ROWCOUNT:1
 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
 Close done
 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 Close done
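
The thread does not show the patch itself, so the following is only an illustrative sketch of the kind of failure the issue title points at, not Hive's actual code. A Java int holds at most 2^31 - 1, so a 32-bit counter incremented once per row wraps to a negative value on the 2149580800-row input from the report, while a 64-bit long does not. The class and method names are hypothetical.

```java
// Illustrative sketch only: names are hypothetical, not taken from Hive's source.
final class RowCountOverflow {

    // Value a 32-bit counter would hold after `rows` increments starting from 0.
    // Narrowing keeps the low 32 bits, which matches repeated int ++ (wrap-around).
    public static int asIntCounter(long rows) {
        return (int) rows;
    }

    // A 64-bit counter holds the same row count exactly.
    public static long asLongCounter(long rows) {
        return rows;
    }

    public static void main(String[] args) {
        long rows = 2149580800L; // number of rows loaded into big_table in the report

        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);    // 2147483647
        System.out.println("int counter       = " + asIntCounter(rows));  // -2145386496
        System.out.println("long counter      = " + asLongCounter(rows)); // 2149580800

        // Math.addExact throws instead of wrapping silently, which is one way
        // to surface this class of bug while testing.
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("addExact detected: " + e.getMessage());
        }
    }
}
```

Any size check or index computed from a counter that has wrapped like this will misbehave, which is why widening the counter to long is the usual remedy for inputs of this size.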

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-07-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4730:
---

Status: Open  (was: Patch Available)

The skewjoin.q test still fails for me. Canceling the patch to try to trigger 
the pre-commit test-patch run.

 Join on more than 2^31 records on single reducer failed (wrong results)
 ---

 Key: HIVE-4730
 URL: https://issues.apache.org/jira/browse/HIVE-4730
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1
Reporter: Gabi Kazav
Assignee: Navis
Priority: Blocker
 Attachments: HIVE-4730.D11283.1.patch, HIVE-4730.D11283.2.patch




[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-07-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4730:
---

Status: Open  (was: Patch Available)

Test {{TestCliDriver_skewjoin.q}} failed. All others passed.

 Join on more than 2^31 records on single reducer failed (wrong results)
 ---

 Key: HIVE-4730
 URL: https://issues.apache.org/jira/browse/HIVE-4730
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1
Reporter: Gabi Kazav
Assignee: Navis
Priority: Blocker
 Attachments: HIVE-4730.D11283.1.patch




[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-07-16 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4730:
--

Attachment: HIVE-4730.D11283.2.patch

navis updated the revision HIVE-4730 [jira] Join on more than 2^31 records on 
single reducer failed (wrong results).

  Fixed test failure

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11283

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11283?vs=34707&id=35805#toc

BRANCH
  HIVE-4730

ARCANIST PROJECT
  hive

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractRowContainer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java

To: JIRA, ashutoshc, navis
Cc: brock


 Join on more than 2^31 records on single reducer failed (wrong results)
 ---

 Key: HIVE-4730
 URL: https://issues.apache.org/jira/browse/HIVE-4730
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
Reporter: Gabi Kazav
Assignee: Navis
Priority: Blocker
 Attachments: HIVE-4730.D11283.1.patch, HIVE-4730.D11283.2.patch




[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-07-16 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4730:


Status: Patch Available  (was: Open)

 Join on more than 2^31 records on single reducer failed (wrong results)
 ---

 Key: HIVE-4730
 URL: https://issues.apache.org/jira/browse/HIVE-4730
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1
Reporter: Gabi Kazav
Assignee: Navis
Priority: Blocker
 Attachments: HIVE-4730.D11283.1.patch, HIVE-4730.D11283.2.patch




[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-07-13 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4730:
--

Priority: Blocker  (was: Critical)

 Join on more than 2^31 records on single reducer failed (wrong results)
 ---

 Key: HIVE-4730
 URL: https://issues.apache.org/jira/browse/HIVE-4730
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
Reporter: Gabi Kazav
Assignee: Navis
Priority: Blocker
 Attachments: HIVE-4730.D11283.1.patch




[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-07-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4730:


Status: Patch Available  (was: Open)

 Join on more than 2^31 records on single reducer failed (wrong results)
 ---

 Key: HIVE-4730
 URL: https://issues.apache.org/jira/browse/HIVE-4730
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1
Reporter: Gabi Kazav
Assignee: Navis
Priority: Critical
 Attachments: HIVE-4730.D11283.1.patch




[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-13 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4730:
--

Attachment: HIVE-4730.D11283.1.patch

navis requested code review of HIVE-4730 [jira] Join on more than 2^31 records 
on single reducer failed (wrong results).

Reviewers: JIRA

HIVE-4730 Join on more than 2^31 records on single reducer failed (wrong 
results)

A join on more than 2^31 rows produces wrong results. For example:

Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED 
BY  '\n';
Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED 
BY  '\n';

Load 1 row into small_table (the value 1).
Load 2149580800 rows into big_table, all with the same value (1 in this case).

create table output as select a.p1 from  big_table a join small_table b on 
(a.p1=b.p1);

select count(*) from output; returns 1, i.e. output contains only 1 row...

the reducer syslog:
...
2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 
rows: used memory = 32925960
2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 
rows: used memory = 12815184
2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 
rows: used memory = 26684552   -- looks like wrong value..
...
2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 
rows: used memory = 17715896
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
finished. closing...
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
forwarded 1 rows
2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
SKEWJOINFOLLOWUPJOBS:0
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
forwarded 1 rows
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 
finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 
forwarded 0 rows
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
TABLE_ID_1_ROWCOUNT:1
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
Close done
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
Close done

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11283

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractRowContainer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/26817/

To: JIRA, navis


 Join on more than 2^31 records on single reducer failed (wrong results)
 ---

 Key: HIVE-4730
 URL: https://issues.apache.org/jira/browse/HIVE-4730
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
Reporter: Gabi Kazav
Assignee: Navis
Priority: Critical
 Attachments: HIVE-4730.D11283.1.patch


 A join on more than 2^31 rows produces wrong results. For example:
 Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED 
 BY  '\n';
 Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED 
 BY  '\n';
 Load 1 row into small_table (the value 1).
 Load 2149580800 rows into big_table, all with the same value (1 in this case).
 create table output as select a.p1 from  big_table a join small_table b on 
 (a.p1=b.p1);
 select count(*) from output; returns 1, i.e. output contains only 1 row...
 the reducer syslog:
 ...
 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 
 rows: used memory = 32925960
 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 
 rows: used memory = 12815184
 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 
 rows: used memory = 26684552   -- looks like wrong value..
 ...
 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 
 rows: used memory = 17715896
 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 finished. closing...
 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarded 1 rows
 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 SKEWJOINFOLLOWUPJOBS:0
 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
 finished.