[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-19 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1363#comment-1363
 ] 

Gabi Kazav commented on HIVE-4730:
--

I have it here:
https://github.com/gabik/hive/tree/HIVE-4730
and merged into trunk:
https://github.com/gabik/hive/tree/trunk


> Join on more than 2^31 records on single reducer failed (wrong results)
> ---
>
> Key: HIVE-4730
> URL: https://issues.apache.org/jira/browse/HIVE-4730
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
> Reporter: Gabi Kazav
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-4730.D11283.1.patch
>
>
> A join on more than 2^31 rows leads to wrong results. For example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED 
> BY  '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED 
> BY  '\n';
> Load 1 row into small_table (the value 1).
> Load 2149580800 rows into big_table with the same value (1 in this case).
> create table output as select a.p1 from  big_table a join small_table b on 
> (a.p1=b.p1);
> select count(*) from output ; will return only 1 row...
> the reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 
> rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 
> rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 
> rows: used memory = 26684552   <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 
> rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
> SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
> finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
> forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
> 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
> 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
> TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
> Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> Close done
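
The row count in the report, 2149580800, is just above Integer.MAX_VALUE (2^31 - 1 = 2147483647). The following is a minimal sketch of what happens when such a count passes through a 32-bit Java int. The class and method names are hypothetical, and whether Hive's join code loses rows in exactly this way is an assumption, not something this thread confirms.

```java
public class RowCountOverflow {
    // Hypothetical helper: narrows a 64-bit row count to 32 bits,
    // as any int-typed size or counter would have to.
    public static int asIntCount(long rows) {
        return (int) rows;
    }

    // Hypothetical helper: increments an int counter; Java wraps silently on overflow.
    public static int incrementInt(int counter) {
        return counter + 1;
    }

    public static void main(String[] args) {
        long bigTableRows = 2149580800L; // row count taken from the report

        System.out.println("Integer.MAX_VALUE       = " + Integer.MAX_VALUE);
        System.out.println("rows as long            = " + bigTableRows);
        // 2149580800 = 2^31 + 2^21, so the narrowed value is -2^31 + 2^21.
        System.out.println("rows as int             = " + asIntCount(bigTableRows));
        // One more increment past the maximum wraps to Integer.MIN_VALUE.
        System.out.println("MAX_VALUE + 1 (wrapped) = " + incrementInt(Integer.MAX_VALUE));
    }
}
```

A negative or wrapped size makes checks such as "size > 0" or "index < size" misbehave without raising any error, which would be consistent with a join that silently forwards too few rows.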

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-19 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688862#comment-13688862
 ] 

Gabi Kazav commented on HIVE-4730:
--

Hi, I want to add it to git.
How can I get permission to push?

Thanks



[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-16 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684759#comment-13684759
 ] 

Gabi Kazav commented on HIVE-4730:
--

Looks good, thanks!



[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-15 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684584#comment-13684584
 ] 

Gabi Kazav commented on HIVE-4730:
--

Looks like you are right, sorry about that.

I will check and update here.

Thanks again!



[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-15 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684193#comment-13684193
 ] 

Gabi Kazav commented on HIVE-4730:
--

Ignore the script error line: /hive/hive/bin/hive: line 72: [: 
/hive/hive/lib/hive-exec-0.10.0.jar: binary operator expected
I had two jar files (old and new) and fixed that, but I still get the exception:

hive> show tables;
FAILED: Error in metadata: MetaException(message:Got exception: 
org.apache.hadoop.hive.metastore.api.MetaException 
javax.jdo.JDODataStoreException: An exception was thrown while 
adding/validating class(es) : Constraint 'COLUMNS_PK' already exists in Schema 
'APP'.
java.sql.SQLException: Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at 
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
at 
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
at 
org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:730)
at 
org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:681)
at 
org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:402)
at 
org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:458)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:2689)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:2503)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2148)
at 
org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:113)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:986)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:952)
at 
org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:919)
at 
org.datanucleus.store.mapped.MappedStoreManager.getDatastoreClass(MappedStoreManager.java:356)
at 
org.datanucleus.store.rdbms.query.legacy.ExtentHelper.getExtent(ExtentHelper.java:48)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.getExtent(RDBMSStoreManager.java:1332)
at 
org.datanucleus.ObjectManagerImpl.getExtent(ObjectManagerImpl.java:4149)
at 
org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compileCandidates(JDOQLQueryCompiler.java:411)
at 
org.datanucleus.store.rdbms.query.legacy.QueryCompiler.executionCompile(QueryCompiler.java:312)
at 
org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compile(JDOQLQueryCompiler.java:225)
at 
org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.compileInternal(JDOQLQuery.java:175)
at org.datanucleus.store.query.Query.executeQuery(Query.java:1628)
at 
org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getTables(ObjectStore.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
at $Proxy6.getTables(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_tables(HiveMetaStore.java:2327)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at ja

[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-15 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684192#comment-13684192
 ] 

Gabi Kazav commented on HIVE-4730:
--

Hi Navis,
on Hive 0.7.x it works.
On Hive 0.10.0, I am compiling the exec jar, and the build produces 
hive-exec-0.10.0-SNAPSHOT.jar.
When I run Hive 0.10.0, I get the following message (also after a 
clean build):

[hdp@hive-1 gabi]$ hive
/hive/hive/bin/hive: line 72: [: /hive/hive/lib/hive-exec-0.10.0.jar: binary 
operator expected
Logging initialized using configuration in 
jar:file:/hive/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
Hive history file=/tmp/hdp/hive_job_log_hdp_201306151610_458251040.txt
hive> show tables;
FAILED: Error in metadata: MetaException(message:Got exception: 
org.apache.hadoop.hive.metastore.api.MetaException 
javax.jdo.JDODataStoreException: An exception was thrown while 
adding/validating class(es) : Constraint 'COLUMNS_PK' already exists in Schema 
'APP'.
java.sql.SQLException: Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at 
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
at 
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
at 
org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:730)
at 
org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:681)
at 
org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:402)
at 
org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:458)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:2689)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:2503)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2148)
at 
org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:113)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:986)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:952)
at 
org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:919)
at 
org.datanucleus.store.mapped.MappedStoreManager.getDatastoreClass(MappedStoreManager.java:356)
at 
org.datanucleus.store.rdbms.query.legacy.ExtentHelper.getExtent(ExtentHelper.java:48)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.getExtent(RDBMSStoreManager.java:1332)
at 
org.datanucleus.ObjectManagerImpl.getExtent(ObjectManagerImpl.java:4149)
at 
org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compileCandidates(JDOQLQueryCompiler.java:411)
at 
org.datanucleus.store.rdbms.query.legacy.QueryCompiler.executionCompile(QueryCompiler.java:312)
at 
org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compile(JDOQLQueryCompiler.java:225)
at 
org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.compileInternal(JDOQLQuery.java:175)
at org.datanucleus.store.query.Query.executeQuery(Query.java:1628)
at 
org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getTables(ObjectStore.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
at $Proxy6.getTables(Unknown Source)
at 
org.apache.hadoop.hive.metastore.Hiv

[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-14 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683444#comment-13683444
 ] 

Gabi Kazav commented on HIVE-4730:
--

Do I need to copy only the hive-exec jar, or did the patch change another jar?

Thanks.



[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-14 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683439#comment-13683439
 ] 

Gabi Kazav commented on HIVE-4730:
--

OK, I am trying now: building ql from scratch and running the join.
I will post the results here.

Thanks. 



[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-14 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683427#comment-13683427
 ] 

Gabi Kazav commented on HIVE-4730:
--

Can you explain how?
ant clean?

Navis, thank you for your kind and fast help!



[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-14 Thread Gabi Kazav (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683364#comment-13683364
 ] 

Gabi Kazav commented on HIVE-4730:
--

After patching and compiling, the same join fails when I run it:

..
2013-06-14 16:47:14,924 INFO ExecReducer: ExecReducer: processing 214900 
rows: used memory = 45018992
2013-06-14 16:47:16,042 FATAL org.apache.hadoop.mapred.TaskTracker: Error 
running child : java.lang.NoSuchMethodError: 
org.apache.hadoop.hive.ql.exec.persistence.AbstractRowContainer.size()I
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:802)
at 
org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:263)
at 
org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:301)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:473)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

2013-06-14 16:47:19,051 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=CLEANUP, sessionId=
2013-06-14 16:47:19,305 INFO org.apache.hadoop.mapred.TaskRunner: Runnning 
cleanup for the task
2013-06-14 16:47:19,305 INFO org.apache.hadoop.mapred.TaskRunner: 
Task:attempt_201306121727_0032_r_04_0 is done. And is in the process of 
commiting
2013-06-14 16:47:19,311 INFO org.apache.hadoop.mapred.TaskRunner: Task 
'attempt_201306121727_0032_r_04_0' done.
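
A note on reading the NoSuchMethodError above: `size()I` is a JVM method descriptor, meaning a method named size that takes no arguments and returns int. The caller (CommonJoinOperator in the trace) was therefore linked against an int-returning size(), which the loaded AbstractRowContainer does not provide. One possible cause, which is an assumption and not something stated in this thread, is that the patch changes the return type to long and an old class file or jar is still on the task classpath. The sketch below only shows how the descriptor differs between the two return types; the class and method names in it are hypothetical.

```java
import java.lang.invoke.MethodType;

public class SizeDescriptor {
    // Hypothetical helper: JVM descriptor of a no-argument method
    // with the given return type.
    static String descriptor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // The descriptor the caller in the stack trace asked for.
        System.out.println("int  size(): size" + descriptor(int.class));
        // The descriptor a long-returning size() would have (assumed change).
        System.out.println("long size(): size" + descriptor(long.class));
    }
}
```

If that is the cause, rebuilding every class that calls size() and redeploying a single consistent jar to the cluster should make the error go away.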






[jira] [Created] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)

2013-06-13 Thread Gabi Kazav (JIRA)
Gabi Kazav created HIVE-4730:


 Summary: Join on more than 2^31 records on single reducer failed 
(wrong results)
 Key: HIVE-4730
 URL: https://issues.apache.org/jira/browse/HIVE-4730
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1
Reporter: Gabi Kazav
Priority: Critical


Join on more than 2^31 rows leads to wrong results. For example:

Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';

Load 1 row into small_table (the value 1).
Load 2149580800 rows into big_table, all with the same value (1 in this case).

create table output as select a.p1 from  big_table a join small_table b on 
(a.p1=b.p1);

select count(*) from output; reports only 1 row, where 2149580800 are expected.

The reducer syslog:
...
2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 
rows: used memory = 32925960
2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 
rows: used memory = 12815184
2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 
rows: used memory = 26684552   <-- looks like wrong value..
...
2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 
rows: used memory = 17715896
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
finished. closing...
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
forwarded 1 rows
2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
SKEWJOINFOLLOWUPJOBS:0
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
forwarded 1 rows
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 
finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 
forwarded 0 rows
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
TABLE_ID_1_ROWCOUNT:1
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 
Close done
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
Close done
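
To illustrate why 2^31 is the threshold: 2149580800 does not fit in a signed 32-bit int, so any row counter kept as an int wraps to a negative value. Where exactly the count wraps inside Hive is not shown in the log, so the snippet below is only a sketch of the arithmetic, and its class and method names are hypothetical.

```java
public class RowCountOverflow {
    // Hypothetical helper: narrows a 64-bit row count to 32 bits,
    // the way an int counter would end up storing it.
    static int toIntCount(long rows) {
        return (int) rows;
    }

    public static void main(String[] args) {
        long bigTableRows = 2149580800L; // row count from the report

        // Largest value an int can hold: 2^31 - 1.
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);

        // The row count exceeds it, so the narrowed value wraps around.
        System.out.println("fits in int: " + (bigTableRows <= Integer.MAX_VALUE));
        System.out.println("as int: " + toIntCount(bigTableRows));
    }
}
```

A negative or otherwise wrong size would be enough to make a size-based check in the join take the wrong branch, which is consistent with the join forwarding 1 row.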


