[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1363#comment-1363 ]

Gabi Kazav commented on HIVE-4730:
--

I have it here: https://github.com/gabik/hive/tree/HIVE-4730
and merged into trunk: https://github.com/gabik/hive/tree/trunk

> Join on more than 2^31 records on single reducer failed (wrong results)
> ---
>
> Key: HIVE-4730
> URL: https://issues.apache.org/jira/browse/HIVE-4730
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
> Reporter: Gabi Kazav
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-4730.D11283.1.patch
>
>
> join on more than 2^31 rows leads to wrong results. for example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Loading 1 row to small_table (the value 1).
> Loading 2149580800 rows to big_table with the same value (1 on this case).
> create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output ; will return only 1 row...
> the reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 rows: used memory = 26684552 <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688862#comment-13688862 ]

Gabi Kazav commented on HIVE-4730:
--

Hi - i want to add it to git, how can i get permissions to push? Thanks
[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684759#comment-13684759 ]

Gabi Kazav commented on HIVE-4730:
--

Looks good, thanks!
[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684584#comment-13684584 ]

Gabi Kazav commented on HIVE-4730:
--

Looks like you are right, sorry for that. Will check and update here. Thanks again!
[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684193#comment-13684193 ]

Gabi Kazav commented on HIVE-4730:
--

ignore the script error line :
/hive/hive/bin/hive: line 72: [: /hive/hive/lib/hive-exec-0.10.0.jar: binary operator expected
i had 2 jar files (old and new), fixed it, but i still have the exception:

hive> show tables;
FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.hive.metastore.api.MetaException javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
java.sql.SQLException: Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
	at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:730)
	at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:681)
	at org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:402)
	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:458)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:2689)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:2503)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2148)
	at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:113)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:986)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:952)
	at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:919)
	at org.datanucleus.store.mapped.MappedStoreManager.getDatastoreClass(MappedStoreManager.java:356)
	at org.datanucleus.store.rdbms.query.legacy.ExtentHelper.getExtent(ExtentHelper.java:48)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.getExtent(RDBMSStoreManager.java:1332)
	at org.datanucleus.ObjectManagerImpl.getExtent(ObjectManagerImpl.java:4149)
	at org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compileCandidates(JDOQLQueryCompiler.java:411)
	at org.datanucleus.store.rdbms.query.legacy.QueryCompiler.executionCompile(QueryCompiler.java:312)
	at org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compile(JDOQLQueryCompiler.java:225)
	at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.compileInternal(JDOQLQuery.java:175)
	at org.datanucleus.store.query.Query.executeQuery(Query.java:1628)
	at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
	at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
	at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
	at org.apache.hadoop.hive.metastore.ObjectStore.getTables(ObjectStore.java:781)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
	at $Proxy6.getTables(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_tables(HiveMetaStore.java:2327)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at ja
[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684192#comment-13684192 ]

Gabi Kazav commented on HIVE-4730:
--

Hi Navis - on hive 0.7.x it works. On hive 0.10.0, I am compiling the exec jar and it builds hive-exec-0.10.0-SNAPSHOT.jar.
When I run hive on ver 0.10.0, I get the following message (also after a clean build):

[hdp@hive-1 gabi]$ hive
/hive/hive/bin/hive: line 72: [: /hive/hive/lib/hive-exec-0.10.0.jar: binary operator expected
Logging initialized using configuration in jar:file:/hive/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
Hive history file=/tmp/hdp/hive_job_log_hdp_201306151610_458251040.txt
hive> show tables;
FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.hive.metastore.api.MetaException javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
java.sql.SQLException: Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
	at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:730)
	at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:681)
	at org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:402)
	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:458)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:2689)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:2503)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2148)
	at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:113)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:986)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:952)
	at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:919)
	at org.datanucleus.store.mapped.MappedStoreManager.getDatastoreClass(MappedStoreManager.java:356)
	at org.datanucleus.store.rdbms.query.legacy.ExtentHelper.getExtent(ExtentHelper.java:48)
	at org.datanucleus.store.rdbms.RDBMSStoreManager.getExtent(RDBMSStoreManager.java:1332)
	at org.datanucleus.ObjectManagerImpl.getExtent(ObjectManagerImpl.java:4149)
	at org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compileCandidates(JDOQLQueryCompiler.java:411)
	at org.datanucleus.store.rdbms.query.legacy.QueryCompiler.executionCompile(QueryCompiler.java:312)
	at org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compile(JDOQLQueryCompiler.java:225)
	at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.compileInternal(JDOQLQuery.java:175)
	at org.datanucleus.store.query.Query.executeQuery(Query.java:1628)
	at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
	at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
	at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
	at org.apache.hadoop.hive.metastore.ObjectStore.getTables(ObjectStore.java:781)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
	at $Proxy6.getTables(Unknown Source)
	at org.apache.hadoop.hive.metastore.Hiv
[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683444#comment-13683444 ]

Gabi Kazav commented on HIVE-4730:
--

Do I need to copy the hive-exec jar only? Or did the patch change another jar? Thanks.
[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683439#comment-13683439 ]

Gabi Kazav commented on HIVE-4730:
--

ok - i am trying now - building ql from scratch and running the join, i will comment here the results. Thanks.
[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683427#comment-13683427 ]

Gabi Kazav commented on HIVE-4730:
--

Can you explain to me how? ant clean?
Navis - thank you for your kind and fast help!
[jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683364#comment-13683364 ]

Gabi Kazav commented on HIVE-4730:
----------------------------------

After patching and compiling, when I run the same join it fails:

..
2013-06-14 16:47:14,924 INFO ExecReducer: ExecReducer: processing 214900 rows: used memory = 45018992
2013-06-14 16:47:16,042 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.persistence.AbstractRowContainer.size()I
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:802)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:263)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:301)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:473)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
2013-06-14 16:47:19,051 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=CLEANUP, sessionId=
2013-06-14 16:47:19,305 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
2013-06-14 16:47:19,305 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201306121727_0032_r_04_0 is done. And is in the process of commiting
2013-06-14 16:47:19,311 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201306121727_0032_r_04_0' done.

> Join on more than 2^31 records on single reducer failed (wrong results)
> ---
>
> Key: HIVE-4730
> URL: https://issues.apache.org/jira/browse/HIVE-4730
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
> Reporter: Gabi Kazav
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-4730.D11283.1.patch
>
>
> A join on more than 2^31 rows leads to wrong results. For example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Loading 1 row to small_table (the value 1).
> Loading 2149580800 rows to big_table with the same value (1 in this case).
> create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output ; will return only 1 row...
> The reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 rows: used memory = 26684552 <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
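A note on reading the NoSuchMethodError above: the trailing "()I" in "AbstractRowContainer.size()I" is a JVM method descriptor, meaning "no arguments, returns int". The JVM resolves methods by name plus full descriptor, so a caller compiled against an int-returning size() will not link against a class whose size() has a different return type. One possible reading (an assumption, not confirmed anywhere in this thread) is that the jars on the cluster were built from mismatched versions of the two classes. The sketch below only illustrates how descriptors differ by return type; the class name DescriptorDemo is hypothetical and is not part of Hive.

```java
import java.lang.invoke.MethodType;

// Illustrative only: prints the JVM descriptor of a no-argument method
// for a given return type. Not Hive code.
final class DescriptorDemo {

    // Build the descriptor string for a method taking no arguments
    // and returning the given type.
    static String descriptor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // A method "int size()" is looked up as size()I.
        System.out.println("size" + descriptor(int.class));
        // A method "long size()" is looked up as size()J, which is a
        // different method as far as linking is concerned.
        System.out.println("size" + descriptor(long.class));
    }
}
```

If that reading applies, rebuilding and redeploying all jars from the same patched source tree would be the first thing to check.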
[jira] [Created] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
Gabi Kazav created HIVE-4730:
-----------------------------

Summary: Join on more than 2^31 records on single reducer failed (wrong results)
Key: HIVE-4730
URL: https://issues.apache.org/jira/browse/HIVE-4730
Project: Hive
Issue Type: Bug
Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1
Reporter: Gabi Kazav
Priority: Critical

A join on more than 2^31 rows leads to wrong results. For example:

Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
Loading 1 row to small_table (the value 1).
Loading 2149580800 rows to big_table with the same value (1 in this case).
create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
select count(*) from output ; will return only 1 row...

The reducer syslog:
...
2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 214700 rows: used memory = 32925960
2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 214800 rows: used memory = 12815184
2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 214900 rows: used memory = 26684552 <-- looks like wrong value..
...
2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
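For readers wondering why 2^31 is the threshold: a Java int holds at most 2^31 - 1 (2147483647), so any row count kept in an int wraps to a negative value once it passes that limit. The sketch below is a minimal illustration of that arithmetic using the row count from the reducer log above. It is not Hive's actual code, the class and method names are hypothetical, and whether Hive's join path counts rows this way is an assumption based only on the issue title.

```java
// Illustrative only: shows how a 32-bit counter misreports a row count
// larger than 2^31 - 1. Not Hive code.
final class RowCountOverflow {

    // Narrow a 64-bit row count to 32 bits, which is effectively what
    // happens when the count is stored in an int.
    static int asIntCount(long rows) {
        return (int) rows;
    }

    // Increment an int counter `steps` times starting from `start`.
    static int incrementInt(int start, int steps) {
        int count = start;
        for (int i = 0; i < steps; i++) {
            count++; // wraps silently past Integer.MAX_VALUE, no exception
        }
        return count;
    }

    public static void main(String[] args) {
        long rows = 2149580801L; // "processed 2149580801 rows" in the log
        System.out.println("as long: " + rows);
        System.out.println("as int:  " + asIntCount(rows)); // -2145386495
        // One step past the limit lands on Integer.MIN_VALUE.
        System.out.println("MAX_VALUE + 1 = " + incrementInt(Integer.MAX_VALUE, 1));
    }
}
```

A negative or otherwise wrong size would make any "is this container empty" or "how many rows to emit" check misbehave, which is at least consistent with the join forwarding a single row.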