[jira] [Commented] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed
[ https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146685#comment-16146685 ] Hive QA commented on HIVE-17410: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884370/HIVE-17410.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11000 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=104) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6595/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6595/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6595/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884370 - PreCommit-HIVE-Build > repl load task during subsequent DAG generation does not start from the last > partition processed > > > Key: HIVE-17410 > URL: https://issues.apache.org/jira/browse/HIVE-17410 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Attachments: HIVE-17410.1.patch > > > DAG generation for repl load task was to be generated dynamically such that > if the load break happens at a partition load time then for subsequent runs > we should start post the last partition processed. > We currently identify the point from where we have to process the event but > reinitialize the iterator to start from beginning of all partition's to > process. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
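A minimal sketch of the intended behavior (resume iteration just past the last partition already processed, rather than restarting from the first one) follows; the class and method names are hypothetical and not the actual repl load code.
{code}
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: position the partition iterator after the last partition
// that was already loaded, so a rebuilt DAG does not reprocess earlier partitions.
public class PartitionResumeSketch {

  // Returns an iterator positioned after lastProcessed (or at the start if null).
  static Iterator<String> resumeFrom(List<String> partitions, String lastProcessed) {
    Iterator<String> it = partitions.iterator();
    if (lastProcessed == null) {
      return it;                    // nothing processed yet: start from the beginning
    }
    while (it.hasNext()) {
      if (it.next().equals(lastProcessed)) {
        break;                      // consume up to and including the checkpointed partition
      }
    }
    return it;                      // next() now yields the first unprocessed partition
  }

  public static void main(String[] args) {
    List<String> parts = List.of("part_col=1", "part_col=2", "part_col=3", "part_col=4");
    resumeFrom(parts, "part_col=2").forEachRemaining(System.out::println); // part_col=3, part_col=4
  }
}
{code}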
[jira] [Updated] (HIVE-17399) Do not remove semijoin branch if it feeds to TS->DPP_EVENT
[ https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-17399: -- Attachment: HIVE-17399.2.patch Implemented the comments. Improved the test with non-zero results > Do not remove semijoin branch if it feeds to TS->DPP_EVENT > -- > > Key: HIVE-17399 > URL: https://issues.apache.org/jira/browse/HIVE-17399 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17399.1.patch, HIVE-17399.2.patch > > > If there is an incoming semijoin branch to a TS which has DPP event, then try > to keep it as it may serve as an excellent filter for DPP thus reducing the > input to join drastically. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
[ https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146635#comment-16146635 ] ASF GitHub Bot commented on HIVE-17195: --- Github user sankarh closed the pull request at: https://github.com/apache/hive/pull/212 > Long chain of tasks created by REPL LOAD shouldn't cause stack corruption. > -- > > Key: HIVE-17195 > URL: https://issues.apache.org/jira/browse/HIVE-17195 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DAG, DR, Executor, replication > Fix For: 3.0.0 > > Attachments: HIVE-17195.01.patch, HIVE-17195.02.patch > > > Currently, long chain REPL LOAD tasks lead to huge recursive calls when try > to traverse the DAG. > For example, getMRTasks, getTezTasks, getSparkTasks and iterateTasks methods > run recursively to traverse the DAG. > Need to modify this traversal logic to reduce stack usage. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
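The traversal fix described above (getMRTasks, getTezTasks, getSparkTasks and iterateTasks currently recurse over the task DAG) amounts to walking the DAG with an explicit worklist, so a long task chain consumes heap rather than call-stack frames. A rough sketch under that assumption, using a generic Task type rather than Hive's actual task classes:
{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: iterative DAG traversal with an explicit stack, so a long chain of
// tasks uses heap memory instead of growing the Java call stack.
class Task {
  List<Task> children = new ArrayList<>();
}

public class IterativeTraversalSketch {
  static List<Task> collectAll(Task root) {
    List<Task> result = new ArrayList<>();
    Set<Task> visited = new HashSet<>();
    Deque<Task> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Task t = stack.pop();
      if (!visited.add(t)) {
        continue;                   // already seen; DAG nodes can be shared by several parents
      }
      result.add(t);
      for (Task child : t.children) {
        stack.push(child);
      }
    }
    return result;
  }
}
{code}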
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Status: Patch Available (was: Open) > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, > HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch, > HIVE-17100.06.patch, HIVE-17100.07.patch, HIVE-17100.08.patch, > HIVE-17100.09.patch, HIVE-17100.10.patch > > > It is necessary to log the progress the replication tasks in a structured > manner as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number > of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total > number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if > any table/function is dropped when dump in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and > views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of > functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of database dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event dump end time > * Event dump progress. 
Format is Event sequence no/ (Estimated) Total number > of events.{color} > * After completion of all event dumps, will add a log as follows. > {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can be terribly inaccurate with actual > number as we don’t have the number of events upfront until we read from > metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event load end time > * Event load progress. Format is Event sequence no/ Total number of > events.{color} > * After completion of all event loads, will add a log as follows to > consolidate the load. > {color:#59afe1}* Target Database Name. > * Load Type (INCREMENTAL). > * Load End Time. > * Tot
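As a concrete illustration of the per-item progress records specified above, a single bootstrap table-dump record carrying the listed fields might be rendered as below; the message layout and names are illustrative only, and the actual patch may format and route these logs differently.
{code}
// Hypothetical sketch of one per-table bootstrap dump progress record with the
// fields listed in the spec: table name, type, dump end time, and sequence/total.
public class TableDumpLogSketch {
  static String tableDumpLog(String dbName, String tableName, String tableType,
                             long tableSeqNo, long estimatedTotal, long endTimeMillis) {
    return String.format(
        "REPL::TABLE_DUMP: db=%s table=%s type=%s progress=%d/%d dumpEndTime=%d",
        dbName, tableName, tableType, tableSeqNo, estimatedTotal, endTimeMillis / 1000);
  }

  public static void main(String[] args) {
    System.out.println(tableDumpLog("sales", "orders", "TABLE", 3, 120, System.currentTimeMillis()));
  }
}
{code}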
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Attachment: HIVE-17100.10.patch Added 10.patch with below change - Replaced addTask with updateTackCount as it is enough to increment the task count in tracker as ReplLogTask is child of other root tasks. As [~anishek] already +1 the patch with this fix, request [~thejas] to please commit! > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, > HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch, > HIVE-17100.06.patch, HIVE-17100.07.patch, HIVE-17100.08.patch, > HIVE-17100.09.patch, HIVE-17100.10.patch > > > It is necessary to log the progress the replication tasks in a structured > manner as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number > of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total > number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if > any table/function is dropped when dump in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and > views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of > functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of database dump, will add one log with below details. 
> {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event dump end time > * Event dump progress. Format is Event sequence no/ (Estimated) Total number > of events.{color} > * After completion of all event dumps, will add a log as follows. > {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can be terribly inaccurate with actual > number as we don’t have the number of events upfront until we read from > metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event load end time > * Event load progre
[jira] [Commented] (HIVE-17307) Change the metastore to not use the metrics code in hive/common
[ https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146615#comment-16146615 ] Hive QA commented on HIVE-17307: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884365/HIVE-17307.4.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11019 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.ql.parse.TestExport.org.apache.hadoop.hive.ql.parse.TestExport (batchId=218) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6594/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6594/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6594/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884365 - PreCommit-HIVE-Build > Change the metastore to not use the metrics code in hive/common > --- > > Key: HIVE-17307 > URL: https://issues.apache.org/jira/browse/HIVE-17307 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17307.2.patch, HIVE-17307.3.patch, > HIVE-17307.4.patch, HIVE-17307.patch > > > As we move code into the standalone metastore module, it cannot use the > metrics in hive-common. We could copy the current Metrics interface or we > could change the metastore code to directly use codahale metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
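Of the two options mentioned in the description, having the standalone metastore use Codahale (Dropwizard) metrics directly would look roughly like the sketch below; the metric names and the console reporter are illustrative, not what the metastore actually registers.
{code}
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import java.util.concurrent.TimeUnit;

// Illustration of coding against the Codahale metrics API directly, one of the
// two options discussed for the standalone metastore module.
public class MetastoreMetricsSketch {
  private static final MetricRegistry REGISTRY = new MetricRegistry();

  public static void main(String[] args) throws Exception {
    Counter openConnections = REGISTRY.counter("open_connections");
    Timer createTableTimer = REGISTRY.timer("api_create_table");

    openConnections.inc();                          // e.g. on a new client connection
    try (Timer.Context ignored = createTableTimer.time()) {
      Thread.sleep(10);                             // stand-in for the real create_table work
    }
    openConnections.dec();

    // A reporter (JMX, console, file, ...) would normally be wired up once at startup.
    ConsoleReporter.forRegistry(REGISTRY)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build()
        .report();
  }
}
{code}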
[jira] [Commented] (HIVE-17405) HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
[ https://issues.apache.org/jira/browse/HIVE-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146616#comment-16146616 ] Rui Li commented on HIVE-17405: --- [~stakiar], I think the root cause is the vector GBY has some issue with constant key expressions. You can look at our discussions in HIVE-16823 and HIVE-17383. > HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT > - > > Key: HIVE-17405 > URL: https://issues.apache.org/jira/browse/HIVE-17405 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17405.1.patch, HIVE-17405.2.patch, > HIVE-17405.3.patch > > > In {{SparkCompiler#runDynamicPartitionPruning}} we should change {{new > ConstantPropagate().transform(parseContext)}} to {{new > ConstantPropagate(ConstantPropagateOption.SHORTCUT).transform(parseContext)}} > Hive-on-Tez does the same thing. > Running the full constant propagation isn't really necessary, we just want to > eliminate any {{and true}} predicates that were introduced by > {{SyntheticJoinPredicate}} and {{DynamicPartitionPruningOptimization}}. The > {{SyntheticJoinPredicate}} will introduce dummy filter predicates into the > operator tree, and {{DynamicPartitionPruningOptimization}} will replace them. > The predicates introduced via {{SyntheticJoinPredicate}} are necessary to > help {{DynamicPartitionPruningOptimization}} determine if DPP can be used or > not. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17412) Add "-- SORT_QUERY_RESULTS" for spark_vectorized_dynamic_partition_pruning.q
[ https://issues.apache.org/jira/browse/HIVE-17412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-17412: Attachment: HIVE-17412.patch [~stakiar], [~lirui]: Please help review, thanks! > Add "-- SORT_QUERY_RESULTS" for spark_vectorized_dynamic_partition_pruning.q > > > Key: HIVE-17412 > URL: https://issues.apache.org/jira/browse/HIVE-17412 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-17412.patch > > > for query > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.vectorized.execution.enabled=true; > set hive.strict.checks.cartesian.product=false; > select distinct ds from srcpart; > {code} > the result is > {code} > 2008-04-09 > 2008-04-08 > {code} > the result of groupby in spark is not in order. Sometimes it returns > {code} > 2008-04-08 > 2008-04-09 > {code} > Sometimes it returns > {code} > 2008-04-09 > 2008-04-08 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Status: Open (was: Patch Available) > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, > HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch, > HIVE-17100.06.patch, HIVE-17100.07.patch, HIVE-17100.08.patch, > HIVE-17100.09.patch > > > It is necessary to log the progress the replication tasks in a structured > manner as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number > of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total > number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if > any table/function is dropped when dump in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and > views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of > functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of database dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event dump end time > * Event dump progress. 
Format is Event sequence no/ (Estimated) Total number > of events.{color} > * After completion of all event dumps, will add a log as follows. > {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can be terribly inaccurate with actual > number as we don’t have the number of events upfront until we read from > metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event load end time > * Event load progress. Format is Event sequence no/ Total number of > events.{color} > * After completion of all event loads, will add a log as follows to > consolidate the load. > {color:#59afe1}* Target Database Name. > * Load Type (INCREMENTAL). > * Load End Time. > * Total number of events l
[jira] [Assigned] (HIVE-17412) Add "-- SORT_QUERY_RESULTS" for spark_vectorized_dynamic_partition_pruning.q
[ https://issues.apache.org/jira/browse/HIVE-17412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel reassigned HIVE-17412: --- > Add "-- SORT_QUERY_RESULTS" for spark_vectorized_dynamic_partition_pruning.q > > > Key: HIVE-17412 > URL: https://issues.apache.org/jira/browse/HIVE-17412 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > > for query > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.vectorized.execution.enabled=true; > set hive.strict.checks.cartesian.product=false; > select distinct ds from srcpart; > {code} > the result is > {code} > 2008-04-09 > 2008-04-08 > {code} > the result of groupby in spark is not in order. Sometimes it returns > {code} > 2008-04-08 > 2008-04-09 > {code} > Sometimes it returns > {code} > 2008-04-09 > 2008-04-08 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17405) HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
[ https://issues.apache.org/jira/browse/HIVE-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146590#comment-16146590 ] Sahil Takiar commented on HIVE-17405: - Oh wow, thanks for pointing that out [~lirui]! I didn't even notice. TBH not really sure why it fixes spark_vectorized_dynamic_partition_pruning.q, I can try to dig into it some more. Attached an updated patch with an re-generated spark_vectorized_dynamic_partition_pruning.q.out file. > HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT > - > > Key: HIVE-17405 > URL: https://issues.apache.org/jira/browse/HIVE-17405 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17405.1.patch, HIVE-17405.2.patch, > HIVE-17405.3.patch > > > In {{SparkCompiler#runDynamicPartitionPruning}} we should change {{new > ConstantPropagate().transform(parseContext)}} to {{new > ConstantPropagate(ConstantPropagateOption.SHORTCUT).transform(parseContext)}} > Hive-on-Tez does the same thing. > Running the full constant propagation isn't really necessary, we just want to > eliminate any {{and true}} predicates that were introduced by > {{SyntheticJoinPredicate}} and {{DynamicPartitionPruningOptimization}}. The > {{SyntheticJoinPredicate}} will introduce dummy filter predicates into the > operator tree, and {{DynamicPartitionPruningOptimization}} will replace them. > The predicates introduced via {{SyntheticJoinPredicate}} are necessary to > help {{DynamicPartitionPruningOptimization}} determine if DPP can be used or > not. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17405) HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
[ https://issues.apache.org/jira/browse/HIVE-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17405: Attachment: HIVE-17405.3.patch > HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT > - > > Key: HIVE-17405 > URL: https://issues.apache.org/jira/browse/HIVE-17405 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17405.1.patch, HIVE-17405.2.patch, > HIVE-17405.3.patch > > > In {{SparkCompiler#runDynamicPartitionPruning}} we should change {{new > ConstantPropagate().transform(parseContext)}} to {{new > ConstantPropagate(ConstantPropagateOption.SHORTCUT).transform(parseContext)}} > Hive-on-Tez does the same thing. > Running the full constant propagation isn't really necessary, we just want to > eliminate any {{and true}} predicates that were introduced by > {{SyntheticJoinPredicate}} and {{DynamicPartitionPruningOptimization}}. The > {{SyntheticJoinPredicate}} will introduce dummy filter predicates into the > operator tree, and {{DynamicPartitionPruningOptimization}} will replace them. > The predicates introduced via {{SyntheticJoinPredicate}} are necessary to > help {{DynamicPartitionPruningOptimization}} determine if DPP can be used or > not. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146556#comment-16146556 ] Hive QA commented on HIVE-17304: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884351/HIVE-17304.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] (batchId=158) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6593/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6593/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6593/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884351 - PreCommit-HIVE-Build > ThreadMXBean based memory allocation monitory for hash table loader > --- > > Key: HIVE-17304 > URL: https://issues.apache.org/jira/browse/HIVE-17304 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17304.1.patch, HIVE-17304.2.patch > > > Hash table memory monitoring is based on java data model which can be > unreliable because of various reasons (wrong object size estimation, adding > new variables to any class without accounting its size for memory monitoring, > etc.). We can use allocation size per thread that is provided by ThreadMXBean > and fallback to DataModel in case if JDK doesn't support thread based > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
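The ThreadMXBean approach described in the issue relies on the HotSpot-specific com.sun.management.ThreadMXBean, which can report bytes allocated per thread. A minimal sketch of measuring allocation around a load and falling back when the JVM does not support it is below; the actual hash table loader integration in the patch differs.
{code}
import java.lang.management.ManagementFactory;

// Sketch: measure bytes allocated by the current thread via the HotSpot-specific
// com.sun.management.ThreadMXBean, with a fallback when the JDK lacks support.
public class ThreadAllocationSketch {
  public static void main(String[] args) {
    java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    if (!(bean instanceof com.sun.management.ThreadMXBean)
        || !((com.sun.management.ThreadMXBean) bean).isThreadAllocatedMemorySupported()) {
      System.out.println("Per-thread allocation not available; fall back to data-model estimation");
      return;
    }
    com.sun.management.ThreadMXBean sunBean = (com.sun.management.ThreadMXBean) bean;
    long tid = Thread.currentThread().getId();
    long before = sunBean.getThreadAllocatedBytes(tid);

    byte[][] table = new byte[1000][];              // stand-in for loading a hash table
    for (int i = 0; i < table.length; i++) {
      table[i] = new byte[1024];
    }

    long after = sunBean.getThreadAllocatedBytes(tid);
    System.out.println("Approximate bytes allocated during load: " + (after - before));
  }
}
{code}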
[jira] [Commented] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146548#comment-16146548 ] anishek commented on HIVE-17100: +1 only issue left is the dependency logger added to root, which i think you are already working on . > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, > HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch, > HIVE-17100.06.patch, HIVE-17100.07.patch, HIVE-17100.08.patch, > HIVE-17100.09.patch > > > It is necessary to log the progress the replication tasks in a structured > manner as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number > of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total > number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if > any table/function is dropped when dump in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and > views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of > functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of database dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. 
> * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event dump end time > * Event dump progress. Format is Event sequence no/ (Estimated) Total number > of events.{color} > * After completion of all event dumps, will add a log as follows. > {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can be terribly inaccurate with actual > number as we don’t have the number of events upfront until we read from > metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event load end time > * Event load progress. Format is Event sequence no/ Total number of > events.{color} > * After completion of all event loads, will add a log as follows to > consolidate the load. > {color:#59afe
[jira] [Commented] (HIVE-17399) Do not remove semijoin branch if it feeds to TS->DPP_EVENT
[ https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146537#comment-16146537 ] Gopal V commented on HIVE-17399: [~djaiswal]: left some comments - I like this approach of marking branches as dead and removing it later instead of hitting the concurrent exceptions, but want to make sure we don't have un-dead branches because mutable state is hard to debug without a state machine. > Do not remove semijoin branch if it feeds to TS->DPP_EVENT > -- > > Key: HIVE-17399 > URL: https://issues.apache.org/jira/browse/HIVE-17399 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17399.1.patch > > > If there is an incoming semijoin branch to a TS which has DPP event, then try > to keep it as it may serve as an excellent filter for DPP thus reducing the > input to join drastically. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17399) Do not remove semijoin branch if it feeds to TS->DPP_EVENT
[ https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-17399: -- Attachment: HIVE-17399.1.patch [~gopalv] can you please review? > Do not remove semijoin branch if it feeds to TS->DPP_EVENT > -- > > Key: HIVE-17399 > URL: https://issues.apache.org/jira/browse/HIVE-17399 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17399.1.patch > > > If there is an incoming semijoin branch to a TS which has DPP event, then try > to keep it as it may serve as an excellent filter for DPP thus reducing the > input to join drastically. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-17399) Do not remove semijoin branch if it feeds to TS->DPP_EVENT
[ https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17399 started by Deepak Jaiswal. - > Do not remove semijoin branch if it feeds to TS->DPP_EVENT > -- > > Key: HIVE-17399 > URL: https://issues.apache.org/jira/browse/HIVE-17399 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > > If there is an incoming semijoin branch to a TS which has DPP event, then try > to keep it as it may serve as an excellent filter for DPP thus reducing the > input to join drastically. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17399) Do not remove semijoin branch if it feeds to TS->DPP_EVENT
[ https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-17399: -- Status: Patch Available (was: In Progress) > Do not remove semijoin branch if it feeds to TS->DPP_EVENT > -- > > Key: HIVE-17399 > URL: https://issues.apache.org/jira/browse/HIVE-17399 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > > If there is an incoming semijoin branch to a TS which has DPP event, then try > to keep it as it may serve as an excellent filter for DPP thus reducing the > input to join drastically. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146512#comment-16146512 ] Hive QA commented on HIVE-16886: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884341/HIVE-16886.6.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11000 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[transform_acid] (batchId=19) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=102) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6592/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6592/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6592/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884341 - PreCommit-HIVE-Build > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, > HIVE-16886.5.patch, HIVE-16886.6.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>   final int NUM_THREADS = 2;
>   CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   CountDownLatch countOut = new CountDownLatch(1);
>   HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, MockPartitionExpressionProxy.class.getName());
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>   FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     final int n = i;
>     tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>       @Override
>       public Void call() throws Exception {
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>         NotificationEvent dbEvent = new NotificationEvent(0, 0, EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>         System.out.println("ADDING NOTIFICATION");
>         countIn.countDown();
>         countOut.await();
>         store.addNotificationEvent(dbEvent);
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>     executorService.execute(tasks[i]);
>   }
>   countIn.await();
>   countOut.countDown();
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     tasks[i].get();
>   }
>   NotificationEventResponse eventResponse = objectStore.getNextNotification(new NotificationEventRequest());
>   Assert.assertEquals(2, eventResponse.getEventsSize());
>   Assert.assertEquals(1, eventResponse.getEve
[jira] [Updated] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-17323: -- Attachment: HIVE-17323.5.patch > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17323.1.patch, HIVE-17323.2.patch, > HIVE-17323.3.patch, HIVE-17323.4.patch, HIVE-17323.5.patch > > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146498#comment-16146498 ] Deepak Jaiswal commented on HIVE-17323: --- Redid the logic to handle operators with more than 1 parent while traversing upstream. > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17323.1.patch, HIVE-17323.2.patch, > HIVE-17323.3.patch, HIVE-17323.4.patch, HIVE-17323.5.patch > > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17276) Check max shuffle size when converting to dynamically partitioned hash join
[ https://issues.apache.org/jira/browse/HIVE-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17276: --- Attachment: HIVE-17276.03.patch > Check max shuffle size when converting to dynamically partitioned hash join > --- > > Key: HIVE-17276 > URL: https://issues.apache.org/jira/browse/HIVE-17276 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17276.01.patch, HIVE-17276.02.patch, > HIVE-17276.03.patch, HIVE-17276.patch > > > Currently we only check that the max number of entries in the hashmap for a > MapJoin surpasses a certain threshold to decide whether to execute a > dynamically partitioned hash join. > We would like to factor the size of the large input that we will shuffle for > the dynamically partitioned hash join into the cost model too. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
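The cost-model change described in the issue (consider the bytes that would be shuffled for the big side, not only the hash-table entry count) boils down to one extra guard in the conversion decision. A hedged sketch follows; the method name, parameters and thresholds are made up for illustration and are not Hive's actual config or code.
{code}
// Sketch of the decision described above: convert to a dynamically partitioned hash
// join only when the small side is too big for a broadcast hash table AND the data
// shuffled for the big side stays under a configured maximum.
public class DphjDecisionSketch {
  static boolean shouldUseDynamicPartitionedHashJoin(
      long estimatedHashTableEntries, long maxHashTableEntries,
      long estimatedBigTableShuffleBytes, long maxShuffleBytes) {
    boolean tooBigForMapJoin = estimatedHashTableEntries > maxHashTableEntries;
    boolean shuffleSmallEnough = estimatedBigTableShuffleBytes <= maxShuffleBytes;
    return tooBigForMapJoin && shuffleSmallEnough;
  }

  public static void main(String[] args) {
    // 50M entries will not fit a broadcast hash table; shuffling ~8GB is under a 10GB cap.
    System.out.println(shouldUseDynamicPartitionedHashJoin(
        50_000_000L, 20_000_000L, 8L << 30, 10L << 30));  // prints true
  }
}
{code}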
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146481#comment-16146481 ] Hive QA commented on HIVE-17409: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884330/HIVE-17409.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementParallel (batchId=223) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6591/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6591/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6591/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884330 - PreCommit-HIVE-Build > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17405) HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
[ https://issues.apache.org/jira/browse/HIVE-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146478#comment-16146478 ] Rui Li commented on HIVE-17405: --- It seems this also fixes spark_vectorized_dynamic_partition_pruning.q > HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT > - > > Key: HIVE-17405 > URL: https://issues.apache.org/jira/browse/HIVE-17405 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17405.1.patch, HIVE-17405.2.patch > > > In {{SparkCompiler#runDynamicPartitionPruning}} we should change {{new > ConstantPropagate().transform(parseContext)}} to {{new > ConstantPropagate(ConstantPropagateOption.SHORTCUT).transform(parseContext)}} > Hive-on-Tez does the same thing. > Running the full constant propagation isn't really necessary, we just want to > eliminate any {{and true}} predicates that were introduced by > {{SyntheticJoinPredicate}} and {{DynamicPartitionPruningOptimization}}. The > {{SyntheticJoinPredicate}} will introduce dummy filter predicates into the > operator tree, and {{DynamicPartitionPruningOptimization}} will replace them. > The predicates introduced via {{SyntheticJoinPredicate}} are necessary to > help {{DynamicPartitionPruningOptimization}} determine if DPP can be used or > not. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146477#comment-16146477 ] Sergey Shelukhin commented on HIVE-17409: - Yes; not necessarily just takeover, sharing is also possible in active-active scenario where currently each HS2 has its own pool with all the problems resulting from that. That 's the WM part, in addition to some coordination like endpoint discovery. > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17225) HoS DPP pruning sink ops can target parallel work objects
[ https://issues.apache.org/jira/browse/HIVE-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17225: Attachment: HIVE-17225.4.patch > HoS DPP pruning sink ops can target parallel work objects > - > > Key: HIVE-17225 > URL: https://issues.apache.org/jira/browse/HIVE-17225 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE17225.1.patch, HIVE-17225.2.patch, > HIVE-17225.3.patch, HIVE-17225.4.patch > > > Setup: > {code:sql} > SET hive.spark.dynamic.partition.pruning=true; > SET hive.strict.checks.cartesian.product=false; > SET hive.auto.convert.join=true; > CREATE TABLE partitioned_table1 (col int) PARTITIONED BY (part_col int); > CREATE TABLE regular_table1 (col int); > CREATE TABLE regular_table2 (col int); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 1); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 2); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 3); > INSERT INTO table regular_table1 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO table regular_table2 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 1) VALUES (1); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 2) VALUES (2); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 3) VALUES (3); > SELECT * > FROM partitioned_table1, >regular_table1 rt1, >regular_table2 rt2 > WHERE rt1.col = partitioned_table1.part_col >AND rt2.col = partitioned_table1.part_col; > {code} > Exception: > {code} > 2017-08-01T13:27:47,483 ERROR [b0d354a8-4cdb-4ba9-acec-27d14926aaf4 main] > ql.Driver: FAILED: Execution Error, return code 3 from > org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.FileNotFoundException: File > file:/Users/stakiar/Documents/idea/apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/b0d354a8-4cdb-4ba9-acec-27d14926aaf4/hive_2017-08-01_13-27-45_553_1088589686371686526-1/-mr-10004/3/5 > does not exist > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:498) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.
[jira] [Commented] (HIVE-17411) LLAP IO may incorrectly release a refcount in some rare cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146446#comment-16146446 ] Sergey Shelukhin commented on HIVE-17411: - No, there's no consistent repro > LLAP IO may incorrectly release a refcount in some rare cases > - > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17411.patch > > > In a large stream whose buffers are not reused, and that is separated into > many CB (e.g. due to a small ORC compression buffer size), it may happen that > some, but not all, buffers that are read together as a unit are evicted from > cache. > If CacheBuffer follows BufferChunk in the buffer list when a stream like this > is read, the latter will be converted to ProcCacheChunk; it is possible for > early refcount release logic from the former to release the refcount (for a > dictionary stream, the initial refCount is always released early), and then > backtrack to the latter to see if we can unlock more buffers. It would then > to decref an uninitialized MemoryBuffer in ProcCacheChunk because > ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released > separately after the data is uncompressed. > I'm assuming this would almost never happen with non-stripe-level streams > because one would need a large RG to span 2+ CBs, no overlap with > next/previous RGs in 2+ buffers for the early release to kick in, and an > unfortunate eviction order. However it's possible with large-ish dictionaries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17411) LLAP IO may incorrectly release a refcount in some rare cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146439#comment-16146439 ] Prasanth Jayachandran commented on HIVE-17411: -- Not sure if there is repro for this issue. If there is, can this be tested by not projecting the large dictionary column? looks good otherwise +1 > LLAP IO may incorrectly release a refcount in some rare cases > - > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17411.patch > > > In a large stream whose buffers are not reused, and that is separated into > many CB (e.g. due to a small ORC compression buffer size), it may happen that > some, but not all, buffers that are read together as a unit are evicted from > cache. > If CacheBuffer follows BufferChunk in the buffer list when a stream like this > is read, the latter will be converted to ProcCacheChunk; it is possible for > early refcount release logic from the former to release the refcount (for a > dictionary stream, the initial refCount is always released early), and then > backtrack to the latter to see if we can unlock more buffers. It would then > to decref an uninitialized MemoryBuffer in ProcCacheChunk because > ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released > separately after the data is uncompressed. > I'm assuming this would almost never happen with non-stripe-level streams > because one would need a large RG to span 2+ CBs, no overlap with > next/previous RGs in 2+ buffers for the early release to kick in, and an > unfortunate eviction order. However it's possible with large-ish dictionaries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17411) LLAP IO may incorrectly release a refcount in some rare cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17411: Status: Patch Available (was: Open) > LLAP IO may incorrectly release a refcount in some rare cases > - > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17411.patch > > > In a large stream whose buffers are not reused, and that is separated into > many CB (e.g. due to a small ORC compression buffer size), it may happen that > some, but not all, buffers that are read together as a unit are evicted from > cache. > If CacheBuffer follows BufferChunk in the buffer list when a stream like this > is read, the latter will be converted to ProcCacheChunk; it is possible for > early refcount release logic from the former to release the refcount (for a > dictionary stream, the initial refCount is always released early), and then > backtrack to the latter to see if we can unlock more buffers. It would then > to decref an uninitialized MemoryBuffer in ProcCacheChunk because > ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released > separately after the data is uncompressed. > I'm assuming this would almost never happen with non-stripe-level streams > because one would need a large RG to span 2+ CBs, no overlap with > next/previous RGs in 2+ buffers for the early release to kick in, and an > unfortunate eviction order. However it's possible with large-ish dictionaries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17411) LLAP IO may incorrectly release a refcount in some rare cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17411: Description: In a large stream whose buffers are not reused, and that is separated into many CB (e.g. due to a small ORC compression buffer size), it may happen that some, but not all, buffers that are read together as a unit are evicted from cache. If CacheBuffer follows BufferChunk in the buffer list when a stream like this is read, the latter will be converted to ProcCacheChunk; it is possible for early refcount release logic from the former to release the refcount (for a dictionary stream, the initial refCount is always released early), and then backtrack to the latter to see if we can unlock more buffers. It would then to decref an uninitialized MemoryBuffer in ProcCacheChunk because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released separately after the data is uncompressed. I'm assuming this would almost never happen with non-stripe-level streams because one would need a large RG to span 2+ CBs, no overlap with next/previous RGs in 2+ buffers for the early release to kick in, and an unfortunate eviction order. However it's possible with large-ish dictionaries. was: In a large stream whose buffers are not reused, and that is separated into many CB (e.g. due to a small ORC compression buffer size), it may happen that some, but not all, buffers that are read together as a unit are evicted from cache. If CacheBuffer follows BufferChunk in the buffer list when a stream like this is read, the latter will be converted to ProcCacheChunk; it is possible for early refcount release logic from the former to release the refcount (for a dictionary stream, the initial refCount is always released early), and then backtrack to the latter to see if we can unlock more buffers. It would then to decref an uninitialized MemoryBuffer in ProcCacheChunk because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released separately after the data is uncompressed. I'm assuming it would almost never happen with non-stripe-level streams because one would need both very large RG to span 2+ CBs, no overlap with next/previous RGs in 2+ buffers for the early release to kick in, and an unfortunate eviction order. However it's possible with large-ish dictionaries. > LLAP IO may incorrectly release a refcount in some rare cases > - > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17411.patch > > > In a large stream whose buffers are not reused, and that is separated into > many CB (e.g. due to a small ORC compression buffer size), it may happen that > some, but not all, buffers that are read together as a unit are evicted from > cache. > If CacheBuffer follows BufferChunk in the buffer list when a stream like this > is read, the latter will be converted to ProcCacheChunk; it is possible for > early refcount release logic from the former to release the refcount (for a > dictionary stream, the initial refCount is always released early), and then > backtrack to the latter to see if we can unlock more buffers. It would then > to decref an uninitialized MemoryBuffer in ProcCacheChunk because > ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released > separately after the data is uncompressed. 
> I'm assuming this would almost never happen with non-stripe-level streams > because one would need a large RG to span 2+ CBs, no overlap with > next/previous RGs in 2+ buffers for the early release to kick in, and an > unfortunate eviction order. However it's possible with large-ish dictionaries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17411) LLAP IO may incorrectly release a refcount in some rare cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17411: Description: In a large stream whose buffers are not reused, and that is separated into many CB (e.g. due to a small ORC compression buffer size), it may happen that some, but not all, buffers that are read together as a unit are evicted from cache. If CacheBuffer follows BufferChunk in the buffer list when a stream like this is read, the latter will be converted to ProcCacheChunk; it is possible for early refcount release logic from the former to release the refcount (for a dictionary stream, the initial refCount is always released early), and then backtrack to the latter to see if we can unlock more buffers. It would then to decref an uninitialized MemoryBuffer in ProcCacheChunk because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released separately after the data is uncompressed. I'm assuming it would almost never happen with non-stripe-level streams because one would need both very large RG to span 2+ CBs, no overlap with next/previous RGs in 2+ buffers for the early release to kick in, and an unfortunate eviction order. However it's possible with large-ish dictionaries. was: In a large stream whose buffers are not reused, separated into many buffers (e.g. due to a small ORC compression buffer size), it may happen that some, but not all, buffers that are read together as a unit are evicted from cache. If CacheBuffer follows BufferChunk in the buffer list, the latter will be converted to ProcCacheChunk; it is possible for early refcount release logic from the former to release the refcount (for a dictionary it would always be released cause by definition there's no reuse), and then backtrack to the latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released separately after the data is uncompressed. I'm assuming it would almost never happen with non-stripe-level streams because one would need both very large RG to span 2+ CBs, no overlap with next/previous RGs in 2+ buffers for the early release to kick in, and an unfortunate eviction order. However it's possible with large-ish dictionaries. > LLAP IO may incorrectly release a refcount in some rare cases > - > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17411.patch > > > In a large stream whose buffers are not reused, and that is separated into > many CB (e.g. due to a small ORC compression buffer size), it may happen that > some, but not all, buffers that are read together as a unit are evicted from > cache. > If CacheBuffer follows BufferChunk in the buffer list when a stream like this > is read, the latter will be converted to ProcCacheChunk; it is possible for > early refcount release logic from the former to release the refcount (for a > dictionary stream, the initial refCount is always released early), and then > backtrack to the latter to see if we can unlock more buffers. It would then > to decref an uninitialized MemoryBuffer in ProcCacheChunk because > ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released > separately after the data is uncompressed. 
> I'm assuming it would almost never happen with non-stripe-level streams > because one would need both very large RG to span 2+ CBs, no overlap with > next/previous RGs in 2+ buffers for the early release to kick in, and an > unfortunate eviction order. However it's possible with large-ish dictionaries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17411) LLAP IO may incorrectly release a refcount in some rare cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17411: Attachment: HIVE-17411.patch [~prasanth_j] can you take a look? small bugfix > LLAP IO may incorrectly release a refcount in some rare cases > - > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17411.patch > > > In a large stream whose buffers are not reused, separated into many buffers > (e.g. due to a small ORC compression buffer size), it may happen that some, > but not all, buffers that are read together as a unit are evicted from cache. > If CacheBuffer follows BufferChunk in the buffer list, the latter will be > converted to ProcCacheChunk; it is possible for early refcount release logic > from the former to release the refcount (for a dictionary it would always be > released cause by definition there's no reuse), and then backtrack to the > latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk > because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are > released separately after the data is uncompressed. > I'm assuming it would almost never happen with non-stripe-level streams > because one would need both very large RG to span 2+ CBs, no overlap with > next/previous RGs in 2+ buffers for the early release to kick in, and an > unfortunate eviction order. However it's possible with large-ish dictionaries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17411) LLAP IO may incorrectly release a refcount in some rare cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17411: Description: In a large stream whose buffers are not reused, separated into many buffers (e.g. due to a small ORC compression buffer size), it may happen that some, but not all, buffers that are read together as a unit are evicted from cache. If CacheBuffer follows BufferChunk in the buffer list, the latter will be converted to ProcCacheChunk; it is possible for early refcount release logic from the former to release the refcount (for a dictionary it would always be released cause by definition there's no reuse), and then backtrack to the latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released separately after the data is uncompressed. I'm assuming it would almost never happen with non-stripe-level streams because one would need both very large RG to span 2+ CBs, no overlap with next/previous RGs in 2+ buffers for the early release to kick in, and an unfortunate eviction order. However it's possible with large-ish dictionaries. was: In a large stream whose buffers are not reused (e.g. a dictionary, that is locked once for all RGs), separated into many buffers (e.g. due to a small ORC compression buffer size), it may happen that some, but not all, buffers are evicted from cache. If CacheBuffer follows BufferChunk in the buffer list, the latter will be converted to ProcCacheChunk; it is possible for early refcount release logic from the former to release the refcount (for a dictionary it would always be released cause by definition there's no reuse), and then backtrack to the latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released separately after the data is uncompressed. > LLAP IO may incorrectly release a refcount in some rare cases > - > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > In a large stream whose buffers are not reused, separated into many buffers > (e.g. due to a small ORC compression buffer size), it may happen that some, > but not all, buffers that are read together as a unit are evicted from cache. > If CacheBuffer follows BufferChunk in the buffer list, the latter will be > converted to ProcCacheChunk; it is possible for early refcount release logic > from the former to release the refcount (for a dictionary it would always be > released cause by definition there's no reuse), and then backtrack to the > latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk > because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are > released separately after the data is uncompressed. > I'm assuming it would almost never happen with non-stripe-level streams > because one would need both very large RG to span 2+ CBs, no overlap with > next/previous RGs in 2+ buffers for the early release to kick in, and an > unfortunate eviction order. However it's possible with large-ish dictionaries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17411) LLAP IO may incorrectly release a refcount in some rare cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17411: Summary: LLAP IO may incorrectly release a refcount in some rare cases (was: LLAP IO may incorrectly release a refcount in some cases) > LLAP IO may incorrectly release a refcount in some rare cases > - > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > Not sure why this doesn't happen much more often, actually. > In a large stream whose buffers are not reused (e.g. a dictionary, that is > locked once for all RGs), separated into many buffers (e.g. due to a small > ORC compression buffer size), it may happen that some, but not all, buffers > are evicted from cache. > If CacheBuffer follows BufferChunk in the buffer list, the latter will be > converted to ProcCacheChunk; it is possible for early refcount release logic > from the former to release the refcount (for a dictionary it would always be > released cause by definition there's no reuse), and then backtrack to the > latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk > because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are > released separately after the data is uncompressed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17411) LLAP IO may incorrectly release a refcount in some rare cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17411: Description: In a large stream whose buffers are not reused (e.g. a dictionary, that is locked once for all RGs), separated into many buffers (e.g. due to a small ORC compression buffer size), it may happen that some, but not all, buffers are evicted from cache. If CacheBuffer follows BufferChunk in the buffer list, the latter will be converted to ProcCacheChunk; it is possible for early refcount release logic from the former to release the refcount (for a dictionary it would always be released cause by definition there's no reuse), and then backtrack to the latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released separately after the data is uncompressed. was: Not sure why this doesn't happen much more often, actually. In a large stream whose buffers are not reused (e.g. a dictionary, that is locked once for all RGs), separated into many buffers (e.g. due to a small ORC compression buffer size), it may happen that some, but not all, buffers are evicted from cache. If CacheBuffer follows BufferChunk in the buffer list, the latter will be converted to ProcCacheChunk; it is possible for early refcount release logic from the former to release the refcount (for a dictionary it would always be released cause by definition there's no reuse), and then backtrack to the latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are released separately after the data is uncompressed. > LLAP IO may incorrectly release a refcount in some rare cases > - > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > In a large stream whose buffers are not reused (e.g. a dictionary, that is > locked once for all RGs), separated into many buffers (e.g. due to a small > ORC compression buffer size), it may happen that some, but not all, buffers > are evicted from cache. > If CacheBuffer follows BufferChunk in the buffer list, the latter will be > converted to ProcCacheChunk; it is possible for early refcount release logic > from the former to release the refcount (for a dictionary it would always be > released cause by definition there's no reuse), and then backtrack to the > latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk > because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are > released separately after the data is uncompressed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17411) LLAP IO may incorrectly release a refcount in some cases
[ https://issues.apache.org/jira/browse/HIVE-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-17411: --- > LLAP IO may incorrectly release a refcount in some cases > > > Key: HIVE-17411 > URL: https://issues.apache.org/jira/browse/HIVE-17411 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > Not sure why this doesn't happen much more often, actually. > In a large stream whose buffers are not reused (e.g. a dictionary, that is > locked once for all RGs), separated into many buffers (e.g. due to a small > ORC compression buffer size), it may happen that some, but not all, buffers > are evicted from cache. > If CacheBuffer follows BufferChunk in the buffer list, the latter will be > converted to ProcCacheChunk; it is possible for early refcount release logic > from the former to release the refcount (for a dictionary it would always be > released cause by definition there's no reuse), and then backtrack to the > latter, and try to decref an uninitialized MemoryBuffer in ProcCacheChunk > because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are > released separately after the data is uncompressed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17408) replication distcp should only be invoked if number of files AND file size cross configured limits
[ https://issues.apache.org/jira/browse/HIVE-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146408#comment-16146408 ] Hive QA commented on HIVE-17408: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884329/HIVE-17408.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_3] (batchId=239) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6590/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6590/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6590/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884329 - PreCommit-HIVE-Build > replication distcp should only be invoked if number of files AND file size > cross configured limits > -- > > Key: HIVE-17408 > URL: https://issues.apache.org/jira/browse/HIVE-17408 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17408.1.patch > > > CopyUtils currently invokes distcp on whether > "hive.exec.copyfile.maxnumfiles" or "hive.exec.copyfile.maxsize" condition is > breached, should only be invoked when both are breached so should be AND > rather than OR. > distcp cannot do a distributed copy of a large single file hence more reason > to do the above change. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
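The change proposed in HIVE-17408 is essentially a boolean one. A hedged sketch of the corrected check follows; the class, method, and field names are illustrative, not the actual CopyUtils code.
{noformat}
// Illustrative only: decide between a regular copy and distcp.
public final class CopyStrategy {
  private final long maxNumberOfFiles; // hive.exec.copyfile.maxnumfiles
  private final long maxCopyFileSize;  // hive.exec.copyfile.maxsize

  public CopyStrategy(long maxNumberOfFiles, long maxCopyFileSize) {
    this.maxNumberOfFiles = maxNumberOfFiles;
    this.maxCopyFileSize = maxCopyFileSize;
  }

  // Before the fix the condition used ||, so either many tiny files or one
  // large file was enough to trigger distcp even when it could not help.
  public boolean shouldUseDistCp(long fileCount, long totalByteSize) {
    return fileCount > maxNumberOfFiles && totalByteSize > maxCopyFileSize;
  }
}
{noformat}
With &&, a single file larger than hive.exec.copyfile.maxsize no longer forces distcp on its own, which matters because distcp cannot split the copy of a single large file anyway.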
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146384#comment-16146384 ] Alexander Kolbasov commented on HIVE-16886: --- Looks like SQLGenerator.createInsertValuesStmt() may be subject to SQL injection attack since it uses provided values to generate SQL statements. Perhaps it should use prepared statements instead? > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, > HIVE-16886.5.patch, HIVE-16886.6.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. > Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i=0; i final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails expecting an event ID 1 instead of 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
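On the prepared-statement suggestion above, here is a minimal JDBC sketch of what parameter binding looks like for a notification insert. The table and column names are simplified placeholders rather than the exact metastore schema, and this is not the SQLGenerator code itself.
{noformat}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public final class NotificationInserter {
  // Values are bound as parameters, so user-controlled strings (event types,
  // messages) cannot alter the SQL text the way string concatenation could.
  public static void insertEvent(Connection conn, long eventId, int eventTime,
                                 String eventType, String message) throws SQLException {
    String sql = "INSERT INTO NOTIFICATION_LOG (EVENT_ID, EVENT_TIME, EVENT_TYPE, MESSAGE) "
        + "VALUES (?, ?, ?, ?)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setLong(1, eventId);
      ps.setInt(2, eventTime);
      ps.setString(3, eventType);
      ps.setString(4, message);
      ps.executeUpdate();
    }
  }
}
{noformat}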
[jira] [Commented] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed
[ https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146373#comment-16146373 ] anishek commented on HIVE-17410: [~sankarh]/[~thejas]/[~daijy] please review > repl load task during subsequent DAG generation does not start from the last > partition processed > > > Key: HIVE-17410 > URL: https://issues.apache.org/jira/browse/HIVE-17410 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Attachments: HIVE-17410.1.patch > > > DAG generation for repl load task was to be generated dynamically such that > if the load break happens at a partition load time then for subsequent runs > we should start post the last partition processed. > We currently identify the point from where we have to process the event but > reinitialize the iterator to start from beginning of all partition's to > process. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
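As a reading aid for the description, a small generic sketch of the intended resume behaviour follows, using invented names rather than the actual repl load DAG code; pass -1 for lastProcessedIndex when no partition has been loaded yet.
{noformat}
import java.util.List;

public final class PartitionResume {
  // Skip everything up to and including the last partition that was already
  // loaded, and only build tasks for the remainder.
  static <T> List<T> remainingPartitions(List<T> allPartitions, int lastProcessedIndex) {
    // Before the fix the equivalent step effectively returned allPartitions,
    // restarting from the first partition on every subsequent DAG build.
    return allPartitions.subList(lastProcessedIndex + 1, allPartitions.size());
  }
}
{noformat}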
[jira] [Updated] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed
[ https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-17410: --- Attachment: HIVE-17410.1.patch > repl load task during subsequent DAG generation does not start from the last > partition processed > > > Key: HIVE-17410 > URL: https://issues.apache.org/jira/browse/HIVE-17410 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Attachments: HIVE-17410.1.patch > > > DAG generation for repl load task was to be generated dynamically such that > if the load break happens at a partition load time then for subsequent runs > we should start post the last partition processed. > We currently identify the point from where we have to process the event but > reinitialize the iterator to start from beginning of all partition's to > process. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed
[ https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-17410: --- Status: Patch Available (was: In Progress) > repl load task during subsequent DAG generation does not start from the last > partition processed > > > Key: HIVE-17410 > URL: https://issues.apache.org/jira/browse/HIVE-17410 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Attachments: HIVE-17410.1.patch > > > DAG generation for repl load task was to be generated dynamically such that > if the load break happens at a partition load time then for subsequent runs > we should start post the last partition processed. > We currently identify the point from where we have to process the event but > reinitialize the iterator to start from beginning of all partition's to > process. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed
[ https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17410 started by anishek. -- > repl load task during subsequent DAG generation does not start from the last > partition processed > > > Key: HIVE-17410 > URL: https://issues.apache.org/jira/browse/HIVE-17410 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Attachments: HIVE-17410.1.patch > > > DAG generation for repl load task was to be generated dynamically such that > if the load break happens at a partition load time then for subsequent runs > we should start post the last partition processed. > We currently identify the point from where we have to process the event but > reinitialize the iterator to start from beginning of all partition's to > process. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146340#comment-16146340 ] Hive QA commented on HIVE-17323: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884326/HIVE-17323.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_null_check] (batchId=4) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testStoreFuncSimple (batchId=183) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=183) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalXY (batchId=183) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6589/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6589/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6589/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884326 - PreCommit-HIVE-Build > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17323.1.patch, HIVE-17323.2.patch, > HIVE-17323.3.patch, HIVE-17323.4.patch > > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17307) Change the metastore to not use the metrics code in hive/common
[ https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17307: -- Attachment: HIVE-17307.4.patch A new patch that addresses Vihang's comments from the PR. > Change the metastore to not use the metrics code in hive/common > --- > > Key: HIVE-17307 > URL: https://issues.apache.org/jira/browse/HIVE-17307 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17307.2.patch, HIVE-17307.3.patch, > HIVE-17307.4.patch, HIVE-17307.patch > > > As we move code into the standalone metastore module, it cannot use the > metrics in hive-common. We could copy the current Metrics interface or we > could change the metastore code to directly use codahale metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
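For readers unfamiliar with the Codahale (Dropwizard Metrics) API mentioned in the description, a short standalone sketch follows. The metric names are made up and this is not the standalone-metastore code.
{noformat}
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public final class MetastoreMetricsSketch {
  private static final MetricRegistry REGISTRY = new MetricRegistry();

  public static void main(String[] args) {
    Counter openConnections = REGISTRY.counter("open_connections");
    Timer apiTimer = REGISTRY.timer(MetricRegistry.name("api", "get_table"));

    openConnections.inc();
    try (Timer.Context ignored = apiTimer.time()) {
      // ... the call being measured ...
    }
    openConnections.dec();

    // Dump current values; a JMX or file reporter would be used in practice.
    ConsoleReporter.forRegistry(REGISTRY).build().report();
  }
}
{noformat}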
[jira] [Updated] (HIVE-17307) Change the metastore to not use the metrics code in hive/common
[ https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17307: -- Status: Patch Available (was: Open) > Change the metastore to not use the metrics code in hive/common > --- > > Key: HIVE-17307 > URL: https://issues.apache.org/jira/browse/HIVE-17307 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17307.2.patch, HIVE-17307.3.patch, > HIVE-17307.4.patch, HIVE-17307.patch > > > As we move code into the standalone metastore module, it cannot use the > metrics in hive-common. We could copy the current Metrics interface or we > could change the metastore code to directly use codahale metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17307) Change the metastore to not use the metrics code in hive/common
[ https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17307: -- Status: Open (was: Patch Available) > Change the metastore to not use the metrics code in hive/common > --- > > Key: HIVE-17307 > URL: https://issues.apache.org/jira/browse/HIVE-17307 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17307.2.patch, HIVE-17307.3.patch, > HIVE-17307.4.patch, HIVE-17307.patch > > > As we move code into the standalone metastore module, it cannot use the > metrics in hive-common. We could copy the current Metrics interface or we > could change the metastore code to directly use codahale metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146290#comment-16146290 ] Prasanth Jayachandran commented on HIVE-17409: -- bq. Are you talking about the YARN registry? I remember something about ACL leaks or something related to ACL that HS2 service discovery code and LLAP registry did. bq. AMRegistry is a sharing/heartbeating/ownership mechanism, not discovery. Is this also intended for HS2 taking over sessions from another HS2 (probably dead)? Untie-ing sessions and HS2 in general. > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146257#comment-16146257 ] Hive QA commented on HIVE-16886: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884341/HIVE-16886.6.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_15] (batchId=82) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=227) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6588/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6588/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6588/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884341 - PreCommit-HIVE-Build > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, > HIVE-16886.5.patch, HIVE-16886.6.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i=0; i final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 >
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146252#comment-16146252 ] Sergey Shelukhin commented on HIVE-17409: - LLAP ZK code is better and easier to reuse. I'm not sure why AM registry is more similar to HS2. HS2 code is to discover instances of a service in a load balancer type scenario. AMRegistry is a sharing/heartbeating/ownership mechanism, not discovery. In fact it's probably less similar to either HS2 or LLAP one than they are to each other. We can have a followup jira to refactor HS2 to use the same code. Code in HS2 is so dissimilar that they are unlikely to have shared bugs (other than general ones related to incorrect ZK usage or whatever). In any case both already exist. > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146245#comment-16146245 ] Gopal V commented on HIVE-17409: bq. But at some point both registry codes diverged and independent fixes went in each of them. Are you talking about the YARN registry? I don't think the HS2 Zk code shares any history with the LLAP registry code. > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146239#comment-16146239 ] Prasanth Jayachandran commented on HIVE-17409: -- I am not exactly sure about the full differences in the implementations. But at some point both registry codes diverged and independent fixes went in each of them. Also I think this shouldn't be tied to AM but sessions in general. I am fine as long as code is reused. Just that sessions seem to be closely related to HS2, I suggested to reuse HS2 registry code. > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17304) ThreadMXBean based memory allocation monitor for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17304: - Attachment: HIVE-17304.2.patch Rebased patch > ThreadMXBean based memory allocation monitor for hash table loader > --- > > Key: HIVE-17304 > URL: https://issues.apache.org/jira/browse/HIVE-17304 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17304.1.patch, HIVE-17304.2.patch > > > Hash table memory monitoring is based on the Java data model, which can be > unreliable for various reasons (wrong object size estimation, adding > new variables to a class without accounting for their size in memory monitoring, > etc.). We can use the allocation size per thread provided by ThreadMXBean > and fall back to DataModel if the JDK doesn't support thread-based > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
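A sketch of the JVM facility the description refers to: com.sun.management.ThreadMXBean exposes per-thread allocated-byte counters on HotSpot JVMs. The class below is illustrative only (not the hash table loader code), and callers are expected to check isSupported() and fall back to the estimate-based accounting otherwise.
{noformat}
import java.lang.management.ManagementFactory;

public final class AllocationProbe {
  // com.sun.management.ThreadMXBean is a HotSpot-specific extension of the
  // standard java.lang.management.ThreadMXBean.
  private final com.sun.management.ThreadMXBean mxBean;
  private final long threadId = Thread.currentThread().getId();
  private long startBytes;

  AllocationProbe() {
    java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    this.mxBean = (bean instanceof com.sun.management.ThreadMXBean)
        ? (com.sun.management.ThreadMXBean) bean : null;
  }

  // True when per-thread allocation accounting is usable; otherwise callers
  // should fall back to the data-model-based estimate.
  boolean isSupported() {
    return mxBean != null && mxBean.isThreadAllocatedMemorySupported()
        && mxBean.isThreadAllocatedMemoryEnabled();
  }

  void start() {
    startBytes = mxBean.getThreadAllocatedBytes(threadId);
  }

  long allocatedSinceStart() {
    return mxBean.getThreadAllocatedBytes(threadId) - startBytes;
  }
}
{noformat}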
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146211#comment-16146211 ] Sergey Shelukhin commented on HIVE-17409: - The AM registry logic is more similar to LLAP registry than to service discovery. Service discovery code currently lives in 2 big methods in HiveServer2.java that are very HS2 specific and not reusable. Also they are missing some features of LLAP registry, such as convenient endpoint handling, ACL correctness checking, etc. If anything HS2 should reuse the new generic registry at some point :) > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
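For context, here is a minimal Apache Curator sketch of the kind of ZooKeeper registration such a shared registry layer performs: an ephemeral-sequential znode under a namespaced path, discovered by listing children. The connect string, namespace, and paths are assumptions, and the real LLAP registry does considerably more (service records, slot znodes, secure/unsecure namespaces, ACL checks).
{noformat}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

import java.nio.charset.StandardCharsets;

public final class ZkRegistrySketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.builder()
        .connectString("localhost:2181")          // assumed ZK quorum
        .namespace("hive-registry-sketch")        // assumed per-cluster namespace
        .retryPolicy(new ExponentialBackoffRetry(1000, 3))
        .build();
    client.start();

    // Ephemeral-sequential node: it disappears when the session dies, so peers
    // discover live instances simply by listing the children of /workers.
    byte[] endpoint = "host1:12345".getBytes(StandardCharsets.UTF_8);
    String path = client.create()
        .creatingParentsIfNeeded()
        .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
        .forPath("/workers/instance-", endpoint);
    System.out.println("registered at " + path);

    for (String child : client.getChildren().forPath("/workers")) {
      System.out.println("live instance: " + child);
    }
    client.close();
  }
}
{noformat}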
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146202#comment-16146202 ] Prasanth Jayachandran commented on HIVE-17409: -- The concern I have is around code duplication. There is HS2 service discovery registry and LLAP registry which sort of does similar stuff (LLAP does slightly more around secure/unsecure namespaces, slot znodes etc.). This patch moves LLAP registry code out to support AM/sessions registrations. IMO sessions/AM registrations will be a better fit in HS2 service discovery registry code. > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146152#comment-16146152 ] Hive QA commented on HIVE-17323: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884326/HIVE-17323.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=227) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6587/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6587/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6587/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884326 - PreCommit-HIVE-Build > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17323.1.patch, HIVE-17323.2.patch, > HIVE-17323.3.patch, HIVE-17323.4.patch > > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed
[ https://issues.apache.org/jira/browse/HIVE-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-17410: -- Assignee: anishek > repl load task during subsequent DAG generation does not start from the last > partition processed > > > Key: HIVE-17410 > URL: https://issues.apache.org/jira/browse/HIVE-17410 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > > DAG generation for repl load task was to be generated dynamically such that > if the load break happens at a partition load time then for subsequent runs > we should start post the last partition processed. > We currently identify the point from where we have to process the event but > reinitialize the iterator to start from beginning of all partition's to > process. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16886: --- Attachment: HIVE-16886.6.patch fixing configuration object reference > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, > HIVE-16886.5.patch, HIVE-16886.6.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. > Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i=0; i final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails expecting an event ID 1 instead of 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16886: --- Attachment: (was: HIVE-16886.6.patch) > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, HIVE-16886.5.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. > Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i=0; i final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails expecting an event ID 1 instead of 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16886: --- Attachment: HIVE-16886.6.patch fixing the Configuration object reference > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, HIVE-16886.5.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. > Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i=0; i final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails expecting an event ID 1 instead of 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146111#comment-16146111 ] Sergey Shelukhin commented on HIVE-17304: - +1 with some testing > ThreadMXBean based memory allocation monitory for hash table loader > --- > > Key: HIVE-17304 > URL: https://issues.apache.org/jira/browse/HIVE-17304 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17304.1.patch > > > Hash table memory monitoring is based on java data model which can be > unreliable because of various reasons (wrong object size estimation, adding > new variables to any class without accounting its size for memory monitoring, > etc.). We can use allocation size per thread that is provided by ThreadMXBean > and fallback to DataModel in case if JDK doesn't support thread based > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146107#comment-16146107 ] Prasanth Jayachandran commented on HIVE-17304: -- The config changed because we are often very close to the estimates in most cases (at least for vectorized execution). I have seen heap dumps with 2GB hash tables where the estimates in the log lines are also within 5% of 2GB. The initial 2x factor was added primarily for the non-vectorized case, object overhead, and key/value size misestimation. The 2x factor is also applied after memory oversubscription, which already gives hash tables some extra room. With this patch, even the non-vectorized case comes close when ThreadMXBean info is used. The idea is to stay close to the noconditional task size plus oversubscribed memory, so the factor was relaxed to 1.5x :) > ThreadMXBean based memory allocation monitory for hash table loader > --- > > Key: HIVE-17304 > URL: https://issues.apache.org/jira/browse/HIVE-17304 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17304.1.patch > > > Hash table memory monitoring is based on java data model which can be > unreliable because of various reasons (wrong object size estimation, adding > new variables to any class without accounting its size for memory monitoring, > etc.). We can use allocation size per thread that is provided by ThreadMXBean > and fallback to DataModel in case if JDK doesn't support thread based > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
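For reference, a minimal sketch of the approach described in this issue: read per-thread allocation from the HotSpot-specific com.sun.management.ThreadMXBean and fall back to the data-model estimate when the JDK does not support it. The class and method names below are illustrative, not the ones in HIVE-17304.1.patch.
{code:java}
// Sketch, not the actual patch: per-thread allocation accounting with a
// signal to fall back to the data-model based estimate when unsupported.
import java.lang.management.ManagementFactory;

class AllocationMonitorSketch {
  private final com.sun.management.ThreadMXBean mxBean;
  private final long threadId = Thread.currentThread().getId();
  private long startAllocated;

  AllocationMonitorSketch() {
    java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    this.mxBean = (bean instanceof com.sun.management.ThreadMXBean
        && ((com.sun.management.ThreadMXBean) bean).isThreadAllocatedMemorySupported())
        ? (com.sun.management.ThreadMXBean) bean : null;
  }

  void start() {
    if (mxBean != null) {
      startAllocated = mxBean.getThreadAllocatedBytes(threadId);
    }
  }

  // Bytes allocated by this thread since start(), or -1 meaning
  // "fall back to the java data model estimate".
  long allocatedSinceStart() {
    return mxBean == null ? -1 : mxBean.getThreadAllocatedBytes(threadId) - startAllocated;
  }
}
{code}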
[jira] [Updated] (HIVE-17239) HoS doesn't trigger mapjoins against subquery with union all
[ https://issues.apache.org/jira/browse/HIVE-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17239: Issue Type: Sub-task (was: Bug) Parent: HIVE-16923 > HoS doesn't trigger mapjoins against subquery with union all > > > Key: HIVE-17239 > URL: https://issues.apache.org/jira/browse/HIVE-17239 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar > > HoS doesn't trigger map-joins for the following query: > {code} > EXPLAIN SELECT * FROM (SELECT part_col FROM partitioned_table1 UNION ALL > SELECT > part_col FROM partitioned_table2) q1 JOIN regular_table1 JOIN regular_table2 > WHERE q1.part_col = regular_table1.col1 AND q1.part_col = regular_table2.col1; > {code} > Hive-on-Tez does. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146103#comment-16146103 ] Sergey Shelukhin commented on HIVE-17409: - Seems to work ok on the cluster > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146102#comment-16146102 ] Sergey Shelukhin commented on HIVE-17006: - [~hagleitn] ping? > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146089#comment-16146089 ] Sergey Shelukhin commented on HIVE-17304: - Why did the config change? Otherwise looks good. Might need some realistic testing. > ThreadMXBean based memory allocation monitory for hash table loader > --- > > Key: HIVE-17304 > URL: https://issues.apache.org/jira/browse/HIVE-17304 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17304.1.patch > > > Hash table memory monitoring is based on java data model which can be > unreliable because of various reasons (wrong object size estimation, adding > new variables to any class without accounting its size for memory monitoring, > etc.). We can use allocation size per thread that is provided by ThreadMXBean > and fallback to DataModel in case if JDK doesn't support thread based > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17408) replication distcp should only be invoked if number of files AND file size cross configured limits
[ https://issues.apache.org/jira/browse/HIVE-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146071#comment-16146071 ] Thejas M Nair commented on HIVE-17408: -- +1 > replication distcp should only be invoked if number of files AND file size > cross configured limits > -- > > Key: HIVE-17408 > URL: https://issues.apache.org/jira/browse/HIVE-17408 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17408.1.patch > > > CopyUtils currently invokes distcp on whether > "hive.exec.copyfile.maxnumfiles" or "hive.exec.copyfile.maxsize" condition is > breached, should only be invoked when both are breached so should be AND > rather than OR. > distcp cannot do a distributed copy of a large single file hence more reason > to do the above change. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
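For illustration, the AND semantics described above as a small sketch; the method and field names (shouldUseDistCp, maxNumberOfFiles, maxCopyFileSize) are assumptions, not necessarily the actual CopyUtils code.
{code:java}
// Sketch of the proposed check: fall back to distcp only when BOTH limits
// ("hive.exec.copyfile.maxnumfiles" and "hive.exec.copyfile.maxsize") are
// crossed, since distcp cannot split a single large file across mappers.
class CopyDecisionSketch {
  private final long maxNumberOfFiles; // hive.exec.copyfile.maxnumfiles
  private final long maxCopyFileSize;  // hive.exec.copyfile.maxsize

  CopyDecisionSketch(long maxNumberOfFiles, long maxCopyFileSize) {
    this.maxNumberOfFiles = maxNumberOfFiles;
    this.maxCopyFileSize = maxCopyFileSize;
  }

  boolean shouldUseDistCp(long numberOfFiles, long totalSize) {
    // Previously an OR: a single large file (or many tiny files) triggered distcp.
    return numberOfFiles > maxNumberOfFiles && totalSize > maxCopyFileSize;
  }
}
{code}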
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146028#comment-16146028 ] Hive QA commented on HIVE-16886: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884316/HIVE-16886.5.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 80 failed/errored test(s), 11000 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=109) org.apache.hadoop.hive.metastore.TestObjectStore.testNotificationOps (batchId=201) org.apache.hadoop.hive.ql.parse.TestExport.shouldExportImportATemporaryTable (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testAlters (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testBasic (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testBasicWithCM (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testBootstrapLoadOnExistingDb (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testBootstrapWithConcurrentDropPartition (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testBootstrapWithConcurrentDropTable (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConcatenatePartitionedTable (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConcatenateTable (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testDrops (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testDropsWithCM (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testDumpLimit (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testEventTypesForDynamicAddPartitionByInsert (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testExchangePartition (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalAdds (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalInsertDropPartitionedTable (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalInsertDropUnpartitionedTable (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalInsertToPartition (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalInserts (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalLoad (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalLoadFailAndRetry (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalLoadWithVariableLengthEventId (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalRepeatEventOnExistingObject (batchId=218) 
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testIncrementalRepeatEventOnMissingObject (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testInsertOverwriteOnPartitionedTableWithCM (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testInsertOverwriteOnUnpartitionedTableWithCM (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testInsertToMultiKeyPartition (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testRenamePartitionWithCM (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testRenameTableWithCM (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testStatus (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testTruncatePartitionedTable (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testTruncateTable (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testTruncateWithCM (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testViewsReplication (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=218) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
[jira] [Resolved] (HIVE-17298) export when running distcp for large number of files should not run as privileged user
[ https://issues.apache.org/jira/browse/HIVE-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek resolved HIVE-17298. Resolution: Duplicate > export when running distcp for large number of files should not run as > privileged user > --- > > Key: HIVE-17298 > URL: https://issues.apache.org/jira/browse/HIVE-17298 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > Attachments: HIVE-17298.1.patch > > > When the export command encounters a large number of files, or files of > large total size, it invokes distcp. > distcp is currently run as a privileged user taken from the config > hive.distcp.privileged.doAs; this should not be the case, and distcp should > not be run as a privileged user. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145999#comment-16145999 ] Sergey Shelukhin commented on HIVE-17409: - I'm going to test on a cluster > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145991#comment-16145991 ] Sergey Shelukhin edited comment on HIVE-17409 at 8/29/17 7:50 PM: -- [~prasanth_j] can you take a look? This mostly moves code around and splits some classes. Also, DynamicInstanceSet was changed to just call parent methods, since it was anyway using a bunch of parent fields for most things, except instanceCache itself for some reason... I've preserved a separate reference to instanceCache for now. was (Author: sershe): [~prasanth_j] can you take a look? This moves code around pretty much, and splits some classes > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17409: Status: Patch Available (was: Open) > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17409: Attachment: HIVE-17409.patch > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17409.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17408) replication distcp should only be invoked if number of files AND file size cross configured limits
[ https://issues.apache.org/jira/browse/HIVE-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145975#comment-16145975 ] anishek commented on HIVE-17408: [~sankarh]/[~thejas]/[~daijy] please review > replication distcp should only be invoked if number of files AND file size > cross configured limits > -- > > Key: HIVE-17408 > URL: https://issues.apache.org/jira/browse/HIVE-17408 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17408.1.patch > > > CopyUtils currently invokes distcp on whether > "hive.exec.copyfile.maxnumfiles" or "hive.exec.copyfile.maxsize" condition is > breached, should only be invoked when both are breached so should be AND > rather than OR. > distcp cannot do a distributed copy of a large single file hence more reason > to do the above change. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17408) replication distcp should only be invoked if number of files AND file size cross configured limits
[ https://issues.apache.org/jira/browse/HIVE-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-17408: --- Status: Patch Available (was: In Progress) > replication distcp should only be invoked if number of files AND file size > cross configured limits > -- > > Key: HIVE-17408 > URL: https://issues.apache.org/jira/browse/HIVE-17408 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17408.1.patch > > > CopyUtils currently invokes distcp on whether > "hive.exec.copyfile.maxnumfiles" or "hive.exec.copyfile.maxsize" condition is > breached, should only be invoked when both are breached so should be AND > rather than OR. > distcp cannot do a distributed copy of a large single file hence more reason > to do the above change. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17408) replication distcp should only be invoked if number of files AND file size cross configured limits
[ https://issues.apache.org/jira/browse/HIVE-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-17408: --- Attachment: HIVE-17408.1.patch > replication distcp should only be invoked if number of files AND file size > cross configured limits > -- > > Key: HIVE-17408 > URL: https://issues.apache.org/jira/browse/HIVE-17408 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17408.1.patch > > > CopyUtils currently invokes distcp on whether > "hive.exec.copyfile.maxnumfiles" or "hive.exec.copyfile.maxsize" condition is > breached, should only be invoked when both are breached so should be AND > rather than OR. > distcp cannot do a distributed copy of a large single file hence more reason > to do the above change. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable
[ https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-17409: --- > refactor LLAP ZK registry to make the ZK-registry part reusable > --- > > Key: HIVE-17409 > URL: https://issues.apache.org/jira/browse/HIVE-17409 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-17408) replication distcp should only be invoked if number of files AND file size cross configured limits
[ https://issues.apache.org/jira/browse/HIVE-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17408 started by anishek. -- > replication distcp should only be invoked if number of files AND file size > cross configured limits > -- > > Key: HIVE-17408 > URL: https://issues.apache.org/jira/browse/HIVE-17408 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17408.1.patch > > > CopyUtils currently invokes distcp on whether > "hive.exec.copyfile.maxnumfiles" or "hive.exec.copyfile.maxsize" condition is > breached, should only be invoked when both are breached so should be AND > rather than OR. > distcp cannot do a distributed copy of a large single file hence more reason > to do the above change. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16924) Support distinct in presence Gby
[ https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-16924: Description: {code:sql} create table e011_01 (c1 int, c2 smallint); insert into e011_01 values (1, 1), (2, 2); {code} These queries should work: {code:sql} select distinct c1, count(*) from e011_01 group by c1; select distinct c1, avg(c2) from e011_01 group by c1; {code} Currently, you get : FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the same query. Error encountered near token 'c1' was: create table e011_01 (c1 int, c2 smallint); insert into e011_01 values (1, 1), (2, 2); These queries should work: select distinct c1, count(*) from e011_01 group by c1; select distinct c1, avg(c2) from e011_01 group by c1; Currently, you get : FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the same query. Error encountered near token 'c1' > Support distinct in presence Gby > - > > Key: HIVE-16924 > URL: https://issues.apache.org/jira/browse/HIVE-16924 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Reporter: Carter Shanklin >Assignee: Julian Hyde > Attachments: HIVE-16924.01.patch > > > {code:sql} > create table e011_01 (c1 int, c2 smallint); > insert into e011_01 values (1, 1), (2, 2); > {code} > These queries should work: > {code:sql} > select distinct c1, count(*) from e011_01 group by c1; > select distinct c1, avg(c2) from e011_01 group by c1; > {code} > Currently, you get : > FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the > same query. Error encountered near token 'c1' -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-17323: -- Attachment: HIVE-17323.4.patch Updated the test to work with and without semijoin reduction for better result comparison. > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17323.1.patch, HIVE-17323.2.patch, > HIVE-17323.3.patch, HIVE-17323.4.patch > > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17276) Check max shuffle size when converting to dynamically partitioned hash join
[ https://issues.apache.org/jira/browse/HIVE-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145938#comment-16145938 ] Hive QA commented on HIVE-17276: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884305/HIVE-17276.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join29] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join30] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join_filters] (batchId=158) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join_nulls] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_14] (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_15] (batchId=147) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez2] (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_max_hashtable] (batchId=146) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mapjoin2] (batchId=147) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mapjoin46] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_union] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join30] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_filters] (batchId=158) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_nulls] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_leftsemi_mapjoin] (batchId=147) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_nullsafe_join] (batchId=163) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join0] (batchId=158) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_join46] (batchId=155) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=227) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6585/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6585/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6585/ 
Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 28 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884305 - PreCommit-HIVE-Build > Check max shuffle size when converting to dynamically partitioned hash join > --- > > Key: HIVE-17276 > URL: https://issues.apache.org/jira/browse/HIVE-17276 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17276.01.patch, HIVE-17276.02.patch, > HIVE-17276.patch > > > Currently we only check that the max number of entries in the hashmap for a > MapJoin surpasses a certain threshold to decide whether to execute a > dynamically partitioned hash join. > We would like to factor the size of the large input that we will shuffle for > the dynamically partitioned hash join into the cost model too. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
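A hedged sketch of the extended decision described above; the names (maxEntries, maxShuffleSize, the estimated sizes coming from statistics) are assumptions for illustration, not the optimizer's actual fields.
{code:java}
// Hypothetical sketch: besides the hash-table entry threshold, also bound the
// bytes that would be shuffled for the big side before choosing a dynamically
// partitioned hash join.
class DphjDecisionSketch {
  boolean convertToDynamicallyPartitionedHashJoin(
      long estimatedSmallSideEntries, long maxEntries,
      long estimatedBigSideShuffleBytes, long maxShuffleSize) {
    if (estimatedSmallSideEntries <= maxEntries) {
      return false; // the small side still fits a regular map join hash table
    }
    // Only switch when re-shuffling the big side is still affordable.
    return estimatedBigSideShuffleBytes <= maxShuffleSize;
  }
}
{code}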
[jira] [Assigned] (HIVE-17408) replication distcp should only be invoked if number of files AND file size cross configured limits
[ https://issues.apache.org/jira/browse/HIVE-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek reassigned HIVE-17408: -- > replication distcp should only be invoked if number of files AND file size > cross configured limits > -- > > Key: HIVE-17408 > URL: https://issues.apache.org/jira/browse/HIVE-17408 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek >Priority: Trivial > Fix For: 3.0.0 > > > CopyUtils currently invokes distcp on whether > "hive.exec.copyfile.maxnumfiles" or "hive.exec.copyfile.maxsize" condition is > breached, should only be invoked when both are breached so should be AND > rather than OR. > distcp cannot do a distributed copy of a large single file hence more reason > to do the above change. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16886: --- Attachment: HIVE-16886.5.patch configurations for sleep and retry intervals for notification sequence lock > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, HIVE-16886.5.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. > Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i=0; i final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails expecting an event ID 1 instead of 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16886: --- Attachment: (was: HIVE-16886.5.patch) > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, HIVE-16886.5.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. > Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i=0; i final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails expecting an event ID 1 instead of 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17373) Upgrade some dependency versions
[ https://issues.apache.org/jira/browse/HIVE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145866#comment-16145866 ] Naveen Gangam commented on HIVE-17373: -- The patch looks good to me. +1 for me. > Upgrade some dependency versions > > > Key: HIVE-17373 > URL: https://issues.apache.org/jira/browse/HIVE-17373 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-17373.1.patch, HIVE-17373.2.patch > > > Upgrade some libraries including log4j to 2.8.2, accumulo to 1.8.1 and > commons-httpclient to 3.1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16886: --- Attachment: HIVE-16886.5.patch adding configurations to manage retry limit and sleep time for acquiring the lock on the notification sequence. > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, HIVE-16886.5.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. > Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i=0; i final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails expecting an event ID 1 instead of 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145845#comment-16145845 ] anishek commented on HIVE-16886: add {{NOTIFICATION_SEQUENCE_LOCK_MAX_RETRIES}} and {{NOTIFICATION_SEQUENCE_LOCK_RETRY_SLEEP_INTERVAL}} to docs > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch, HIVE-16886.5.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. > Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i=0; i final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails expecting an event ID 1 instead of 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
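For illustration, how the two settings mentioned above could drive the retry loop around the notification sequence lock; the helper below and its exception handling are placeholders, not metastore APIs.
{code:java}
// Sketch only: retry acquiring/updating the notification sequence up to a
// configured number of attempts, sleeping a configured interval in between.
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

class SequenceLockRetrySketch {
  static <T> T withRetry(Callable<T> lockAndIncrement, int maxRetries, long sleepMillis)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return lockAndIncrement.call(); // e.g. select ... for update + increment
      } catch (Exception e) {
        last = e; // another HMS holds the lock; back off and retry
        TimeUnit.MILLISECONDS.sleep(sleepMillis);
      }
    }
    if (last != null) {
      throw last;
    }
    throw new IllegalArgumentException("maxRetries must be at least 1");
  }
}
{code}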
[jira] [Updated] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-17323: -- Attachment: HIVE-17323.3.patch Fixed a ConcurrentModificationException > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17323.1.patch, HIVE-17323.2.patch, > HIVE-17323.3.patch > > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17359) Deal with TypeInfo dependencies in the metastore
[ https://issues.apache.org/jira/browse/HIVE-17359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145797#comment-16145797 ] Hive QA commented on HIVE-17359: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884280/HIVE-17359.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11000 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=143) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=102) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6583/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6583/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6583/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884280 - PreCommit-HIVE-Build > Deal with TypeInfo dependencies in the metastore > > > Key: HIVE-17359 > URL: https://issues.apache.org/jira/browse/HIVE-17359 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Affects Versions: 3.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17359.patch > > > The metastore uses TypeInfo, which resides in the serdes package. In order > to move the metastore to be separately releasable we need to deal with this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16924) Support distinct in presence Gby
[ https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde reassigned HIVE-16924: -- Assignee: Julian Hyde (was: Remus Rusanu) > Support distinct in presence Gby > - > > Key: HIVE-16924 > URL: https://issues.apache.org/jira/browse/HIVE-16924 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Reporter: Carter Shanklin >Assignee: Julian Hyde > Attachments: HIVE-16924.01.patch > > > create table e011_01 (c1 int, c2 smallint); > insert into e011_01 values (1, 1), (2, 2); > These queries should work: > select distinct c1, count(*) from e011_01 group by c1; > select distinct c1, avg(c2) from e011_01 group by c1; > Currently, you get : > FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the > same query. Error encountered near token 'c1' -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17280) Data loss in CONCATENATE ORC created by Spark
[ https://issues.apache.org/jira/browse/HIVE-17280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145785#comment-16145785 ] Prasanth Jayachandran commented on HIVE-17280: -- Do you happen to know the filenames generated by spark? Hive has some assumptions around filenames when moving the files from staging to final target directory. Recently encountered similar issue (HIVE-17403) which could be related to this as well. > Data loss in CONCATENATE ORC created by Spark > - > > Key: HIVE-17280 > URL: https://issues.apache.org/jira/browse/HIVE-17280 > Project: Hive > Issue Type: Bug > Components: Hive, Spark >Affects Versions: 1.2.1 > Environment: Spark 1.6.3 >Reporter: Marco Gaido >Priority: Critical > > Hive concatenation causes data loss if the ORC files in the table were > written by Spark. > Here are the steps to reproduce the problem: > - create a table; > {code:java} > hive > hive> create table aa (a string, b int) stored as orc; > {code} > - insert 2 rows using Spark; > {code:java} > spark-shell > scala> case class AA(a:String, b:Int) > scala> val df = sc.parallelize(Array(AA("b",2),AA("c",3) )).toDF > scala> df.write.insertInto("aa") > {code} > - change table schema; > {code:java} > hive > hive> alter table aa add columns(aa string, bb int); > {code} > - insert other 2 rows with Spark > {code:java} > spark-shell > scala> case class BB(a:String, b:Int, aa:String, bb:Int) > scala> val df = sc.parallelize(Array(BB("b",2,"b",2),BB("c",3,"c",3) )).toDF > scala> df.write.insertInto("aa") > {code} > - at this point, running a select statement with Hive returns correctly *4 > rows* in the table; then run the concatenation > {code:java} > hive > hive> alter table aa concatenate; > {code} > At this point, a select returns only *3 rows, ie. a row is missing*. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145772#comment-16145772 ] anishek commented on HIVE-16886: [~akolb] The transaction that handles the dbnotification update is a nested transaction, and I did not want to set the whole transaction to setSerializedRead: even though this is currently done as the last set of actions in HMS, that might change in the future, and then every additional query run in the transaction would also become a serialized read. Setting the same option on the query was again confusing, since the result of the change should not be visible to other transactions until commit; even though we set it on the query object, it might internally be applied to the currently open transaction, which brings us back to the first problem. Also, since code to handle {{select.. for update}} was already there as part of ACID, I just reused it, as that gives more confidence that there are no problems with these semantics than relying on setSerializedRead in DataNucleus. > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is neither unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS]; > for (int i = 0; i < NUM_THREADS; ++i) { > final int n = i; > tasks[i] = new FutureTask<Void>(new Callable<Void>() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails because the second event comes back with an event ID of 1 instead of the expected 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
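The {{select.. for update}} approach described in the comment above can be sketched roughly as follows. The table and column names and the surrounding transaction handling are illustrative assumptions for the example, not the actual ObjectStore change in the attached patches.
{code:java}
// Illustrative sketch only: lock the sequence row with SELECT ... FOR UPDATE so two
// metastore instances cannot read the same event id, then bump it in the same
// transaction. Table/column names are assumptions.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class EventIdAllocator {
  public static long allocateNextEventId(Connection conn) throws SQLException {
    boolean previousAutoCommit = conn.getAutoCommit();
    conn.setAutoCommit(false);
    try {
      long next;
      // A second server running the same statement blocks on this row lock until commit.
      try (PreparedStatement lock = conn.prepareStatement(
               "SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE FOR UPDATE");
           ResultSet rs = lock.executeQuery()) {
        if (!rs.next()) {
          throw new SQLException("sequence row missing");
        }
        next = rs.getLong(1);
      }
      try (PreparedStatement bump = conn.prepareStatement(
               "UPDATE NOTIFICATION_SEQUENCE SET NEXT_EVENT_ID = ?")) {
        bump.setLong(1, next + 1);
        bump.executeUpdate();
      }
      // In the real fix the notification row would be inserted in this same
      // transaction before the commit, keeping id allocation and event persistence atomic.
      conn.commit();
      return next;
    } catch (SQLException e) {
      conn.rollback();
      throw e;
    } finally {
      conn.setAutoCommit(previousAutoCommit);
    }
  }
}
{code}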
[jira] [Updated] (HIVE-12157) Support unicode for table/column names
[ https://issues.apache.org/jira/browse/HIVE-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-12157: -- Component/s: (was: hpl/sql) > Support unicode for table/column names > --- > > Key: HIVE-12157 > URL: https://issues.apache.org/jira/browse/HIVE-12157 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: richard du >Assignee: hefuhua >Priority: Minor > Attachments: HIVE-12157.01.patch, HIVE-12157.02.patch, > HIVE-12157.patch > > > Parser will throw exception when I use alias: > hive> desc test; > OK > a int > b string > Time taken: 0.135 seconds, Fetched: 2 row(s) > hive> select a as 行1 from test limit 10; > NoViableAltException(302@[134:7: ( ( ( KW_AS )? identifier ) | ( KW_AS LPAREN > identifier ( COMMA identifier )* RPAREN ) )?]) > at org.antlr.runtime.DFA.noViableAlt(DFA.java:158) > at org.antlr.runtime.DFA.predict(DFA.java:116) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2915) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1373) > at > org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128) > at > org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:45827) > at > org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:41495) > at > org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41402) > at > org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40413) > at > org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40283) > at > org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1590) > at > org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:396) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > FAILED: ParseException line 1:13 cannot recognize input near 'as' '1' 'from' > in selection target -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-14160) Reduce-task costs a long time to finish on the condition that the certain sql "select a,distinct(b) group by a" has been executed on the data which has skew distribution
[ https://issues.apache.org/jira/browse/HIVE-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-14160: -- Component/s: (was: hpl/sql) > Reduce-task costs a long time to finish on the condition that the certain sql > "select a,distinct(b) group by a" has been executed on the data which has > skew distribution > - > > Key: HIVE-14160 > URL: https://issues.apache.org/jira/browse/HIVE-14160 > Project: Hive > Issue Type: Improvement >Affects Versions: 1.1.0 >Reporter: marymwu > > Reduce-task costs a long time to finish on the condition that the certain sql > "select a,distinct(b) group by a" has been executed on the data which has > skew distribution > data scale: 64G -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-13992) Insert overwrite from Parquet Source Table to Parquet Destination Table throws InvocationTargetException
[ https://issues.apache.org/jira/browse/HIVE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-13992: -- Component/s: (was: hpl/sql) > Insert overwrite from Parquet Source Table to Parquet Destination Table > throws InvocationTargetException > > > Key: HIVE-13992 > URL: https://issues.apache.org/jira/browse/HIVE-13992 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Abhijit Das > > This issue is caused by the parquet JAR bundled with Hive-1.2; the same > use cases work fine in Hive-0.14. > Hive-1.2 ships a packaged parquet jar > (parquet-hadoop-bundle-1.6.0.jar) inside "../hive-1.2.1/lib", and while > executing parquet-related functionality Hive-1.2 uses this JAR from the > classpath. > If the parquet JARs that come with the Hadoop distribution are used instead of > the Hive-bundled parquet JARs, these parquet-related use cases work fine. > As a workaround, we need to execute "set > mapreduce.job.user.classpath.first=true" in each and every Hive session. If > this workaround is implemented, it will require a code change in the > provisioning module. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
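If the per-session workaround has to be applied from client code rather than typed at the CLI, the same {{set}} command can be issued once per session right after the connection is opened. The sketch below assumes a HiveServer2 JDBC endpoint and is not part of any patch.
{code:java}
// Sketch of applying the workaround mentioned above from a JDBC client: issue the
// SET command once per session before any Parquet-related statements run.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ParquetClasspathWorkaround {
  public static Connection openSession() throws Exception {
    // Connection URL is a placeholder.
    Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
    try (Statement stmt = conn.createStatement()) {
      // Prefer the user classpath so the Hadoop-provided parquet jars win over
      // the parquet-hadoop-bundle shipped in hive-1.2.1/lib.
      stmt.execute("set mapreduce.job.user.classpath.first=true");
    }
    return conn;
  }
}
{code}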
[jira] [Updated] (HIVE-12745) Hive Timestamp value change after joining two tables
[ https://issues.apache.org/jira/browse/HIVE-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-12745: -- Component/s: (was: hpl/sql) > Hive Timestamp value change after joining two tables > > > Key: HIVE-12745 > URL: https://issues.apache.org/jira/browse/HIVE-12745 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: wyp >Assignee: Dmitry Tolpeko > > I have two Hive tables, test and test1: > {code} > CREATE TABLE `test`( `t` timestamp) > CREATE TABLE `test1`( `t` timestamp) > {code} > Both hold a t value with the Timestamp datatype; the contents of the two > tables are as follows: > {code} > hive> select * from test1; > OK > 1970-01-01 00:00:00 > 1970-03-02 00:00:00 > Time taken: 0.091 seconds, Fetched: 2 row(s) > hive> select * from test; > OK > 1970-01-01 00:00:00 > 1970-01-02 00:00:00 > Time taken: 0.085 seconds, Fetched: 2 row(s) > {code} > However, when joining these two tables, the returned timestamp values change: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1969-12-31 23:00:00 1970-01-01 00:00:00 > 1970-01-01 23:00:00 1970-01-01 00:00:00 > 1969-12-31 23:00:00 1970-03-02 00:00:00 > 1970-01-01 23:00:00 1970-03-02 00:00:00 > Time taken: 54.347 seconds, Fetched: 4 row(s) > {code} > I also found that the result changes every time: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1970-01-01 00:00:00 1970-01-01 00:00:00 > 1970-01-02 00:00:00 1970-01-01 00:00:00 > 1970-01-01 00:00:00 1970-03-02 00:00:00 > 1970-01-02 00:00:00 1970-03-02 00:00:00 > Time taken: 26.308 seconds, Fetched: 4 row(s) > {code} > Any suggestion? Thanks -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12226) Support unicode for table names
[ https://issues.apache.org/jira/browse/HIVE-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-12226: -- Component/s: (was: hpl/sql) > Support unicode for table names > --- > > Key: HIVE-12226 > URL: https://issues.apache.org/jira/browse/HIVE-12226 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: richard du > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16968) MAPJOIN hint error
[ https://issues.apache.org/jira/browse/HIVE-16968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-16968: -- Component/s: (was: hpl/sql) > MAPJOIN hint error > -- > > Key: HIVE-16968 > URL: https://issues.apache.org/jira/browse/HIVE-16968 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: xujie > > set hive.auto.convert.join=false; > set hive.ignore.mapjoin.hint=false; > WITH TEMP_CUST_BAL > AS > ( > SELECT /*+ MAPJOIN(C) */ FROM FACT_RPSM.F_AGT_INFO_ALL B0 > LEFT JOIN FACT_RPSM.F_AGT_INFO_BAL_AVG A ON B0.JIZH_ID = a.JIZH_ID > AND A.DT='20170625' > LEFT JOIN BASE.B_IF_PROD_DIM C > ON B0.PROD_ID = C.PROD_ID > AND C.DT='20170625' > WHERE B0.DT='20170625' > GROUP BY B0.CUST_ID; > - > Your query has the following error(s): > Invalid OperationHandle: OperationHandle [opType=EXECUTE_STATEMENT, > getHandleIdentifier()=7300a9f1-d6cf-418b-aa59-59a7688fa74e] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12216) WHERE on the FROM table not (always) working when JOIN are present
[ https://issues.apache.org/jira/browse/HIVE-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-12216: -- Component/s: (was: hpl/sql) > WHERE on the FROM table not (always) working when JOIN are present > -- > > Key: HIVE-12216 > URL: https://issues.apache.org/jira/browse/HIVE-12216 > Project: Hive > Issue Type: Bug > Components: Parser, Query Processor, SQL >Affects Versions: 1.1.0, 1.2.1 > Environment: CDH 5.4.7 HDP2.3.2 MR TEZ >Reporter: Bolke de Bruin >Priority: Blocker > > In case we use a where clause in a state where also joins are present, the > clauses are not (always) respected. We have been able to reproduce this issue > consistently with Hive 1.1.0 on MR, Hive 1.2.1 on Tez (MR Fails here). > So fo the below query we *do* get results back like: > 'gs.i_s_c = 23' (and this goes for all clauses!) > CREATE TABLE tmp_hub_and_sats AS > SELECT >f.dt, >f.t_c, >sum(f.transaction_amount) as amount, >sum(f.amount_euro) amount_euro, >IF(f.org_grid is null, f.org_cust, f.org_grid) as org, >IF(f.org_grid is null, 0, 1) as is_org_grid, >IF(f.org_up is null, if(f.org_grid is null, f.org_cust, f.org_grid), > f.org_up) as org_up, >IF(f.to_grid is null, f.to_cust, f.to_grid) to, >IF(f.to_grid is null, 0, 1) as is_to_grid, >IF(f.to_up is null, if(f.to_grid is null, f.to_cust, f.to_grid), f.to_up) > as to_up, >gh.i_g_c as customer_code_hub, >gs.i_g_c as customer_code_satellite > from x_grid_orders f > LEFT OUTER JOIN > grid.grid gh > ON f.org_grid = gh.hashed_gridid > LEFT OUTER JOIN > grid.grid gs > ON f.to_grid = gs.hashed_gridid > where > IF(f.org_up is null, f.org_cust, f.org_up) <> IF(f.to_up is null, f.to_cust, > f.to_up) > AND > (substring(gh.i_g_c, 1, 2) <> "06" or gh.i_g_c is null) > AND > (substring(gs.i_g_c, 1, 2) <> "06" or gs.i_g_c is null) > AND > (gh.i_s_c <> "23" or gh.i_s_c is null) > AND > (gs.i_s_c <> "23" or gs.i_s_c is null) > group by > f.dt, > f.t_c, > IF(f.org_grid is null, f.org_cust, f.org_grid), > IF(f.org_grid is null, 0, 1), > IF(f.org_up is null, if(f.org_grid is null, f.org_cust, f.org_grid), > f.org_up), > IF(f.to_grid is null, f.to_cust, f.to_grid), > IF(f.to_grid is null, 0, 1), > IF(f.to_up is null, if(f.to_grid is null, f.to_cust, f.to_grid), f.to_up), > gh.i_g_c, > gs.i_g_c -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-13136) Create index failed because index table not found.
[ https://issues.apache.org/jira/browse/HIVE-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-13136: -- Component/s: (was: hpl/sql) > Create index failed because index table not found. > -- > > Key: HIVE-13136 > URL: https://issues.apache.org/jira/browse/HIVE-13136 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 1.2.1 >Reporter: Meng, Yongjian >Assignee: Meng, Yongjian > > When I execute > CREATE INDEX y_key1_index ON TABLE y(key1) AS > 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED > REBUILD; > it always displays an error: Error while processing statement: FAILED: > Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. > Table not found default__y_y_key1_index__ (state=08S01,code=1) > But the index table does exist in my HDFS; Hive just cannot see it when I > run show tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17276) Check max shuffle size when converting to dynamically partitioned hash join
[ https://issues.apache.org/jira/browse/HIVE-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17276: --- Attachment: HIVE-17276.02.patch Rebasing patch and triggering ptest again. > Check max shuffle size when converting to dynamically partitioned hash join > --- > > Key: HIVE-17276 > URL: https://issues.apache.org/jira/browse/HIVE-17276 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17276.01.patch, HIVE-17276.02.patch, > HIVE-17276.patch > > > Currently we only check that the max number of entries in the hashmap for a > MapJoin surpasses a certain threshold to decide whether to execute a > dynamically partitioned hash join. > We would like to factor the size of the large input that we will shuffle for > the dynamically partitioned hash join into the cost model too. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
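The proposed decision can be sketched as a simple heuristic: keep the existing hash-table-size check and add a bound on the bytes the large input would shuffle. The names and thresholds below are illustrative assumptions, not Hive's actual configuration keys or optimizer API.
{code:java}
// Illustrative heuristic only: convert to a dynamically partitioned hash join when the
// small side is too big for a MapJoin hash table, but only if the bytes shuffled for
// the big side stay bounded.
public final class DynPartHashJoinHeuristic {

  private DynPartHashJoinHeuristic() {
  }

  public static boolean shouldConvert(long smallSideEntries,
                                      long bigSideBytes,
                                      long maxHashTableEntries,
                                      long maxShuffleBytes) {
    boolean hashTableTooBig = smallSideEntries > maxHashTableEntries;   // existing check
    boolean shuffleAffordable = bigSideBytes <= maxShuffleBytes;        // proposed check
    return hashTableTooBig && shuffleAffordable;
  }

  public static void main(String[] args) {
    // Example: 50M entries exceed a 20M cap, but a 200 GB shuffle exceeds a 100 GB cap,
    // so the conversion is rejected and a plain shuffle join is kept.
    System.out.println(shouldConvert(50_000_000L, 200L << 30, 20_000_000L, 100L << 30));
  }
}
{code}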