[jira] [Updated] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
[ https://issues.apache.org/jira/browse/HIVE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21213:
------------------------------
Release Note: Merged. Thanks.
Resolution: Fixed
Status: Resolved (was: Patch Available)

> Acid table bootstrap replication needs to handle directory created by
> compaction with txn id
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
> Issue Type: Bug
> Components: Hive, HiveServer2, repl
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, HIVE-21213.03.patch, HIVE-21213.04.patch, HIVE-21213.05.patch
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> The current implementation of compaction uses the txn id in the directory
> name. This is used to isolate queries from reading the directory until
> compaction has finished, and to avoid the compactor marking used earlier.
> During bootstrap replication, each directory is copied as-is, with the same
> name, from the source to the destination cluster. But a directory created by
> compaction with a txn id in its name cannot be copied verbatim, because the
> txn list at the target may differ from the source: a txn id that is valid at
> the source may belong to an aborted txn at the target. So conversion logic is
> required to create a new directory named with a txn that is valid at the
> target, and to dump the data into that newly created directory.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
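The conversion logic the description calls for can be sketched as a directory-name rewrite. The sketch below is illustrative only: the class and method names are hypothetical (not Hive's actual API), and it assumes compacted directory names carry a `_v<txnId>` visibility suffix such as `base_0000009_v0000020`; the real patch also has to move the data and validate the txn against the target's txn list.

```java
// Hypothetical helper: rewrite a compacted directory name so that its
// visibility txn id is one allocated on (and valid for) the target cluster.
public final class CompactedDirRenamer {
    // Matches names like base_0000009_v0000020 or delta_01_02_v0000003.
    private static final java.util.regex.Pattern VISIBILITY_SUFFIX =
        java.util.regex.Pattern.compile("^(base|delta|delete_delta)(_\\d+)+_v(\\d+)$");

    /** Returns dirName with its source txn id replaced by targetTxnId. */
    public static String withTargetTxn(String dirName, long targetTxnId) {
        java.util.regex.Matcher m = VISIBILITY_SUFFIX.matcher(dirName);
        if (!m.matches()) {
            return dirName; // no visibility suffix: name can be copied as-is
        }
        int suffixStart = dirName.lastIndexOf("_v");
        return dirName.substring(0, suffixStart) + String.format("_v%07d", targetTxnId);
    }
}
```

A directory without the `_v` suffix is left untouched, which mirrors the fact that only compaction-created directories carry the txn id in their name.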
[jira] [Commented] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
[ https://issues.apache.org/jira/browse/HIVE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759824#comment-17759824 ]

Teddy Choi commented on HIVE-21213:
-----------------------------------
I found that there are failing tests.
[jira] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
[ https://issues.apache.org/jira/browse/HIVE-21213 ]

Teddy Choi deleted comment on HIVE-21213:
-----------------------------------------
was (Author: teddy.choi): +1. LGTM.
[jira] [Commented] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
[ https://issues.apache.org/jira/browse/HIVE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759816#comment-17759816 ]

Teddy Choi commented on HIVE-21213:
-----------------------------------
+1. LGTM.
[jira] [Comment Edited] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653808#comment-17653808 ]

Teddy Choi edited comment on HIVE-26555 at 1/3/23 7:03 AM:
-----------------------------------------------------------
[~abstractdog], sorry for the late reply. It assumes an [active-passive HA configuration|https://en.wikipedia.org/wiki/High-availability_cluster#Node_configurations] with reads on the passive. The active instance should be the single source of truth, while the passive instance should follow it. However, the current Hive replication design allows the passive instance to diverge from the active instance. A data divergence between the active and passive instances is hard to detect and resolve. This read-only mode prevents the passive instance from changing, to avoid any unintended divergence.

References:
* Microsoft SQL Server: [Configure read-only access to a secondary replica of an Always On availability group|https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-read-only-access-on-an-availability-replica-sql-server?view=sql-server-ver16]
* Oracle Database: [High Availability Overview and Best Practices - Features for Maximizing Availability|https://docs.oracle.com/en/database/oracle/oracle-database/21/haovw/ha-features.html#GUID-314F15CE-BD8F-45B0-911E-B7FCC2B8006A]
* IBM DB2: [Enabling reads on standby|https://www.ibm.com/docs/en/db2/11.5?topic=feature-enabling-reads-standby]

> Read-only mode for Hive database
>
> Key: HIVE-26555
> URL: https://issues.apache.org/jira/browse/HIVE-26555
> Project: Hive
> Issue Type: New Feature
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> h1. Purpose
> In failover/fail-back scenarios, a Hive database needs to be read-only while the other one stays writable, to keep a single source of truth.
> h1. User-Facing Changes
> Yes. The EnforceReadOnlyDatabaseHook class implements the ExecuteWithHookContext interface. hive.exec.pre.hooks needs to include the class name to instantiate it. The "readonly" database property can be configured to turn it on and off.
> h2. Allowed read operations
> All read operations without any data/metadata change are allowed.
> * EXPLAIN
> * USE (or SWITCHDATABASE)
> * REPLDUMP
> * REPLSTATUS
> * EXPORT
> * KILL_QUERY
> * DESC prefix
> * SHOW prefix
> * QUERY with SELECT or EXPLAIN. INSERT, DELETE, UPDATE are disallowed.
> h2. Allowed write operations
> Most write operations that change data/metadata are disallowed. There are a few allowed exceptions. The first is ALTER DATABASE, to make the database writable again. The second is replication load, to load a dumped database.
> * ALTER DATABASE db_name SET DBPROPERTIES without "readonly"="true".
> * REPLLOAD
> h1. Tests
> * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT
> * read_only_delete.q
> * read_only_insert.q
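The allow/deny rules above boil down to a simple check against the database's "readonly" property and the operation name. Below is a self-contained sketch of that logic; the real EnforceReadOnlyDatabaseHook implements Hive's ExecuteWithHookContext and inspects the HookContext, while here the database properties and operation name are passed in directly, and the prefix list is a simplified reading of the description, not the actual implementation.

```java
import java.util.Map;
import java.util.Set;

// Simplified sketch of the read-only enforcement described above.
public final class ReadOnlyCheck {
    // Operations the description lists as allowed on a read-only database.
    private static final Set<String> ALLOWED_PREFIXES =
        Set.of("EXPLAIN", "USE", "SWITCHDATABASE", "REPLDUMP", "REPLSTATUS",
               "EXPORT", "KILL_QUERY", "DESC", "SHOW", "SELECT", "REPLLOAD");

    /** Returns true if the operation may run against the database. */
    public static boolean isAllowed(Map<String, String> dbProps, String operation) {
        if (!"true".equalsIgnoreCase(dbProps.get("readonly"))) {
            return true; // database is not read-only: everything is allowed
        }
        String op = operation.toUpperCase();
        for (String prefix : ALLOWED_PREFIXES) {
            if (op.startsWith(prefix)) {
                return true;
            }
        }
        // ALTER DATABASE stays allowed so the property can be turned off again.
        return op.startsWith("ALTER DATABASE");
    }
}
```

In the real hook the same decision ends in throwing an exception for disallowed writes, which is what aborts the query in the pre-execution phase.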
[jira] [Commented] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653808#comment-17653808 ]

Teddy Choi commented on HIVE-26555:
-----------------------------------
[~abstractdog], sorry for late reply. It's assuming an [active-passive HA configuration|https://en.wikipedia.org/wiki/High-availability_cluster#Node_configurations] with reads on the passive. The active instance should be the single source of the truth, while the passive instance should follow it. However, the current Hive replication design allows the passive instance to diverge from the active instance. A data divergence between the active-passive instances is hard to detect and resolve. This read-only mode prevents the passive instance to change to avoid any unintended divergence.

References:
* Microsoft SQL Server: [Configure read-only access to a secondary replica of an Always On availability group|https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-read-only-access-on-an-availability-replica-sql-server?view=sql-server-ver16]
* Oracle Database: [High Availability Overview and Best Practices | Features for Maximizing Availability|https://docs.oracle.com/en/database/oracle/oracle-database/21/haovw/ha-features.html#GUID-314F15CE-BD8F-45B0-911E-B7FCC2B8006A]
* IBM DB2: [Enabling reads on standby|https://www.ibm.com/docs/en/db2/11.5?topic=feature-enabling-reads-standby]
[jira] [Resolved] (HIVE-24933) Replication fails for transactional tables having same name as dropped non-transactional table
[ https://issues.apache.org/jira/browse/HIVE-24933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi resolved HIVE-24933.
-------------------------------
Resolution: Fixed

Merged to master. Thank you very much.

> Replication fails for transactional tables having same name as dropped
> non-transactional table
>
> Key: HIVE-24933
> URL: https://issues.apache.org/jira/browse/HIVE-24933
> Project: Hive
> Issue Type: Bug
> Reporter: Pratyush Madhukar
> Assignee: Pratyush Madhukar
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 40m
> Remaining Estimate: 0h
[jira] [Commented] (HIVE-26601) Fix NPE encountered in second load cycle of optimised bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616133#comment-17616133 ]

Teddy Choi commented on HIVE-26601:
-----------------------------------
Hello [~vpatni], please update the status and create a corresponding PR. Thank you.

> Fix NPE encountered in second load cycle of optimised bootstrap
>
> Key: HIVE-26601
> URL: https://issues.apache.org/jira/browse/HIVE-26601
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Vinit Patni
> Priority: Blocker
>
> After failover from the Primary to the DR cluster completed and DR took over, a reverse replication policy was created. The first dump and load cycle of optimised bootstrap completes successfully. The second dump cycle on DR also completes; it does a selective bootstrap of the tables it read from the table_diff directory. However, we observed an issue with the second load cycle on the Primary cluster side, which fails with the following exception and needs to be fixed.
> {code:java}
> [Scheduled Query Executor(schedule:repl_vinreverse, execution_id:421)]: Exception while logging metrics
> java.lang.NullPointerException: null
>     at org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192) ~[hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.exec.repl.ReplStateLogWork.replStateLog(ReplStateLogWork.java:145) ~[hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.exec.repl.ReplStateLogTask.execute(ReplStateLogTask.java:39) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.processQuery(ScheduledQueryExecutionService.java:240) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.run(ScheduledQueryExecutionService.java:193) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> {code}
[jira] [Commented] (HIVE-26600) Handle failover during optimized bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616132#comment-17616132 ]

Teddy Choi commented on HIVE-26600:
-----------------------------------
Hello [~rakshithc], please update the status and create a corresponding PR. Thank you.

> Handle failover during optimized bootstrap
>
> Key: HIVE-26600
> URL: https://issues.apache.org/jira/browse/HIVE-26600
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Rakshith C
> Priority: Blocker
>
> When the reverse policy is enabled from DR to PROD, the user may initiate a failover from DR to PROD before the optimized bootstrap has ever run.
> Current observations:
> * Repl dump will place a failover-ready marker, but failover metadata won't be generated.
> * Repl load will throw an error, since failover will be set to true but the failover metadata is missing.
> Replication fails and we reach an undefined state.
> Fix:
> * Create the failover-ready marker only during the second cycle of optimized bootstrap, if possible.
> * Since some tables may need to be bootstrapped, it may take up to 3 cycles before failover from DR to PROD is complete.
> * If no tables are modified, the second dump from DR to PROD will be marked as failover ready.
> Result:
> * Users can initiate a failover immediately after enabling the reverse policy without any hassles.
[jira] [Commented] (HIVE-26599) Fix NPE encountered in second dump cycle of optimised bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616131#comment-17616131 ]

Teddy Choi commented on HIVE-26599:
-----------------------------------
Hello [~vpatni], please update the status and create a corresponding PR. Thank you.

> Fix NPE encountered in second dump cycle of optimised bootstrap
>
> Key: HIVE-26599
> URL: https://issues.apache.org/jira/browse/HIVE-26599
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Vinit Patni
> Priority: Blocker
>
> After failover from the Primary to the DR cluster completed and DR took over, a reverse replication policy was created. The first dump and load cycle of optimised bootstrap completes successfully, but we are encountering a NullPointerException in the second dump cycle, which halts the reverse replication and is a major blocker to testing the complete replication cycle.
> {code:java}
> Scheduled Query Executor(schedule:repl_reverse, execution_id:14)]: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192)
>     at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpTable(ReplDumpTask.java:1458)
>     at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:961)
>     at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:290)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232){code}
> After doing an RCA, we found that in the second dump cycle on the DR cluster, when the StageStart method is invoked, the metric corresponding to Tables is not registered (it should be, since we do a selective bootstrap of tables for optimised bootstrap along with the incremental dump). This causes the NPE later, when the code tries to update the progress for this metric after the table bootstrap completes.
> Fix: register the Tables metric before updating the progress.
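The fix described in the RCA amounts to a guard: make sure a metric exists before its progress is updated. A simplified, self-contained sketch of that idea follows; the map-based collector below is purely illustrative, while the real ReplicationMetricCollector tracks richer per-stage state.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of "register the metric before updating progress".
// Previously, reporting progress assumed the metric had already been
// registered and dereferenced null when it had not.
public final class StageMetrics {
    private final Map<String, Long> counters = new HashMap<>();

    /** Registers the metric with an initial count of zero, if absent. */
    public void register(String name) {
        counters.putIfAbsent(name, 0L);
    }

    /** Adds delta to the metric, registering it first to avoid the NPE. */
    public long reportStageProgress(String name, long delta) {
        register(name); // guard: unregistered metrics no longer crash
        return counters.merge(name, delta, Long::sum);
    }
}
```

The guard is cheap and idempotent, so calling it on every progress report is safer than relying on every dump path (incremental, bootstrap, or the mixed optimised-bootstrap cycle) to have registered the metric up front.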
[jira] [Commented] (HIVE-26606) Expose failover states in replication metrics
[ https://issues.apache.org/jira/browse/HIVE-26606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616134#comment-17616134 ]

Teddy Choi commented on HIVE-26606:
-----------------------------------
Hello [~harshalk], please update the status and create a corresponding PR. Thank you.

> Expose failover states in replication metrics
>
> Key: HIVE-26606
> URL: https://issues.apache.org/jira/browse/HIVE-26606
> Project: Hive
> Issue Type: Improvement
> Reporter: Teddy Choi
> Assignee: Harshal Patel
> Priority: Major
>
> Expose the state of failover in replication metrics.
[jira] [Commented] (HIVE-26597) Fix unsetting of db prop repl.target.for in ReplicationSemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-26597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616129#comment-17616129 ]

Teddy Choi commented on HIVE-26597:
-----------------------------------
Hello [~rakshithc], please update the status and create a corresponding PR. Thank you.

> Fix unsetting of db prop repl.target.for in ReplicationSemanticAnalyzer
>
> Key: HIVE-26597
> URL: https://issues.apache.org/jira/browse/HIVE-26597
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Rakshith C
> Priority: Major
>
> When a repl policy is set from A -> B:
> * *repl.target.for* is set on B.
> When failover is initiated:
> * *repl.failover.endpoint* = *'TARGET'* is set on B.
>
> Now, when the reverse policy is set up from *A <- B*, there is a check in [ReplicationSemanticAnalyzer#initReplDump|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java#L196] which looks for the existence of these two properties and, if they are set, unsets the *repl.target.for* property.
> Because of this, optimisedBootstrap won't be triggered, since it checks for the existence of the *repl.target.for* property during repl dump on the target [HERE|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/OptimisedBootstrapUtils.java#L93].
>
> Fix: remove the code that unsets repl.target.for in ReplicationSemanticAnalyzer, because the second dump cycle of optimized bootstrap unsets it.
[jira] [Commented] (HIVE-26598) Fix unset db params for optimized bootstrap in case of data copy tasks run on target, and test cases
[ https://issues.apache.org/jira/browse/HIVE-26598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616130#comment-17616130 ]

Teddy Choi commented on HIVE-26598:
-----------------------------------
Hello [~rakshithc], please update the status and create a corresponding PR. Thank you.

> Fix unset db params for optimized bootstrap in case of data copy tasks run on target, and test cases
>
> Key: HIVE-26598
> URL: https://issues.apache.org/jira/browse/HIVE-26598
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Rakshith C
> Priority: Major
>
> When hive.repl.run.data.copy.tasks.on.target is set to false, the repl dump task initiates the copy task from the source cluster to the staging directory.
> In the current code flow, the repl dump task dumps the metadata and then creates another repl dump task with datacopyIterators initialized.
> When the second dump cycle executes, it directly begins the data copy tasks. Because of this, we never enter the second reverse-dump flow, and unsetDbPropertiesForOptimisedBootstrap is never set to true again.
> This results in the db params (repl.target.for, repl.background.threads, etc.) not being unset.
[jira] [Resolved] (HIVE-26607) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi resolved HIVE-26607.
-------------------------------
Resolution: Duplicate

> Replace vectorization templates with overrides
>
> Key: HIVE-26607
> URL: https://issues.apache.org/jira/browse/HIVE-26607
> Project: Hive
> Issue Type: Improvement
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
>
> Replace vectorization templates with overrides.
> h1. Background
> There are many combinations of data types, column/scalar types, and operators in vectorization, which leaves a lot of code to implement. The current Hive vectorization generates this code with a simple string template engine, which replaces each placeholder with a value at all places within each template file.
> However, the templates are written in text files, which modern IDEs don't support natively. Also, any change to a template needs a separate Maven step to generate the actual code, which is time-consuming.
> h1. Design
> The base abstract classes will respect Java's data type system. Each string template will be divided into several sub data types, such as long-long, long-double, double-long, and double-double.
> * ColumnArithmeticColumn.txt will be separated into
> ** BaseLongColLongColumn.java
> *** Add: long func(long a, long b) \{ return a + b; }
> *** Subtract: long func(long a, long b) \{ return a - b; }
> *** Multiply: long func(long a, long b) \{ return a * b; }
> *** CheckedAdd: boolean supportsCheckedExecution() \{ return true; }
> *** CheckedSubtract: boolean supportsCheckedExecution() \{ return true; }
> *** CheckedMultiply: boolean supportsCheckedExecution() \{ return true; }
> ** BaseLongColDoubleColumn.java
> *** Add: double func(long a, double b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseDoubleColLongColumn.java
> *** Add: double func(double a, long b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseDoubleColDoubleColumn.java
> *** Add: double func(double a, double b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> * ColumnArithmeticScalar.txt will be separated into
> ** BaseLongColLongScalar.java
> *** Add: long func(long a, long b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseLongColDoubleScalar.java
> *** Add: double func(long a, double b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseDoubleColLongScalar.java
> *** Add: double func(double a, long b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseDoubleColDoubleScalar.java
> *** Add: double func(double a, double b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
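The override-based design above can be illustrated with a minimal, self-contained sketch. The class and method names follow the ticket's own examples; the element-wise `evaluate` loop is an illustrative simplification, since the real vectorized expressions operate on column-vector batches with null handling and selection vectors.

```java
// The base class fixes one type combination, (long, long) -> long; each
// operator then supplies only func(). Compare with the template approach,
// where every operator/type pair is a separately generated file.
abstract class BaseLongColLongColumn {
    abstract long func(long a, long b);

    // Checked variants would override this to return true.
    boolean supportsCheckedExecution() {
        return false;
    }

    // Apply the operator element-wise over two "columns".
    long[] evaluate(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = func(a[i], b[i]);
        }
        return out;
    }
}

final class LongColAddLongColumn extends BaseLongColLongColumn {
    @Override
    long func(long a, long b) {
        return a + b;
    }
}

final class LongColMultiplyLongColumn extends BaseLongColLongColumn {
    @Override
    long func(long a, long b) {
        return a * b;
    }
}
```

The batch loop lives once in the base class, so an IDE can navigate, refactor, and type-check every operator, which is exactly what the text-file templates prevent.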
[jira] [Resolved] (HIVE-26604) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi resolved HIVE-26604.
-------------------------------
Resolution: Duplicate

> Replace vectorization templates with overrides
>
> Key: HIVE-26604
> URL: https://issues.apache.org/jira/browse/HIVE-26604
> Project: Hive
> Issue Type: Improvement
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
[jira] [Assigned] (HIVE-26607) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26607: - > Replace vectorization templates with overrides > -- > > Key: HIVE-26607 > URL: https://issues.apache.org/jira/browse/HIVE-26607 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > Replace vectorization templates with overrides. > h1. Background > There are many combinations of data types, column/scalar types, > and operators in vectorization, which leaves a lot of code to implement. The > current Hive vectorization handles this with a simple string template > engine, which replaces a placeholder with a concrete value at all places within each > template file. > However, the templates are plain text files, which are not natively supported > by modern IDEs. Also, any change to a template needs a separate Maven step > to generate the actual code, which is time-consuming. > h1. Design > The base abstract classes will respect Java's data type system. Each string > template will be divided into several sub data types, such as long-long, > long-double, double-long, double-double. 
> * ColumnArithmeticColumn.txt will be separated into > ** BaseLongColLongColumn.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract: long func(long a, long b) \{ return a - b; } > *** Multiply: long func(long a, long b) \{ return a * b; } > *** CheckedAdd: boolean supportsCheckedExecution() \{ return true; } > *** CheckedSubtract: boolean supportsCheckedExecution() \{ return true; } > *** CheckedMultiply: boolean supportsCheckedExecution() \{ return true; } > ** BaseLongColDoubleColumn.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongColumn.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleColumn.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > * ColumnArithmeticScalar.txt > ** BaseLongColLongScalar.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseLongColDoubleScalar.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongScalar.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleScalar.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26606) Expose failover states in replication metrics
[ https://issues.apache.org/jira/browse/HIVE-26606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26606: - Assignee: Harshal Patel (was: Teddy Choi) > Expose failover states in replication metrics > - > > Key: HIVE-26606 > URL: https://issues.apache.org/jira/browse/HIVE-26606 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Harshal Patel >Priority: Major > > Expose the state of failover in replication metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26606) Expose failover states in replication metrics
[ https://issues.apache.org/jira/browse/HIVE-26606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26606: - > Expose failover states in replication metrics > - > > Key: HIVE-26606 > URL: https://issues.apache.org/jira/browse/HIVE-26606 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > Expose the state of failover in replication metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Description: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while the other one is writable, to keep a single source of truth. h1. User-Facing Changes Yes. The EnforceReadOnlyDatabaseHook class implements the ExecuteWithHookContext interface. hive.exec.pre.hooks needs to include the class name to instantiate it. The "readonly" database property can be configured to turn it on and off. h2. Allowed read operations All read operations without any data/metadata change are allowed. * EXPLAIN * USE (or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC prefix * SHOW prefix * QUERY with SELECT or EXPLAIN. INSERT, DELETE, UPDATE are disallowed. h2. Allowed write operations Most write operations that change data/metadata are disallowed. There are a few allowed exceptions: the first is ALTER DATABASE, to make a database writable again; the second is replication load, to load a dumped database. * ALTER DATABASE db_name SET DBPROPERTIES without "readonly"="true". * REPLLOAD h1. Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q was: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while other one is writable to keep a single source of truth. h1. Design Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. The "readonly" database property can be configured to turn it on and off. Allowed operations prefixes * EXPLAIN * USE(or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC * SHOW h1. 
Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q > Read-only mode for Hive database > > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 40m > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > the other one is writable, to keep a single source of truth. > h1. User-Facing Changes > Yes. The EnforceReadOnlyDatabaseHook class implements the ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to include the class name to instantiate > it. The "readonly" database property can be configured to turn it on > and off. > h2. Allowed read operations > All read operations without any data/metadata change are allowed. > * EXPLAIN > * USE (or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC prefix > * SHOW prefix > * QUERY with SELECT or EXPLAIN. INSERT, DELETE, UPDATE are disallowed. > h2. Allowed write operations > Most write operations that change data/metadata are disallowed. There are > a few allowed exceptions: the first is ALTER DATABASE, to make a database > writable again; the second is replication load, to load a dumped database. > * ALTER DATABASE db_name SET DBPROPERTIES without "readonly"="true". > * REPLLOAD > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
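The allow-list rule described above can be sketched as a small predicate. This is a simplified illustration, not the actual EnforceReadOnlyDatabaseHook implementation; the class name `ReadOnlyCheck`, the `isAllowed` method, and the flat string-prefix matching are assumptions for the sketch (the real hook works on Hive's HookContext and operation types, and also permits REPL LOAD and the ALTER DATABASE that clears "readonly").

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the read-only allow-list check: if the target
// database carries the "readonly"="true" property, only operations whose
// names start with an allowed prefix may proceed.
public class ReadOnlyCheck {
    private static final Set<String> ALLOWED_PREFIXES = Set.of(
        "EXPLAIN", "SWITCHDATABASE", "REPLDUMP", "REPLSTATUS",
        "EXPORT", "KILL_QUERY", "DESC", "SHOW");

    // Returns true if the operation may run against the given database.
    public static boolean isAllowed(String operation, Map<String, String> dbProps) {
        if (!"true".equalsIgnoreCase(dbProps.get("readonly"))) {
            return true; // database is writable; nothing to enforce
        }
        for (String prefix : ALLOWED_PREFIXES) {
            if (operation.startsWith(prefix)) {
                return true;
            }
        }
        return false; // writes (INSERT, DELETE, UPDATE, ...) are rejected
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("readonly", "true");
        System.out.println(isAllowed("SHOWTABLES", props)); // true
        System.out.println(isAllowed("INSERT", props));     // false
    }
}
```

Implementing this as a pre-execution hook (registered via hive.exec.pre.hooks) keeps the enforcement out of the compiler: the hook simply aborts the query before execution when the predicate returns false.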
[jira] [Assigned] (HIVE-26604) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26604: - > Replace vectorization templates with overrides > -- > > Key: HIVE-26604 > URL: https://issues.apache.org/jira/browse/HIVE-26604 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > Replace vectorization templates with overrides. > h1. Background > There are many combinations of data types, column/scalar types, > and operators in vectorization, which leaves a lot of code to implement. The > current Hive vectorization handles this with a simple string template > engine, which replaces a placeholder with a concrete value at all places within each > template file. > However, the templates are plain text files, which are not natively supported > by modern IDEs. Also, any change to a template needs a separate Maven step > to generate the actual code, which is time-consuming. > h1. Design > The base abstract classes will respect Java's data type system. Each string > template will be divided into several sub data types, such as long-long, > long-double, double-long, double-double. 
> * ColumnArithmeticColumn.txt will be separated into > ** BaseLongColLongColumn.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract: long func(long a, long b) \{ return a - b; } > *** Multiply: long func(long a, long b) \{ return a * b; } > *** CheckedAdd: boolean supportsCheckedExecution() \{ return true; } > *** CheckedSubtract: boolean supportsCheckedExecution() \{ return true; } > *** CheckedMultiply: boolean supportsCheckedExecution() \{ return true; } > ** BaseLongColDoubleColumn.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongColumn.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleColumn.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > * ColumnArithmeticScalar.txt > ** BaseLongColLongScalar.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseLongColDoubleScalar.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongScalar.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleScalar.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26602) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26602: - > Replace vectorization templates with overrides > -- > > Key: HIVE-26602 > URL: https://issues.apache.org/jira/browse/HIVE-26602 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > Replace vectorization templates with overrides. > h1. Background > There are many combinations of data types, column/scalar types, > and operators in vectorization, which leaves a lot of code to implement. The > current Hive vectorization handles this with a simple string template > engine, which replaces a placeholder with a concrete value at all places within each > template file. > However, the templates are plain text files, which are not natively supported > by modern IDEs. Also, any change to a template needs a separate Maven step > to generate the actual code, which is time-consuming. > h1. Design > The base abstract classes will respect Java's data type system. Each string > template will be divided into several sub data types, such as long-long, > long-double, double-long, double-double. 
> * ColumnArithmeticColumn.txt will be separated into > ** BaseLongColLongColumn.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract: long func(long a, long b) \{ return a - b; } > *** Multiply: long func(long a, long b) \{ return a * b; } > *** CheckedAdd: boolean supportsCheckedExecution() \{ return true; } > *** CheckedSubtract: boolean supportsCheckedExecution() \{ return true; } > *** CheckedMultiply: boolean supportsCheckedExecution() \{ return true; } > ** BaseLongColDoubleColumn.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongColumn.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleColumn.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > * ColumnArithmeticScalar.txt > ** BaseLongColLongScalar.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseLongColDoubleScalar.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongScalar.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleScalar.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26601) Fix NPE encountered in second load cycle of optimised bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26601: - > Fix NPE encountered in second load cycle of optimised bootstrap > > > Key: HIVE-26601 > URL: https://issues.apache.org/jira/browse/HIVE-26601 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Blocker > > After failover from the Primary to the DR cluster is completed and DR takes over, a > reverse replication policy is created. The first dump and load cycle of > optimised bootstrap completes successfully, and the second dump cycle on DR is > also completed, which does a selective bootstrap of the tables that it read from > the table_diff directory. However, we observed an issue with the second load cycle on > the Primary cluster side, which fails with the following exception logs and > needs to be fixed. > {code:java} > [Scheduled Query Executor(schedule:repl_vinreverse, execution_id:421)]: > Exception while logging metrics > java.lang.NullPointerException: null > at > org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192) > ~[hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.exec.repl.ReplStateLogWork.replStateLog(ReplStateLogWork.java:145) > ~[hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.exec.repl.ReplStateLogTask.execute(ReplStateLogTask.java:39) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at 
org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.processQuery(ScheduledQueryExecutionService.java:240) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.run(ScheduledQueryExecutionService.java:193) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_232] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [?:1.8.0_232] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_232] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_232] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26600) Handle failover during optimized bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26600: - > Handle failover during optimized bootstrap > -- > > Key: HIVE-26600 > URL: https://issues.apache.org/jira/browse/HIVE-26600 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Rakshith C >Priority: Blocker > > When the reverse policy is enabled from DR to PROD, there is a situation > wherein the user may initiate a failover from DR to PROD before the optimized > bootstrap is ever run. > Current observations: > * Repl Dump will place a failover-ready marker, but failover metadata won't > be generated. > * Repl Load will throw an error, since failover will be set to true but > failover metadata is missing. > Replication fails and we reach an undefined state. > Fix: > * Create the failover-ready marker only during the second cycle of optimized > bootstrap, if possible. > * Since some tables may need to be bootstrapped, it may take up to 3 cycles > before failover from DR to PROD is complete. > * If no tables are modified, the second dump from DR to PROD will be marked as > failover ready. > Result: > * Users can initiate a failover immediately after enabling the reverse policy > without any hassles. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26599) Fix NPE encountered in second dump cycle of optimised bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26599: - > Fix NPE encountered in second dump cycle of optimised bootstrap > --- > > Key: HIVE-26599 > URL: https://issues.apache.org/jira/browse/HIVE-26599 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Blocker > > After failover from the Primary to the DR cluster is completed and DR takes over, a > reverse replication policy is created. The first dump and load cycle of > optimised bootstrap completes successfully, but we are encountering a > NullPointerException in the second dump cycle, which halts this reverse > replication and is a major blocker to testing the complete replication cycle. > {code:java} > Scheduled Query Executor(schedule:repl_reverse, execution_id:14)]: FAILED: > Execution Error, return code -101 from > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask. > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpTable(ReplDumpTask.java:1458) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:961) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:290) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504) > at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:498) > at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232){code} > After doing an RCA, we figured out that in the second dump cycle on the DR cluster, when > the StageStart method is invoked, the metric corresponding to Tables is not > being registered (it should be registered, as we are doing a selective > bootstrap of tables for optimised bootstrap along with the incremental dump). This > causes an NPE when the code later tries to update the progress corresponding to this > metric after the bootstrap of the table is completed. > The fix is to register the Tables metric before updating the progress. -- This message was sent by Atlassian Jira (v8.20.10#820010)
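The register-before-report pattern behind the fix can be sketched in isolation. This is an illustrative model only; `MetricSketch`, `registerMetric`, and `get` are hypothetical names, and the real ReplicationMetricCollector tracks far more state than a counter map. The point is that reporting progress for a metric that was never registered is exactly the failure mode, so the fix registers the metric first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a collector that fails when progress is reported
// for an unregistered metric, mirroring the NPE in the bug report.
public class MetricSketch {
    private final Map<String, Integer> progress = new HashMap<>();

    // Registering creates the entry that later progress updates rely on.
    public void registerMetric(String name) {
        progress.putIfAbsent(name, 0);
    }

    // Throws if the metric is missing, mirroring the original failure mode
    // (the real code dereferenced the missing entry and threw an NPE).
    public void reportStageProgress(String name, int units) {
        Integer current = progress.get(name);
        if (current == null) {
            throw new IllegalStateException("metric not registered: " + name);
        }
        progress.put(name, current + units);
    }

    public int get(String name) {
        return progress.getOrDefault(name, 0);
    }

    public static void main(String[] args) {
        MetricSketch m = new MetricSketch();
        m.registerMetric("TABLES");         // the fix: register first
        m.reportStageProgress("TABLES", 1); // now safe
        System.out.println(m.get("TABLES")); // 1
    }
}
```

In the second dump cycle of optimised bootstrap, the table bootstrap path skipped the registration step, so the first progress report hit the missing entry; registering "Tables" before any report is the whole fix.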
[jira] [Assigned] (HIVE-26598) Fix unset db params for optimized bootstrap incase of data copy tasks run on target and testcases
[ https://issues.apache.org/jira/browse/HIVE-26598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26598: - > Fix unset db params for optimized bootstrap incase of data copy tasks run on > target and testcases > - > > Key: HIVE-26598 > URL: https://issues.apache.org/jira/browse/HIVE-26598 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Rakshith C >Priority: Major > > When hive.repl.run.data.copy.tasks.on.target is set to false, the repl dump task > will initiate the copy task from the source cluster to the staging directory. > In the current code flow, the repl dump task dumps the metadata and then creates > another repl dump task with datacopyIterators initialized. > When the second dump cycle executes, it directly begins the data copy tasks. > Because of this, we don't enter the second reverse dump flow, and > unsetDbPropertiesForOptimisedBootstrap is never set to true again. > This results in db params (repl.target.for, repl.background.threads, etc.) not > being unset. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26597) Fix unsetting of db prop repl.target.for in ReplicationSemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-26597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26597: - > Fix unsetting of db prop repl.target.for in ReplicationSemanticAnalyzer > --- > > Key: HIVE-26597 > URL: https://issues.apache.org/jira/browse/HIVE-26597 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Rakshith C >Priority: Major > > When the repl policy is set from A -> B: > * *repl.target.for* is set on B. > When failover is initiated: > * *repl.failover.endpoint* = *'TARGET'* is set on B. > > Now, when the reverse policy is set up from {*}A <- B{*}, > there is a check in > [ReplicationSemanticAnalyzer#initReplDump|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java#L196] > which checks for the existence of these two properties and, if they are set, > unsets the *repl.target.for* property. > Because of this, optimised bootstrap won't be triggered, since it checks for > the existence of the *repl.target.for* property during repl dump on the target > [HERE|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/OptimisedBootstrapUtils.java#L93]. > > Fix: remove the code that unsets repl.target.for in > ReplicationSemanticAnalyzer, because the second dump cycle of optimized bootstrap > unsets it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
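The interaction above can be modeled with a tiny sketch. The class and method names (`ReplPolicyCheck`, `shouldRunOptimisedBootstrap`, `buggyUnset`) are hypothetical and the decision logic is deliberately simplified; only the two property names come from the issue. It shows why the premature unset in the analyzer suppresses optimised bootstrap: the dump-side check keys off `repl.target.for`, which the analyzer has already removed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two property checks described in the issue.
public class ReplPolicyCheck {
    // Optimised bootstrap is considered only while the database is still
    // marked as a replication target, i.e. repl.target.for is present.
    public static boolean shouldRunOptimisedBootstrap(Map<String, String> dbProps) {
        return dbProps.containsKey("repl.target.for");
    }

    // The analyzer-side unset being removed by the fix: clearing
    // repl.target.for as soon as both properties are seen, before the
    // dump-side check above ever runs.
    public static void buggyUnset(Map<String, String> dbProps) {
        if ("TARGET".equals(dbProps.get("repl.failover.endpoint"))
                && dbProps.containsKey("repl.target.for")) {
            dbProps.remove("repl.target.for");
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("repl.target.for", "A");
        props.put("repl.failover.endpoint", "TARGET");
        buggyUnset(props); // the premature unset...
        System.out.println(shouldRunOptimisedBootstrap(props)); // false
    }
}
```

With the buggy unset removed, `repl.target.for` survives until the second dump cycle of optimised bootstrap, which is the component that legitimately clears it.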
[jira] [Updated] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Fix Version/s: 4.0.0-alpha-2 > Read-only mode for Hive database > > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 40m > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Summary: Read-only mode for Hive database (was: Read-only mode) > Read-only mode for Hive database > > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Status: Patch Available (was: In Progress) > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Description: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while other one is writable to keep a single source of truth. h1. Design Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. The "readonly" database property can be configured to turn it on and off. Allowed operations prefixes * EXPLAIN * USE(or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC * SHOW h1. Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q was: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while other one is writable to keep a single source of truth. h1. Design Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. The "readonly" database property can be configured to turn it on and off. Allowed operations prefixes * EXPLAIN * USE(or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC * SHOW Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. 
hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Description: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while other one is writable to keep a single source of truth. h1. Design Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. The "readonly" database property can be configured to turn it on and off. Allowed operations prefixes * EXPLAIN * USE(or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC * SHOW Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q was: h1. Purpose In failover/fail-back scenarios, a Hive instance needs to be read-only, while other one is writable to keep a single source of truth. h1. Design EnforceReadOnlyHiveHook class can implement ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. "hive.enforce.readonly" can be configured to turn it on and off. h2. Allowed operations prefixes * USE(or SWITCHDATABASE) * SELECT * DESC * DESCRIBE * SET * EXPLAIN * ROLLBACK * KILL * ABORT h1. Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_hook_delete_failure.q * read_only_hook_insert_failure.q * read_only_hook_update_failure.q > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. 
EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
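As a rough illustration of the allow-list described above, here is a hypothetical, self-contained simplification of the check a pre-execution hook like EnforceReadOnlyDatabaseHook could perform. The real hook receives a HookContext from HiveServer2 and reads the database's "readonly" property; both are replaced here by plain parameters, so this is a sketch of the gating logic only, not the actual Hive class.

```java
import java.util.List;

// Simplified stand-in for the read-only enforcement check. The allowed
// operation-name prefixes come from the issue description above.
public class ReadOnlyGuard {
    private static final List<String> ALLOWED_PREFIXES = List.of(
        "EXPLAIN", "USE", "SWITCHDATABASE", "REPLDUMP", "REPLSTATUS",
        "EXPORT", "KILL_QUERY", "DESC", "SHOW");

    // Returns true when the operation may proceed. A write against a
    // read-only database would instead cause the hook to throw.
    public static boolean isAllowed(boolean dbIsReadOnly, String operationName) {
        if (!dbIsReadOnly) {
            return true; // nothing to enforce on a writable database
        }
        String op = operationName.toUpperCase();
        return ALLOWED_PREFIXES.stream().anyMatch(op::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(true, "SHOWTABLES")); // true: SHOW prefix
        System.out.println(isAllowed(true, "INSERT"));     // false: writes blocked
        System.out.println(isAllowed(false, "INSERT"));    // true: db is writable
    }
}
```

Prefix matching mirrors how the hook classifies operations by name (e.g. DESC covers both DESC and DESCRIBE).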
[jira] [Work started] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26555 started by Teddy Choi. - > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive instance needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > EnforceReadOnlyHiveHook class can implement ExecuteWithHookContext interface. > hive.exec.pre.hooks needs to have the class name to initiate an instance. > "hive.enforce.readonly" can be configured to turn it on and off. > h2. Allowed operations prefixes > * USE(or SWITCHDATABASE) > * SELECT > * DESC > * DESCRIBE > * SET > * EXPLAIN > * ROLLBACK > * KILL > * ABORT > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_hook_delete_failure.q > * read_only_hook_insert_failure.q > * read_only_hook_update_failure.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26555: - > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > > h1. Purpose > In failover/fail-back scenarios, a Hive instance needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > EnforceReadOnlyHiveHook class can implement ExecuteWithHookContext interface. > hive.exec.pre.hooks needs to have the class name to initiate an instance. > "hive.enforce.readonly" can be configured to turn it on and off. > h2. Allowed operations prefixes > * USE(or SWITCHDATABASE) > * SELECT > * DESC > * DESCRIBE > * SET > * EXPLAIN > * ROLLBACK > * KILL > * ABORT > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_hook_delete_failure.q > * read_only_hook_insert_failure.q > * read_only_hook_update_failure.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606858#comment-17606858 ] Teddy Choi commented on HIVE-25790: --- The Jenkins tests passed. > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790 ] Teddy Choi deleted comment on HIVE-25790: --- was (Author: teddy.choi): I created a pull request. Its third commit is running on the upstream Jenkins. > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-25790: -- Status: Patch Available (was: In Progress) I created a pull request. Its third commit is running on the upstream Jenkins. > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606546#comment-17606546 ] Teddy Choi commented on HIVE-25790: --- I made a pull request; its third commit is running on Jenkins. It copies only the files that differ from the source path to the destination path. For existing directories and files, it skips the full copy but updates the modification time to indicate that the entry was revisited. It is optimized for HDFS-to-HDFS replication scenarios, using checksum, block size, and length comparisons. > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
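The incremental-copy decision described in this comment (skip unchanged files, bump the modification time, compare lengths and checksums) can be sketched as follows. This standalone version is an illustration under stated assumptions, not the actual FileUtils patch: it works on local files instead of Hadoop FileSystem paths, and uses an MD5 content digest as a stand-in for HDFS block checksums.

```java
import java.nio.file.*;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.util.Arrays;

// Sketch of a "copy only what changed" helper. Length is compared first
// because it is cheap; a content digest is computed only when lengths match.
public class IncrementalCopy {
    static byte[] digest(Path p) throws Exception {
        return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(p));
    }

    // True when dst is missing or differs from src and therefore needs a copy.
    public static boolean needsCopy(Path src, Path dst) throws Exception {
        if (!Files.exists(dst)) return true;
        if (Files.size(src) != Files.size(dst)) return true;  // length mismatch
        return !Arrays.equals(digest(src), digest(dst));      // content mismatch
    }

    public static void copyIfChanged(Path src, Path dst) throws Exception {
        if (needsCopy(src, dst)) {
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        } else {
            // Unchanged: skip the copy but update the modification time,
            // so the destination reflects that it was revisited.
            Files.setLastModifiedTime(dst, FileTime.fromMillis(System.currentTimeMillis()));
        }
    }
}
```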
[jira] [Work started] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25790 started by Teddy Choi. - > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-25790: - Assignee: Teddy Choi (was: Haymant Mangla) > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-1626) stop using java.util.Stack
[ https://issues.apache.org/jira/browse/HIVE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566850#comment-17566850 ] Teddy Choi commented on HIVE-1626: -- It's rewritten. A new pull request was created. It introduces an ArrayStack implementation for faster indexed accesses without synchronization. > stop using java.util.Stack > -- > > Key: HIVE-1626 > URL: https://issues.apache.org/jira/browse/HIVE-1626 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-1626.2.patch, HIVE-1626.2.patch, HIVE-1626.3.patch, > HIVE-1626.3.patch, HIVE-1626.3.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > We currently use Stack as part of the generic node walking library. Stack > should not be used for this since its inheritance from Vector incurs > superfluous synchronization overhead. > Most projects end up adding an ArrayStack implementation and using that > instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
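The ArrayStack idea mentioned in the comment can be sketched as below: a LIFO stack backed by an ArrayList, so push/pop/peek avoid the per-call synchronization that java.util.Stack inherits from Vector, while still offering the indexed access a node walker uses to inspect the current path. This is a minimal illustration, not the class introduced by the pull request.

```java
import java.util.ArrayList;

// Unsynchronized stack with O(1) indexed access.
public class ArrayStack<E> {
    private final ArrayList<E> elements = new ArrayList<>();

    public void push(E e)    { elements.add(e); }
    public E pop()           { return elements.remove(elements.size() - 1); }
    public E peek()          { return elements.get(elements.size() - 1); }
    public boolean isEmpty() { return elements.isEmpty(); }
    public int size()        { return elements.size(); }
    // Indexed access from the bottom of the stack, as tree walkers need.
    public E get(int i)      { return elements.get(i); }
}
```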
[jira] [Updated] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21437: -- Attachment: HIVE-21437.3.patch > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21437.1.patch, HIVE-21437.2.patch, > HIVE-21437.3.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select max((((cs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), >
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > java.lang.reflect.Constructor.newInstance(Constructor.java:423), > org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21437: -- Attachment: HIVE-21437.2.patch > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21437.1.patch, HIVE-21437.2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select maxcs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > java.lang.reflect.Constructor.newInstance(Constructor.java:423), > org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21437: -- Attachment: HIVE-21437.1.patch > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21437.1.patch > > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select maxcs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > 
java.lang.reflect.Constructor.newInstance(Constructor.java:423), > org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21437: -- Status: Patch Available (was: Open) > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select maxcs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > 
java.lang.reflect.Constructor.newInstance(Constructor.java:423), > org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21437: - Assignee: Teddy Choi > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select maxcs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > java.lang.reflect.Constructor.newInstance(Constructor.java:423), > 
org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
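For background on the failure above: a Decimal64 column stores a small-precision decimal(p,s) value as an unscaled long (123.45 at scale 2 becomes 12345), so dividing by an integer cannot simply divide the raw longs; the dividend must be rescaled and the result rounded. The following is a conceptual sketch under that assumption, not Hive's actual vectorized expression code; the helper name and rounding choice are hypothetical.

```java
// Illustrates scaled-long (Decimal64-style) division by an integer.
public class Decimal64Div {
    // Divide a scale-2 unscaled long by an integer, keeping scale 2.
    // One extra guard digit is kept so the result can be rounded half up.
    public static long divideScale2(long unscaled, int divisor) {
        long q = (unscaled * 10) / divisor;          // quotient at scale 3
        return (q + (q >= 0 ? 5 : -5)) / 10;         // round back to scale 2
    }

    public static void main(String[] args) {
        // 123.45 / 2 = 61.725, which rounds to 61.73 at scale 2
        System.out.println(divideScale2(12345, 2));  // prints 6173
    }
}
```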
[jira] [Updated] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Resolution: Fixed Fix Version/s: 4.0.0 Status: Resolved (was: Patch Available) > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-21368.1.patch, HIVE-21368.2.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
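Conceptually, the fix removes a per-row ConvertDecimal64ToDecimal cast from the join's inner loop: decimal(7,2) values held in the fast scaled-long representation can be aggregated with plain long arithmetic and converted to an object form once at the end. A hypothetical standalone sketch of that idea, not Hive's VectorMapJoin code:

```java
import java.math.BigDecimal;

// Aggregating scaled-long decimals without a per-row object conversion.
public class Decimal64Sum {
    // Sum unscaled longs at scale 2; one BigDecimal is built for the result.
    public static BigDecimal sumScale2(long[] unscaled) {
        long total = 0;
        for (long v : unscaled) {
            total += v;                       // stays in scaled-long form
        }
        return BigDecimal.valueOf(total, 2);  // convert once, not per row
    }
}
```

(Overflow handling is omitted; a real implementation must detect it, since decimal(7,2) sums can exceed the declared precision.)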
[jira] [Commented] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792346#comment-16792346 ] Teddy Choi commented on HIVE-21368: --- Committed to master. Thanks. [~gopalv] > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch, HIVE-21368.2.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Attachment: HIVE-21368.2.patch > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch, HIVE-21368.2.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789239#comment-16789239 ] Teddy Choi commented on HIVE-21368: --- [~gopalv], thanks for pointing it out. I made a patch for it. > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Attachment: HIVE-21368.1.patch > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Status: Patch Available (was: Open) > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787503#comment-16787503 ] Teddy Choi edited comment on HIVE-21368 at 3/8/19 3:57 AM: --- I found a commit that reverts HIVE-20315. [According to Matt|https://issues.apache.org/jira/browse/HIVE-20315?focusedCommentId=16592355=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16592355], "Removed DECIMAL_64 conversion avoidance changes for GROUP BY / JOIN since they caused external test failures". It may take more than a few simple changes. was (Author: teddy.choi): I found a commit that reverts HIVE-20315. [According to Matt|https://issues.apache.org/jira/browse/HIVE-20315?focusedCommentId=16592355=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16592355], the DECIMAL_64 to DECIMAL conversion was intentional, since it caused external test failures. It may involve more tests and take more than a few simple changes. 
> Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787503#comment-16787503 ] Teddy Choi commented on HIVE-21368: --- I found a commit that reverts HIVE-20315. [According to Matt|https://issues.apache.org/jira/browse/HIVE-20315?focusedCommentId=16592355=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16592355], the DECIMAL_64 to DECIMAL conversion was intentional, since it caused external test failures. It may involve more tests and take more than a few simple changes. > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 
'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Comment: was deleted (was: I found the following code in Vectorizer.java. It was removed in commit 470ba3e2835ef769f940d013acbe6c05d9208903 by Matt McCline on 2018-08-16, which reverted HIVE-20315. I don't know why it was reverted. {code:java} // For now, we don't support joins on or using DECIMAL_64. VectorExpression[] allBigTableValueExpressions = vContext.getVectorExpressionsUpConvertDecimal64(bigTableExprs); {code}) > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 
'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787498#comment-16787498 ] Teddy Choi commented on HIVE-21368: --- I found the following code in Vectorizer.java. It was removed in commit 470ba3e2835ef769f940d013acbe6c05d9208903 by Matt McCline on 2018-08-16, which reverted HIVE-20315. I don't know why it was reverted. {code:java} // For now, we don't support joins on or using DECIMAL_64. VectorExpression[] allBigTableValueExpressions = vContext.getVectorExpressionsUpConvertDecimal64(bigTableExprs); {code} > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 
'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21368: - Assignee: Teddy Choi > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779908#comment-16779908 ] Teddy Choi commented on HIVE-21294: --- [~gopalv], I fixed the differences in murmur_hash_migration.q.out. TestObjectStore failures seem unrelated. I tested them on my laptop and there were no errors. Will it be okay to push it to master? > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.2.patch, HIVE-21294.3.patch, > HIVE-21294.4.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
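The optimization HIVE-21294 describes can be sketched as follows. This is a hedged illustration, not the actual VectorReduceSinkObjectHashOperator code: the class and method names are hypothetical, and `Objects.hashCode` stands in for Hive's murmur-based object hash. With a single reducer every row must land in partition 0, so computing an object hash per row is wasted work.

```java
import java.util.Objects;

// Hypothetical sketch of partition selection in a reduce sink; not Hive code.
public class SingleReducerShuffle {
    static int partitionFor(Object key, int numReducers) {
        if (numReducers == 1) {
            return 0; // only one possible target: skip the hash entirely
        }
        // Placeholder for Hive's murmur-based object hash.
        return (Objects.hashCode(key) & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("any key", 1)); // prints 0
    }
}
```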
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: HIVE-21294.4.patch > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.2.patch, HIVE-21294.3.patch, > HIVE-21294.4.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: HIVE-21294.3.patch > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.2.patch, HIVE-21294.3.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Description: VectorReduceSinkObjectHashOperator can skip the object hashing entirely if the reducer count = 1. (was: VectorObjectSinkHashOperator can skip the object hashing entirely if the reducer count = 1.) > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: HIVE-21294.2.patch > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: (was: HIVE-21294.1.patch) > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: HIVE-21294.1.patch > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21294.1.patch > > > VectorObjectSinkHashOperator can skip the object hashing entirely if the > reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Status: Patch Available (was: Open) > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21294.1.patch > > > VectorObjectSinkHashOperator can skip the object hashing entirely if the > reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21294: - Assignee: Teddy Choi > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > VectorObjectSinkHashOperator can skip the object hashing entirely if the > reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773693#comment-16773693 ] Teddy Choi commented on HIVE-21294: --- I guess you meant VectorReduceSinkObjectHashOperator. > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > VectorObjectSinkHashOperator can skip the object hashing entirely if the > reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767946#comment-16767946 ] Teddy Choi commented on HIVE-21257: --- I tested this issue in Hive 3+ and I found that it only occurs in Hive 2. I will close this issue. > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 3.1.1 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
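The behavior the issue above expects can be sketched as follows. This is a hedged illustration, not Hive's StringLength code: `VarcharLength` and `lengthWithMax` are hypothetical names. The idea is that for a varchar(n) or char(n) column the LENGTH result must respect the declared maximum n, so the count of code points is capped at n rather than taken from the raw stored bytes.

```java
// Hypothetical sketch of a max-length-aware LENGTH; not Hive code.
public class VarcharLength {
    static int lengthWithMax(String value, int maxLength) {
        int codePoints = value.codePointCount(0, value.length());
        return Math.min(codePoints, maxLength); // cap at the declared type length
    }

    public static void main(String[] args) {
        // A 10-character value evaluated against a varchar(5) type reports 5.
        System.out.println(lengthWithMax("abcdefghij", 5)); // prints 5
    }
}
```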
[jira] [Resolved] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi resolved HIVE-21257. --- Resolution: Not A Problem > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 3.1.1 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21257: -- Status: Open (was: Patch Available) > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.1, 4.0.0 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21256) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 2
[ https://issues.apache.org/jira/browse/HIVE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21256: -- Status: Patch Available (was: Open) > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 2 > - > > Key: HIVE-21256 > URL: https://issues.apache.org/jira/browse/HIVE-21256 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.4 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21256.2.branch-2.patch > > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21256) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 2
[ https://issues.apache.org/jira/browse/HIVE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21256: -- Attachment: HIVE-21256.2.branch-2.patch > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 2 > - > > Key: HIVE-21256 > URL: https://issues.apache.org/jira/browse/HIVE-21256 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.4 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21256.2.branch-2.patch > > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21257: -- Attachment: (was: HIVE-21256.1.branch-2.patch) > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 3.1.1 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21257: -- Status: Patch Available (was: Open) > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.1, 4.0.0 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21256.1.branch-2.patch > > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21257: -- Attachment: HIVE-21256.1.branch-2.patch > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 3.1.1 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21256.1.branch-2.patch > > > After HIVE-19951 is fixed, there are still some cases where the vectorized LENGTH > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So the two implementations are separated to keep the code clean in Hive 3 > while keeping the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21257: - > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.1, 4.0.0 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there are still some cases where the vectorized LENGTH > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So the two implementations are separated to keep the code clean in Hive 3 > while keeping the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21256) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 2
[ https://issues.apache.org/jira/browse/HIVE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21256: - > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 2 > - > > Key: HIVE-21256 > URL: https://issues.apache.org/jira/browse/HIVE-21256 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.4 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there are still some cases where the vectorized LENGTH > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So the two implementations are separated to keep the code clean in Hive 3 > while keeping the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21126) Allow session level queries in LlapBaseInputFormat#getSplits() before actual get_splits() call
[ https://issues.apache.org/jira/browse/HIVE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21126: -- Fix Version/s: 3.2.0 4.0.0 > Allow session level queries in LlapBaseInputFormat#getSplits() before actual > get_splits() call > -- > > Key: HIVE-21126 > URL: https://issues.apache.org/jira/browse/HIVE-21126 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.1.1 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-21126.1.patch, HIVE-21126.2.patch, > HIVE-21126.3.patch > > > Facilitate execution of session-level queries before the {{select get_splits()}} > call. This will allow us to set params like {{tez.grouping.split-count}} > which can be taken into consideration during split calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21126) Allow session level queries in LlapBaseInputFormat#getSplits() before actual get_splits() call
[ https://issues.apache.org/jira/browse/HIVE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21126: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Allow session level queries in LlapBaseInputFormat#getSplits() before actual > get_splits() call > -- > > Key: HIVE-21126 > URL: https://issues.apache.org/jira/browse/HIVE-21126 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.1.1 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-21126.1.patch, HIVE-21126.2.patch, > HIVE-21126.3.patch > > > Facilitate execution of session-level queries before the {{select get_splits()}} > call. This will allow us to set params like {{tez.grouping.split-count}} > which can be taken into consideration during split calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
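The ordering this improvement enables can be pictured with a toy model: session-level statements (e.g. `set tez.grouping.split-count=4`) execute on the session before the `get_splits()` call itself. Everything below is an invented simulation of that ordering, not Hive's actual LlapBaseInputFormat code; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the HIVE-21126 behavior: setup statements are run on the
// session first, then the get_splits() call, so split calculation can
// observe parameters like tez.grouping.split-count.
public class SessionAwareSplitClient {
    private final List<String> executed = new ArrayList<>();

    // Stand-in "session": records each statement in execution order.
    public void execute(String sql) {
        executed.add(sql);
    }

    // Runs the session-level setup statements first, then the
    // get_splits() call, mirroring the ordering the fix enables.
    public List<String> getSplits(List<String> sessionStatements, String query) {
        for (String stmt : sessionStatements) {
            execute(stmt);
        }
        execute("select get_splits(\"" + query + "\", 0)");
        return executed;
    }
}
```

The point is purely the ordering: any parameter set in the first phase is already in effect when the split-generating query runs.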
[jira] [Assigned] (HIVE-21163) ParseUtils.parseQueryAndGetSchema fails on views with global limit
[ https://issues.apache.org/jira/browse/HIVE-21163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21163: - Assignee: Teddy Choi > ParseUtils.parseQueryAndGetSchema fails on views with global limit > -- > > Key: HIVE-21163 > URL: https://issues.apache.org/jira/browse/HIVE-21163 > Project: Hive > Issue Type: Bug >Reporter: Eric Wohlstadter >Assignee: Teddy Choi >Priority: Major > > {code:java} > hive> USE tpcds_bin_partitioned_orc_1000; > hive> CREATE VIEW profit_view AS SELECT ss_net_profit, d_date FROM > store_sales, date_dim WHERE d_date = ss_sold_date LIMIT 100; > hive> SELECT get_splits("SELECT * from profit_view", 0); > Error: java.io.IOException: > org.apache.hadoop.hive.ql.parse.SemanticException: View profit_view is > corresponding to HiveSortLimit#3447, rather than a HiveProject. > (state=,code=0) > {code} > This works fine if the view doesn't have a global limit. > It also works fine if you define a view without a global limit, and then > apply a limit on top of the view. > {{Calcite.genLogicalPlan}} is expecting a {{HiveProject}} root but when going > through {{ParseUtils.parseQueryAndGetSchema}} the {{HiveSortLimit}} appears > at the root. Perhaps it is simply missing a step to wrap the limit with a > projection? > {code} > Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: View > profit_view is corresponding to HiveSortLimit#2275, rather than a HiveProject. 
> at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4931) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1741) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1689) > at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1043) > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154) > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1448) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genLogicalPlan(CalcitePlanner.java:395) > at > org.apache.hadoop.hive.ql.parse.ParseUtils.parseQueryAndGetSchema(ParseUtils.java:561) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:254) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
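The "wrap the limit with a projection" idea floated in the report can be sketched with stand-in plan nodes. The classes below are invented placeholders for Calcite's HiveProject and HiveSortLimit, not Hive's actual planner code; they only show the shape of the proposed normalization step.

```java
// Toy sketch: if the logical plan's root is a sort/limit node rather
// than a projection, wrap it in an identity projection so callers that
// expect a Project root (as ParseUtils.parseQueryAndGetSchema does)
// can still read the output row type.
public class PlanRootFixer {
    interface RelNode {}

    static class Project implements RelNode {
        final RelNode input;
        Project(RelNode input) { this.input = input; }
    }

    static class SortLimit implements RelNode {
        final RelNode input;
        SortLimit(RelNode input) { this.input = input; }
    }

    static class Scan implements RelNode {}

    // Ensure the returned root is a Project, wrapping only when needed.
    public static RelNode ensureProjectRoot(RelNode root) {
        return (root instanceof Project) ? root : new Project(root);
    }
}
```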
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21091: -- Attachment: HIVE-21091.3.patch > Arrow serializer sets null at wrong index > - > > Key: HIVE-21091 > URL: https://issues.apache.org/jira/browse/HIVE-21091 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21091.1.patch, HIVE-21091.2.patch, > HIVE-21091.3.patch, HIVE-21091.3.patch > > > Arrow serializer sets null at wrong index -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) 
Vectorizer.java:1865 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
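The underlying HashMap hazard in the trace above is easy to reproduce in isolation: once a key's fields change after insertion, its hashCode no longer matches the bucket it was stored under, so the entry becomes unreachable and a later put() of an "equal" key strands a duplicate. MutableKey below is an invented stand-in for VectorPartitionDesc.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Minimal reproduction of the failure mode behind HIVE-20419:
// mutating a key after it has been used in a HashMap.
public class MutableKeyDemo {
    static class MutableKey {
        String field;
        MutableKey(String field) { this.field = field; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && Objects.equals(field, ((MutableKey) o).field);
        }
        @Override public int hashCode() { return Objects.hashCode(field); }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey("a");
        map.put(key, "v1");
        key.field = "b";                          // mutation after insertion
        System.out.println(map.containsKey(key)); // false: wrong bucket now
        map.put(new MutableKey("b"), "v2");
        System.out.println(map.size());           // 2: old entry is stranded
    }
}
```

The fix direction named in the issue title follows directly: treat VectorPartitionDesc as immutable once it has been used as a map key.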
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Attachment: (was: BUG-116953.4.patch) > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748416#comment-16748416 ] Teddy Choi commented on HIVE-20419: --- Revised and pushed to master. Thanks [~gopalv]. > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > 
Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Fix Version/s: 4.0.0 > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Attachment: (was: HIVE-20419.3.patch) > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Attachment: HIVE-20419.4.patch > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Attachment: BUG-116953.4.patch > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-20419:
------------------------------
    Attachment: HIVE-20419.3.patch
[jira] [Commented] (HIVE-21126) Allow session level queries in LlapBaseInputFormat#getSplits() before actual get_splits() call
[ https://issues.apache.org/jira/browse/HIVE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748402#comment-16748402 ]

Teddy Choi commented on HIVE-21126:
-----------------------------------
+1. Looks good to me.

> Allow session level queries in LlapBaseInputFormat#getSplits() before actual get_splits() call
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21126
>                 URL: https://issues.apache.org/jira/browse/HIVE-21126
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>    Affects Versions: 3.1.1
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21126.1.patch, HIVE-21126.2.patch, HIVE-21126.3.patch
>
> Facilitate execution of session-level queries before the {{select get_splits()}} call. This allows setting parameters like {{tez.grouping.split-count}}, which can then be taken into account during split calculation.
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-20419:
------------------------------
    Attachment: HIVE-20419.2.patch
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-20419:
------------------------------
    Attachment: HIVE-20419.1.patch
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-20419:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Commented] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747698#comment-16747698 ]

Teddy Choi commented on HIVE-21091:
-----------------------------------
After adding unit tests, I found that there was still a bug in null handling for lists. I fixed it, too. Thanks [~bslim] for advising on the unit tests.

> Arrow serializer sets null at wrong index
> -----------------------------------------
>
>                 Key: HIVE-21091
>                 URL: https://issues.apache.org/jira/browse/HIVE-21091
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21091.1.patch, HIVE-21091.2.patch, HIVE-21091.3.patch
>
> Arrow serializer sets null at wrong index
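The general class of bug named in this issue — recording a null flag at an input-row index instead of the output-slot index when a selection maps rows to output positions — can be illustrated with a hypothetical sketch. None of these names come from Hive's actual Arrow serializer; this only demonstrates why the two index spaces must not be confused:

```java
import java.util.Arrays;

// Hypothetical illustration of a wrong-index null bug: when a selection vector
// maps input rows to output slots, the null (validity) flag must be written at
// the *output* position, not the original row number.
public class NullIndexDemo {
    static boolean[] serializeValidity(boolean[] isNull, int[] selected, boolean buggy) {
        boolean[] outNull = new boolean[selected.length];
        for (int out = 0; out < selected.length; out++) {
            int row = selected[out]; // input row feeding output slot `out`
            if (buggy) {
                // Wrong: uses the input row number as the output slot,
                // silently dropping the flag when the row number is out of range.
                if (isNull[row] && row < outNull.length) outNull[row] = true;
            } else {
                // Right: the null flag goes at the output position.
                outNull[out] = isNull[row];
            }
        }
        return outNull;
    }

    public static void main(String[] args) {
        boolean[] isNull = {false, false, true, false}; // row 2 is null
        int[] selected = {2, 3};                        // rows 2 and 3 survive selection
        System.out.println(Arrays.toString(serializeValidity(isNull, selected, false))); // [true, false]
        System.out.println(Arrays.toString(serializeValidity(isNull, selected, true)));  // [false, false] - null lost
    }
}
```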
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21091:
------------------------------
    Attachment: HIVE-21091.3.patch
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21091:
------------------------------
    Attachment: HIVE-21091.2.patch
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21091:
------------------------------
    Attachment: HIVE-21091.1.patch
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21091:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi reassigned HIVE-21091:
---------------------------------
[jira] [Commented] (HIVE-21041) NPE, ParseException in getting schema from logical plan
[ https://issues.apache.org/jira/browse/HIVE-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724141#comment-16724141 ]

Teddy Choi commented on HIVE-21041:
-----------------------------------
Pushed to master and branch-3. Thanks, [~jcamachorodriguez].

> NPE, ParseException in getting schema from logical plan
> --------------------------------------------------------
>
>                 Key: HIVE-21041
>                 URL: https://issues.apache.org/jira/browse/HIVE-21041
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0, 3.2.0
>
>         Attachments: HIVE-21041.2.patch, HIVE-21041.3.patch
>
> HIVE-20552 makes getting the schema from the logical plan faster, but it throws a ParseException when the query has a column alias, and a NullPointerException when it has subqueries.
[jira] [Updated] (HIVE-21041) NPE, ParseException in getting schema from logical plan
[ https://issues.apache.org/jira/browse/HIVE-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21041:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)