[jira] [Updated] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
[ https://issues.apache.org/jira/browse/HIVE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21213:
------------------------------
Release Note: Merged. Thanks.
Resolution: Fixed
Status: Resolved (was: Patch Available)

> Acid table bootstrap replication needs to handle directory created by
> compaction with txn id
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
> Issue Type: Bug
> Components: Hive, HiveServer2, repl
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, HIVE-21213.03.patch, HIVE-21213.04.patch, HIVE-21213.05.patch
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> The current implementation of compaction uses the txn id in the directory
> name. This is used to isolate queries from reading the directory until
> compaction has finished, and to avoid the compactor marking used earlier.
> During bootstrap replication, each directory is copied as-is, with the same
> name, from the source to the destination cluster. But a directory created by
> compaction with a txn id in its name cannot be copied verbatim, because the
> txn list at the target may differ from the source: a txn id that is valid at
> the source may belong to an aborted txn at the target. So conversion logic is
> required to create a new directory named with a txn that is valid at the
> target, and to dump the data into that newly created directory.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
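The conversion logic the description calls for can be sketched as a directory-name rewrite. The sketch below is illustrative only: the class and method names are hypothetical (not Hive's actual API), and it assumes compacted directory names carry a `_v<txnId>` visibility suffix such as `base_0000009_v0000020`; the real patch also has to move the data and validate the txn against the target's txn list.

```java
// Hypothetical helper: rewrite a compacted directory name so that its
// visibility txn id is one allocated on (and valid for) the target cluster.
public final class CompactedDirRenamer {
    // Matches names like base_0000009_v0000020 or delta_01_02_v0000003.
    private static final java.util.regex.Pattern VISIBILITY_SUFFIX =
        java.util.regex.Pattern.compile("^(base|delta|delete_delta)(_\\d+)+_v(\\d+)$");

    /** Returns dirName with its source txn id replaced by targetTxnId. */
    public static String withTargetTxn(String dirName, long targetTxnId) {
        java.util.regex.Matcher m = VISIBILITY_SUFFIX.matcher(dirName);
        if (!m.matches()) {
            return dirName; // no visibility suffix: name can be copied as-is
        }
        int suffixStart = dirName.lastIndexOf("_v");
        return dirName.substring(0, suffixStart) + String.format("_v%07d", targetTxnId);
    }
}
```

A directory without the `_v` suffix is left untouched, which mirrors the fact that only compaction-created directories carry the txn id in their name.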
[jira] [Commented] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
[ https://issues.apache.org/jira/browse/HIVE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759824#comment-17759824 ]

Teddy Choi commented on HIVE-21213:
-----------------------------------
I found that there are failing tests.
[jira] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
[ https://issues.apache.org/jira/browse/HIVE-21213 ]

Teddy Choi deleted comment on HIVE-21213:
-----------------------------------------
was (Author: teddy.choi): +1. LGTM.
[jira] [Commented] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
[ https://issues.apache.org/jira/browse/HIVE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759816#comment-17759816 ]

Teddy Choi commented on HIVE-21213:
-----------------------------------
+1. LGTM.
[jira] [Comment Edited] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653808#comment-17653808 ]

Teddy Choi edited comment on HIVE-26555 at 1/3/23 7:03 AM:
-----------------------------------------------------------
[~abstractdog], sorry for the late reply. It assumes an [active-passive HA configuration|https://en.wikipedia.org/wiki/High-availability_cluster#Node_configurations] with reads on the passive. The active instance should be the single source of truth, while the passive instance should follow it. However, the current Hive replication design allows the passive instance to diverge from the active instance. A data divergence between the active and passive instances is hard to detect and resolve. This read-only mode prevents the passive instance from changing, to avoid any unintended divergence.

References:
* Microsoft SQL Server: [Configure read-only access to a secondary replica of an Always On availability group|https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-read-only-access-on-an-availability-replica-sql-server?view=sql-server-ver16]
* Oracle Database: [High Availability Overview and Best Practices - Features for Maximizing Availability|https://docs.oracle.com/en/database/oracle/oracle-database/21/haovw/ha-features.html#GUID-314F15CE-BD8F-45B0-911E-B7FCC2B8006A]
* IBM DB2: [Enabling reads on standby|https://www.ibm.com/docs/en/db2/11.5?topic=feature-enabling-reads-standby]

> Read-only mode for Hive database
>
> Key: HIVE-26555
> URL: https://issues.apache.org/jira/browse/HIVE-26555
> Project: Hive
> Issue Type: New Feature
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> h1. Purpose
> In failover/fail-back scenarios, a Hive database needs to be read-only while the other one stays writable, to keep a single source of truth.
> h1. User-Facing Changes
> Yes. The EnforceReadOnlyDatabaseHook class implements the ExecuteWithHookContext interface. hive.exec.pre.hooks needs to include the class name to instantiate it. The "readonly" database property can be configured to turn it on and off.
> h2. Allowed read operations
> All read operations without any data/metadata change are allowed.
> * EXPLAIN
> * USE (or SWITCHDATABASE)
> * REPLDUMP
> * REPLSTATUS
> * EXPORT
> * KILL_QUERY
> * DESC prefix
> * SHOW prefix
> * QUERY with SELECT or EXPLAIN. INSERT, DELETE, UPDATE are disallowed.
> h2. Allowed write operations
> Most write operations that change data/metadata are disallowed. There are a few allowed exceptions. The first is ALTER DATABASE, to make the database writable again. The second is replication load, to load a dumped database.
> * ALTER DATABASE db_name SET DBPROPERTIES without "readonly"="true".
> * REPLLOAD
> h1. Tests
> * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT
> * read_only_delete.q
> * read_only_insert.q
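The allow/deny rules above boil down to a simple check against the database's "readonly" property and the operation name. Below is a self-contained sketch of that logic; the real EnforceReadOnlyDatabaseHook implements Hive's ExecuteWithHookContext and inspects the HookContext, while here the database properties and operation name are passed in directly, and the prefix list is a simplified reading of the description, not the actual implementation.

```java
import java.util.Map;
import java.util.Set;

// Simplified sketch of the read-only enforcement described above.
public final class ReadOnlyCheck {
    // Operations the description lists as allowed on a read-only database.
    private static final Set<String> ALLOWED_PREFIXES =
        Set.of("EXPLAIN", "USE", "SWITCHDATABASE", "REPLDUMP", "REPLSTATUS",
               "EXPORT", "KILL_QUERY", "DESC", "SHOW", "SELECT", "REPLLOAD");

    /** Returns true if the operation may run against the database. */
    public static boolean isAllowed(Map<String, String> dbProps, String operation) {
        if (!"true".equalsIgnoreCase(dbProps.get("readonly"))) {
            return true; // database is not read-only: everything is allowed
        }
        String op = operation.toUpperCase();
        for (String prefix : ALLOWED_PREFIXES) {
            if (op.startsWith(prefix)) {
                return true;
            }
        }
        // ALTER DATABASE stays allowed so the property can be turned off again.
        return op.startsWith("ALTER DATABASE");
    }
}
```

In the real hook the same decision ends in throwing an exception for disallowed writes, which is what aborts the query in the pre-execution phase.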
[jira] [Commented] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653808#comment-17653808 ]

Teddy Choi commented on HIVE-26555:
-----------------------------------
[~abstractdog], sorry for late reply. It's assuming an [active-passive HA configuration|https://en.wikipedia.org/wiki/High-availability_cluster#Node_configurations] with reads on the passive. The active instance should be the single source of the truth, while the passive instance should follow it. However, the current Hive replication design allows the passive instance to diverge from the active instance. A data divergence between the active-passive instances is hard to detect and resolve. This read-only mode prevents the passive instance to change to avoid any unintended divergence.

References:
* Microsoft SQL Server: [Configure read-only access to a secondary replica of an Always On availability group|https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-read-only-access-on-an-availability-replica-sql-server?view=sql-server-ver16]
* Oracle Database: [High Availability Overview and Best Practices | Features for Maximizing Availability|https://docs.oracle.com/en/database/oracle/oracle-database/21/haovw/ha-features.html#GUID-314F15CE-BD8F-45B0-911E-B7FCC2B8006A]
* IBM DB2: [Enabling reads on standby|https://www.ibm.com/docs/en/db2/11.5?topic=feature-enabling-reads-standby]
[jira] [Resolved] (HIVE-24933) Replication fails for transactional tables having same name as dropped non-transactional table
[ https://issues.apache.org/jira/browse/HIVE-24933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi resolved HIVE-24933.
-------------------------------
Resolution: Fixed

Merged to master. Thank you very much.

> Replication fails for transactional tables having same name as dropped
> non-transactional table
>
> Key: HIVE-24933
> URL: https://issues.apache.org/jira/browse/HIVE-24933
> Project: Hive
> Issue Type: Bug
> Reporter: Pratyush Madhukar
> Assignee: Pratyush Madhukar
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 40m
> Remaining Estimate: 0h
[jira] [Commented] (HIVE-26601) Fix NPE encountered in second load cycle of optimised bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616133#comment-17616133 ]

Teddy Choi commented on HIVE-26601:
-----------------------------------
Hello [~vpatni], please update the status and create a corresponding PR. Thank you.

> Fix NPE encountered in second load cycle of optimised bootstrap
>
> Key: HIVE-26601
> URL: https://issues.apache.org/jira/browse/HIVE-26601
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Vinit Patni
> Priority: Blocker
>
> After failover from the Primary to the DR cluster completed and DR took over, a reverse replication policy was created. The first dump and load cycle of optimised bootstrap completes successfully. The second dump cycle on DR also completes; it does a selective bootstrap of the tables it read from the table_diff directory. However, we observed an issue with the second load cycle on the Primary cluster side, which fails with the following exception and needs to be fixed.
> {code:java}
> [Scheduled Query Executor(schedule:repl_vinreverse, execution_id:421)]: Exception while logging metrics
> java.lang.NullPointerException: null
>     at org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192) ~[hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.exec.repl.ReplStateLogWork.replStateLog(ReplStateLogWork.java:145) ~[hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.exec.repl.ReplStateLogTask.execute(ReplStateLogTask.java:39) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.processQuery(ScheduledQueryExecutionService.java:240) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.run(ScheduledQueryExecutionService.java:193) [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> {code}
[jira] [Commented] (HIVE-26600) Handle failover during optimized bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616132#comment-17616132 ]

Teddy Choi commented on HIVE-26600:
-----------------------------------
Hello [~rakshithc], please update the status and create a corresponding PR. Thank you.

> Handle failover during optimized bootstrap
>
> Key: HIVE-26600
> URL: https://issues.apache.org/jira/browse/HIVE-26600
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Rakshith C
> Priority: Blocker
>
> When the reverse policy is enabled from DR to PROD, the user may initiate a failover from DR to PROD before the optimized bootstrap has ever run.
> Current observations:
> * Repl dump will place a failover-ready marker, but failover metadata won't be generated.
> * Repl load will throw an error, since failover will be set to true but the failover metadata is missing.
> Replication fails and we reach an undefined state.
> Fix:
> * Create the failover-ready marker only during the second cycle of optimized bootstrap, if possible.
> * Since some tables may need to be bootstrapped, it may take up to 3 cycles before failover from DR to PROD is complete.
> * If no tables are modified, the second dump from DR to PROD will be marked as failover ready.
> Result:
> * Users can initiate a failover immediately after enabling the reverse policy without any hassles.
[jira] [Commented] (HIVE-26599) Fix NPE encountered in second dump cycle of optimised bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616131#comment-17616131 ]

Teddy Choi commented on HIVE-26599:
-----------------------------------
Hello [~vpatni], please update the status and create a corresponding PR. Thank you.

> Fix NPE encountered in second dump cycle of optimised bootstrap
>
> Key: HIVE-26599
> URL: https://issues.apache.org/jira/browse/HIVE-26599
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Vinit Patni
> Priority: Blocker
>
> After failover from the Primary to the DR cluster completed and DR took over, a reverse replication policy was created. The first dump and load cycle of optimised bootstrap completes successfully, but we are encountering a NullPointerException in the second dump cycle, which halts the reverse replication and is a major blocker to testing the complete replication cycle.
> {code:java}
> Scheduled Query Executor(schedule:repl_reverse, execution_id:14)]: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192)
>     at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpTable(ReplDumpTask.java:1458)
>     at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:961)
>     at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:290)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232){code}
> After doing an RCA, we found that in the second dump cycle on the DR cluster, when the StageStart method is invoked, the metric corresponding to Tables is not registered (it should be, since we do a selective bootstrap of tables for optimised bootstrap along with the incremental dump). This causes the NPE later, when the code tries to update the progress for this metric after the table bootstrap completes.
> Fix: register the Tables metric before updating the progress.
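The fix described in the RCA amounts to a guard: make sure a metric exists before its progress is updated. A simplified, self-contained sketch of that idea follows; the map-based collector below is purely illustrative, while the real ReplicationMetricCollector tracks richer per-stage state.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of "register the metric before updating progress".
// Previously, reporting progress assumed the metric had already been
// registered and dereferenced null when it had not.
public final class StageMetrics {
    private final Map<String, Long> counters = new HashMap<>();

    /** Registers the metric with an initial count of zero, if absent. */
    public void register(String name) {
        counters.putIfAbsent(name, 0L);
    }

    /** Adds delta to the metric, registering it first to avoid the NPE. */
    public long reportStageProgress(String name, long delta) {
        register(name); // guard: unregistered metrics no longer crash
        return counters.merge(name, delta, Long::sum);
    }
}
```

The guard is cheap and idempotent, so calling it on every progress report is safer than relying on every dump path (incremental, bootstrap, or the mixed optimised-bootstrap cycle) to have registered the metric up front.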
[jira] [Commented] (HIVE-26606) Expose failover states in replication metrics
[ https://issues.apache.org/jira/browse/HIVE-26606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616134#comment-17616134 ]

Teddy Choi commented on HIVE-26606:
-----------------------------------
Hello [~harshalk], please update the status and create a corresponding PR. Thank you.

> Expose failover states in replication metrics
>
> Key: HIVE-26606
> URL: https://issues.apache.org/jira/browse/HIVE-26606
> Project: Hive
> Issue Type: Improvement
> Reporter: Teddy Choi
> Assignee: Harshal Patel
> Priority: Major
>
> Expose the state of failover in replication metrics.
[jira] [Commented] (HIVE-26597) Fix unsetting of db prop repl.target.for in ReplicationSemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-26597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616129#comment-17616129 ]

Teddy Choi commented on HIVE-26597:
-----------------------------------
Hello [~rakshithc], please update the status and create a corresponding PR. Thank you.

> Fix unsetting of db prop repl.target.for in ReplicationSemanticAnalyzer
>
> Key: HIVE-26597
> URL: https://issues.apache.org/jira/browse/HIVE-26597
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Rakshith C
> Priority: Major
>
> When a repl policy is set from A -> B:
> * *repl.target.for* is set on B.
> When failover is initiated:
> * *repl.failover.endpoint* = *'TARGET'* is set on B.
>
> Now, when the reverse policy is set up from *A <- B*, there is a check in [ReplicationSemanticAnalyzer#initReplDump|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java#L196] which looks for the existence of these two properties and, if they are set, unsets the *repl.target.for* property.
> Because of this, optimisedBootstrap won't be triggered, since it checks for the existence of the *repl.target.for* property during repl dump on the target [HERE|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/OptimisedBootstrapUtils.java#L93].
>
> Fix: remove the code that unsets repl.target.for in ReplicationSemanticAnalyzer, because the second dump cycle of optimized bootstrap unsets it.
[jira] [Commented] (HIVE-26598) Fix unset db params for optimized bootstrap in case of data copy tasks run on target, and test cases
[ https://issues.apache.org/jira/browse/HIVE-26598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616130#comment-17616130 ]

Teddy Choi commented on HIVE-26598:
-----------------------------------
Hello [~rakshithc], please update the status and create a corresponding PR. Thank you.

> Fix unset db params for optimized bootstrap in case of data copy tasks run on target, and test cases
>
> Key: HIVE-26598
> URL: https://issues.apache.org/jira/browse/HIVE-26598
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Rakshith C
> Priority: Major
>
> When hive.repl.run.data.copy.tasks.on.target is set to false, the repl dump task initiates the copy task from the source cluster to the staging directory.
> In the current code flow, the repl dump task dumps the metadata and then creates another repl dump task with datacopyIterators initialized.
> When the second dump cycle executes, it directly begins the data copy tasks. Because of this, we never enter the second reverse-dump flow, and unsetDbPropertiesForOptimisedBootstrap is never set to true again.
> This results in the db params (repl.target.for, repl.background.threads, etc.) not being unset.
[jira] [Resolved] (HIVE-26607) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi resolved HIVE-26607.
-------------------------------
Resolution: Duplicate

> Replace vectorization templates with overrides
>
> Key: HIVE-26607
> URL: https://issues.apache.org/jira/browse/HIVE-26607
> Project: Hive
> Issue Type: Improvement
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
>
> Replace vectorization templates with overrides.
> h1. Background
> There are many combinations of data types, column/scalar types, and operators in vectorization, which leaves a lot of code to implement. The current Hive vectorization generates this code with a simple string template engine, which replaces each placeholder with a value at all places within each template file.
> However, the templates are written in text files, which modern IDEs don't support natively. Also, any change to a template needs a separate Maven step to generate the actual code, which is time-consuming.
> h1. Design
> The base abstract classes will respect Java's data type system. Each string template will be divided into several sub data types, such as long-long, long-double, double-long, and double-double.
> * ColumnArithmeticColumn.txt will be separated into
> ** BaseLongColLongColumn.java
> *** Add: long func(long a, long b) \{ return a + b; }
> *** Subtract: long func(long a, long b) \{ return a - b; }
> *** Multiply: long func(long a, long b) \{ return a * b; }
> *** CheckedAdd: boolean supportsCheckedExecution() \{ return true; }
> *** CheckedSubtract: boolean supportsCheckedExecution() \{ return true; }
> *** CheckedMultiply: boolean supportsCheckedExecution() \{ return true; }
> ** BaseLongColDoubleColumn.java
> *** Add: double func(long a, double b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseDoubleColLongColumn.java
> *** Add: double func(double a, long b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseDoubleColDoubleColumn.java
> *** Add: double func(double a, double b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> * ColumnArithmeticScalar.txt will be separated into
> ** BaseLongColLongScalar.java
> *** Add: long func(long a, long b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseLongColDoubleScalar.java
> *** Add: double func(long a, double b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseDoubleColLongScalar.java
> *** Add: double func(double a, long b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
> ** BaseDoubleColDoubleScalar.java
> *** Add: double func(double a, double b) \{ return a + b; }
> *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply
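The override-based design above can be illustrated with a minimal, self-contained sketch. The class and method names follow the ticket's own examples; the element-wise `evaluate` loop is an illustrative simplification, since the real vectorized expressions operate on column-vector batches with null handling and selection vectors.

```java
// The base class fixes one type combination, (long, long) -> long; each
// operator then supplies only func(). Compare with the template approach,
// where every operator/type pair is a separately generated file.
abstract class BaseLongColLongColumn {
    abstract long func(long a, long b);

    // Checked variants would override this to return true.
    boolean supportsCheckedExecution() {
        return false;
    }

    // Apply the operator element-wise over two "columns".
    long[] evaluate(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = func(a[i], b[i]);
        }
        return out;
    }
}

final class LongColAddLongColumn extends BaseLongColLongColumn {
    @Override
    long func(long a, long b) {
        return a + b;
    }
}

final class LongColMultiplyLongColumn extends BaseLongColLongColumn {
    @Override
    long func(long a, long b) {
        return a * b;
    }
}
```

The batch loop lives once in the base class, so an IDE can navigate, refactor, and type-check every operator, which is exactly what the text-file templates prevent.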
[jira] [Resolved] (HIVE-26604) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi resolved HIVE-26604.
-------------------------------
Resolution: Duplicate

> Replace vectorization templates with overrides
>
> Key: HIVE-26604
> URL: https://issues.apache.org/jira/browse/HIVE-26604
> Project: Hive
> Issue Type: Improvement
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
[jira] [Assigned] (HIVE-26607) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26607: - > Replace vectorization templates with overrides > -- > > Key: HIVE-26607 > URL: https://issues.apache.org/jira/browse/HIVE-26607 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > Replace vectorization templates with overrides. > h1. Background > There are many combinations of data types, column/scalar types, > and operators in vectorization, which leaves a lot of code to implement. The > current Hive vectorization handles this with a simple string template > engine, which replaces a placeholder with a concrete value at all places within each > template file. > However, the templates are plain text files, which are not natively supported > by modern IDEs. Also, any change to a template needs a separate Maven step > to generate the actual code, which is time-consuming. > h1. Design > The base abstract classes will respect Java's data type system. Each string > template will be divided into several sub data types, such as long-long, > long-double, double-long, double-double. 
> * ColumnArithmeticColumn.txt will be separated into > ** BaseLongColLongColumn.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract: long func(long a, long b) \{ return a - b; } > *** Multiply: long func(long a, long b) \{ return a * b; } > *** CheckedAdd: boolean supportsCheckedExecution() \{ return true; } > *** CheckedSubtract: boolean supportsCheckedExecution() \{ return true; } > *** CheckedMultiply: boolean supportsCheckedExecution() \{ return true; } > ** BaseLongColDoubleColumn.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongColumn.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleColumn.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > * ColumnArithmeticScalar.txt > ** BaseLongColLongScalar.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseLongColDoubleScalar.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongScalar.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleScalar.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26606) Expose failover states in replication metrics
[ https://issues.apache.org/jira/browse/HIVE-26606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26606: - Assignee: Harshal Patel (was: Teddy Choi) > Expose failover states in replication metrics > - > > Key: HIVE-26606 > URL: https://issues.apache.org/jira/browse/HIVE-26606 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Harshal Patel >Priority: Major > > Expose the state of failover in replication metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26606) Expose failover states in replication metrics
[ https://issues.apache.org/jira/browse/HIVE-26606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26606: - > Expose failover states in replication metrics > - > > Key: HIVE-26606 > URL: https://issues.apache.org/jira/browse/HIVE-26606 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > Expose the state of failover in replication metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Description: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while the other one is writable, to keep a single source of truth. h1. User-Facing Changes Yes. The EnforceReadOnlyDatabaseHook class implements the ExecuteWithHookContext interface. hive.exec.pre.hooks needs to include the class name to instantiate it. The "readonly" database property can be configured to turn it on and off. h2. Allowed read operations All read operations without any data/metadata change are allowed. * EXPLAIN * USE (or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC prefix * SHOW prefix * QUERY with SELECT or EXPLAIN. INSERT, DELETE, UPDATE are disallowed. h2. Allowed write operations Most write operations that change data/metadata are disallowed. There are a few allowed exceptions: the first is ALTER DATABASE, to make a database writable again; the second is replication load, to load a dumped database. * ALTER DATABASE db_name SET DBPROPERTIES without "readonly"="true". * REPLLOAD h1. Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q was: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while other one is writable to keep a single source of truth. h1. Design Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. The "readonly" database property can be configured to turn it on and off. Allowed operations prefixes * EXPLAIN * USE(or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC * SHOW h1. 
Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q > Read-only mode for Hive database > > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 40m > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > the other one is writable, to keep a single source of truth. > h1. User-Facing Changes > Yes. The EnforceReadOnlyDatabaseHook class implements the ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to include the class name to instantiate > it. The "readonly" database property can be configured to turn it on > and off. > h2. Allowed read operations > All read operations without any data/metadata change are allowed. > * EXPLAIN > * USE (or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC prefix > * SHOW prefix > * QUERY with SELECT or EXPLAIN. INSERT, DELETE, UPDATE are disallowed. > h2. Allowed write operations > Most write operations that change data/metadata are disallowed. There are > a few allowed exceptions: the first is ALTER DATABASE, to make a database > writable again; the second is replication load, to load a dumped database. > * ALTER DATABASE db_name SET DBPROPERTIES without "readonly"="true". > * REPLLOAD > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
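The allow-list rule described above can be sketched as a small predicate. This is a simplified illustration, not the actual EnforceReadOnlyDatabaseHook implementation; the class name `ReadOnlyCheck`, the `isAllowed` method, and the flat string-prefix matching are assumptions for the sketch (the real hook works on Hive's HookContext and operation types, and also permits REPL LOAD and the ALTER DATABASE that clears "readonly").

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the read-only allow-list check: if the target
// database carries the "readonly"="true" property, only operations whose
// names start with an allowed prefix may proceed.
public class ReadOnlyCheck {
    private static final Set<String> ALLOWED_PREFIXES = Set.of(
        "EXPLAIN", "SWITCHDATABASE", "REPLDUMP", "REPLSTATUS",
        "EXPORT", "KILL_QUERY", "DESC", "SHOW");

    // Returns true if the operation may run against the given database.
    public static boolean isAllowed(String operation, Map<String, String> dbProps) {
        if (!"true".equalsIgnoreCase(dbProps.get("readonly"))) {
            return true; // database is writable; nothing to enforce
        }
        for (String prefix : ALLOWED_PREFIXES) {
            if (operation.startsWith(prefix)) {
                return true;
            }
        }
        return false; // writes (INSERT, DELETE, UPDATE, ...) are rejected
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("readonly", "true");
        System.out.println(isAllowed("SHOWTABLES", props)); // true
        System.out.println(isAllowed("INSERT", props));     // false
    }
}
```

Implementing this as a pre-execution hook (registered via hive.exec.pre.hooks) keeps the enforcement out of the compiler: the hook simply aborts the query before execution when the predicate returns false.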
[jira] [Assigned] (HIVE-26604) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26604: - > Replace vectorization templates with overrides > -- > > Key: HIVE-26604 > URL: https://issues.apache.org/jira/browse/HIVE-26604 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > Replace vectorization templates with overrides. > h1. Background > There are many combinations of data types, column/scalar types, > and operators in vectorization, which leaves a lot of code to implement. The > current Hive vectorization handles this with a simple string template > engine, which replaces a placeholder with a concrete value at all places within each > template file. > However, the templates are plain text files, which are not natively supported > by modern IDEs. Also, any change to a template needs a separate Maven step > to generate the actual code, which is time-consuming. > h1. Design > The base abstract classes will respect Java's data type system. Each string > template will be divided into several sub data types, such as long-long, > long-double, double-long, double-double. 
> * ColumnArithmeticColumn.txt will be separated into > ** BaseLongColLongColumn.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract: long func(long a, long b) \{ return a - b; } > *** Multiply: long func(long a, long b) \{ return a * b; } > *** CheckedAdd: boolean supportsCheckedExecution() \{ return true; } > *** CheckedSubtract: boolean supportsCheckedExecution() \{ return true; } > *** CheckedMultiply: boolean supportsCheckedExecution() \{ return true; } > ** BaseLongColDoubleColumn.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongColumn.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleColumn.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > * ColumnArithmeticScalar.txt > ** BaseLongColLongScalar.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseLongColDoubleScalar.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongScalar.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleScalar.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26602) Replace vectorization templates with overrides
[ https://issues.apache.org/jira/browse/HIVE-26602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26602: - > Replace vectorization templates with overrides > -- > > Key: HIVE-26602 > URL: https://issues.apache.org/jira/browse/HIVE-26602 > Project: Hive > Issue Type: Improvement >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > Replace vectorization templates with overrides. > h1. Background > There are many combinations of data types, column/scalar types, > and operators in vectorization, which leaves a lot of code to implement. The > current Hive vectorization handles this with a simple string template > engine, which replaces a placeholder with a concrete value at all places within each > template file. > However, the templates are plain text files, which are not natively supported > by modern IDEs. Also, any change to a template needs a separate Maven step > to generate the actual code, which is time-consuming. > h1. Design > The base abstract classes will respect Java's data type system. Each string > template will be divided into several sub data types, such as long-long, > long-double, double-long, double-double. 
> * ColumnArithmeticColumn.txt will be separated into > ** BaseLongColLongColumn.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract: long func(long a, long b) \{ return a - b; } > *** Multiply: long func(long a, long b) \{ return a * b; } > *** CheckedAdd: boolean supportsCheckedExecution() \{ return true; } > *** CheckedSubtract: boolean supportsCheckedExecution() \{ return true; } > *** CheckedMultiply: boolean supportsCheckedExecution() \{ return true; } > ** BaseLongColDoubleColumn.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongColumn.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleColumn.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > * ColumnArithmeticScalar.txt > ** BaseLongColLongScalar.java > *** Add: long func(long a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseLongColDoubleScalar.java > *** Add: double func(long a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColLongScalar.java > *** Add: double func(double a, long b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > ** BaseDoubleColDoubleScalar.java > *** Add: double func(double a, double b) \{ return a + b; } > *** Subtract, Multiply, CheckedAdd, CheckedSubtract, CheckedMultiply > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26601) Fix NPE encountered in second load cycle of optimised bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26601: - > Fix NPE encountered in second load cycle of optimised bootstrap > > > Key: HIVE-26601 > URL: https://issues.apache.org/jira/browse/HIVE-26601 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Blocker > > After failover from the Primary to the DR cluster is completed and DR takes over, a > reverse replication policy is created. The first dump and load cycle of > optimised bootstrap completes successfully, and the second dump cycle on DR is > also completed, which does a selective bootstrap of the tables that it read from > the table_diff directory. However, we observed an issue with the second load cycle on > the Primary cluster side, which fails with the following exception logs and > needs to be fixed. > {code:java} > [Scheduled Query Executor(schedule:repl_vinreverse, execution_id:421)]: > Exception while logging metrics > java.lang.NullPointerException: null > at > org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192) > ~[hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.exec.repl.ReplStateLogWork.replStateLog(ReplStateLogWork.java:145) > ~[hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.exec.repl.ReplStateLogTask.execute(ReplStateLogTask.java:39) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at 
org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.processQuery(ScheduledQueryExecutionService.java:240) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > org.apache.hadoop.hive.ql.scheduled.ScheduledQueryExecutionService$ScheduledQueryExecutor.run(ScheduledQueryExecutionService.java:193) > [hive-exec-3.1.3000.7.1.8.0-801.jar:3.1.3000.7.1.8.0-801] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_232] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [?:1.8.0_232] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_232] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_232] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26600) Handle failover during optimized bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26600: - > Handle failover during optimized bootstrap > -- > > Key: HIVE-26600 > URL: https://issues.apache.org/jira/browse/HIVE-26600 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Rakshith C >Priority: Blocker > > When the reverse policy is enabled from DR to PROD, there is a situation > wherein the user may initiate a failover from DR to PROD before the optimized > bootstrap is ever run. > Current observations: > * Repl Dump will place a failover-ready marker, but failover metadata won't > be generated. > * Repl Load will throw an error, since failover will be set to true but > failover metadata is missing. > Replication fails and we reach an undefined state. > Fix: > * Create the failover-ready marker only during the second cycle of optimized > bootstrap, if possible. > * Since some tables may need to be bootstrapped, it may take up to 3 cycles > before failover from DR to PROD is complete. > * If no tables are modified, the second dump from DR to PROD will be marked as > failover ready. > Result: > * Users can initiate a failover immediately after enabling the reverse policy > without any hassles. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26599) Fix NPE encountered in second dump cycle of optimised bootstrap
[ https://issues.apache.org/jira/browse/HIVE-26599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26599: - > Fix NPE encountered in second dump cycle of optimised bootstrap > --- > > Key: HIVE-26599 > URL: https://issues.apache.org/jira/browse/HIVE-26599 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Blocker > > After failover from the Primary to the DR cluster is completed and DR takes over, a > reverse replication policy is created. The first dump and load cycle of > optimised bootstrap completes successfully, but we are encountering a > NullPointerException in the second dump cycle, which halts this reverse > replication and is a major blocker to testing the complete replication cycle. > {code:java} > Scheduled Query Executor(schedule:repl_reverse, execution_id:14)]: FAILED: > Execution Error, return code -101 from > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask. > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpTable(ReplDumpTask.java:1458) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:961) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:290) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504) > at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:498) > at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232){code} > After doing an RCA, we figured out that in the second dump cycle on the DR cluster, when > the StageStart method is invoked, the metric corresponding to Tables is not > being registered (it should be registered, as we are doing a selective > bootstrap of tables for optimised bootstrap along with the incremental dump). This > causes an NPE when the code later tries to update the progress corresponding to this > metric after the bootstrap of the table is completed. > The fix is to register the Tables metric before updating the progress. -- This message was sent by Atlassian Jira (v8.20.10#820010)
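The register-before-report pattern behind the fix can be sketched in isolation. This is an illustrative model only; `MetricSketch`, `registerMetric`, and `get` are hypothetical names, and the real ReplicationMetricCollector tracks far more state than a counter map. The point is that reporting progress for a metric that was never registered is exactly the failure mode, so the fix registers the metric first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a collector that fails when progress is reported
// for an unregistered metric, mirroring the NPE in the bug report.
public class MetricSketch {
    private final Map<String, Integer> progress = new HashMap<>();

    // Registering creates the entry that later progress updates rely on.
    public void registerMetric(String name) {
        progress.putIfAbsent(name, 0);
    }

    // Throws if the metric is missing, mirroring the original failure mode
    // (the real code dereferenced the missing entry and threw an NPE).
    public void reportStageProgress(String name, int units) {
        Integer current = progress.get(name);
        if (current == null) {
            throw new IllegalStateException("metric not registered: " + name);
        }
        progress.put(name, current + units);
    }

    public int get(String name) {
        return progress.getOrDefault(name, 0);
    }

    public static void main(String[] args) {
        MetricSketch m = new MetricSketch();
        m.registerMetric("TABLES");         // the fix: register first
        m.reportStageProgress("TABLES", 1); // now safe
        System.out.println(m.get("TABLES")); // 1
    }
}
```

In the second dump cycle of optimised bootstrap, the table bootstrap path skipped the registration step, so the first progress report hit the missing entry; registering "Tables" before any report is the whole fix.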
[jira] [Assigned] (HIVE-26598) Fix unset db params for optimized bootstrap incase of data copy tasks run on target and testcases
[ https://issues.apache.org/jira/browse/HIVE-26598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26598: - > Fix unset db params for optimized bootstrap incase of data copy tasks run on > target and testcases > - > > Key: HIVE-26598 > URL: https://issues.apache.org/jira/browse/HIVE-26598 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Rakshith C >Priority: Major > > When hive.repl.run.data.copy.tasks.on.target is set to false, the repl dump task > will initiate the copy task from the source cluster to the staging directory. > In the current code flow, the repl dump task dumps the metadata and then creates > another repl dump task with datacopyIterators initialized. > When the second dump cycle executes, it directly begins the data copy tasks. > Because of this, we don't enter the second reverse dump flow, and > unsetDbPropertiesForOptimisedBootstrap is never set to true again. > This results in db params (repl.target.for, repl.background.threads, etc.) not > being unset. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26597) Fix unsetting of db prop repl.target.for in ReplicationSemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-26597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26597: - > Fix unsetting of db prop repl.target.for in ReplicationSemanticAnalyzer > --- > > Key: HIVE-26597 > URL: https://issues.apache.org/jira/browse/HIVE-26597 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Rakshith C >Priority: Major > > When the repl policy is set from A -> B: > * *repl.target.for* is set on B. > When failover is initiated: > * *repl.failover.endpoint* = *'TARGET'* is set on B. > > Now, when the reverse policy is set up from {*}A <- B{*}, > there is a check in > [ReplicationSemanticAnalyzer#initReplDump|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java#L196] > which checks for the existence of these two properties and, if they are set, > unsets the *repl.target.for* property. > Because of this, optimised bootstrap won't be triggered, since it checks for > the existence of the *repl.target.for* property during repl dump on the target > [HERE|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/OptimisedBootstrapUtils.java#L93]. > > Fix: remove the code that unsets repl.target.for in > ReplicationSemanticAnalyzer, because the second dump cycle of optimized bootstrap > unsets it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
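The interaction above can be modeled with a tiny sketch. The class and method names (`ReplPolicyCheck`, `shouldRunOptimisedBootstrap`, `buggyUnset`) are hypothetical and the decision logic is deliberately simplified; only the two property names come from the issue. It shows why the premature unset in the analyzer suppresses optimised bootstrap: the dump-side check keys off `repl.target.for`, which the analyzer has already removed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two property checks described in the issue.
public class ReplPolicyCheck {
    // Optimised bootstrap is considered only while the database is still
    // marked as a replication target, i.e. repl.target.for is present.
    public static boolean shouldRunOptimisedBootstrap(Map<String, String> dbProps) {
        return dbProps.containsKey("repl.target.for");
    }

    // The analyzer-side unset being removed by the fix: clearing
    // repl.target.for as soon as both properties are seen, before the
    // dump-side check above ever runs.
    public static void buggyUnset(Map<String, String> dbProps) {
        if ("TARGET".equals(dbProps.get("repl.failover.endpoint"))
                && dbProps.containsKey("repl.target.for")) {
            dbProps.remove("repl.target.for");
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("repl.target.for", "A");
        props.put("repl.failover.endpoint", "TARGET");
        buggyUnset(props); // the premature unset...
        System.out.println(shouldRunOptimisedBootstrap(props)); // false
    }
}
```

With the buggy unset removed, `repl.target.for` survives until the second dump cycle of optimised bootstrap, which is the component that legitimately clears it.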
[jira] [Updated] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Fix Version/s: 4.0.0-alpha-2 > Read-only mode for Hive database > > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 40m > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode for Hive database
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Summary: Read-only mode for Hive database (was: Read-only mode) > Read-only mode for Hive database > > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Status: Patch Available (was: In Progress) > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Description: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while other one is writable to keep a single source of truth. h1. Design Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. The "readonly" database property can be configured to turn it on and off. Allowed operations prefixes * EXPLAIN * USE(or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC * SHOW h1. Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q was: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while other one is writable to keep a single source of truth. h1. Design Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. The "readonly" database property can be configured to turn it on and off. Allowed operations prefixes * EXPLAIN * USE(or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC * SHOW Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. 
hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-26555: -- Description: h1. Purpose In failover/fail-back scenarios, a Hive database needs to be read-only, while other one is writable to keep a single source of truth. h1. Design Yes. EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. The "readonly" database property can be configured to turn it on and off. Allowed operations prefixes * EXPLAIN * USE(or SWITCHDATABASE) * REPLDUMP * REPLSTATUS * EXPORT * KILL_QUERY * DESC * SHOW Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_delete.q * read_only_insert.q was: h1. Purpose In failover/fail-back scenarios, a Hive instance needs to be read-only, while other one is writable to keep a single source of truth. h1. Design EnforceReadOnlyHiveHook class can implement ExecuteWithHookContext interface. hive.exec.pre.hooks needs to have the class name to initiate an instance. "hive.enforce.readonly" can be configured to turn it on and off. h2. Allowed operations prefixes * USE(or SWITCHDATABASE) * SELECT * DESC * DESCRIBE * SET * EXPLAIN * ROLLBACK * KILL * ABORT h1. Tests * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT * read_only_hook_delete_failure.q * read_only_hook_insert_failure.q * read_only_hook_update_failure.q > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive database needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > Yes. 
EnforceReadOnlyDatabaseHook class implements ExecuteWithHookContext > interface. hive.exec.pre.hooks needs to have the class name to initiate an > instance. The "readonly" database property can be configured to turn it on > and off. > Allowed operations prefixes > * EXPLAIN > * USE(or SWITCHDATABASE) > * REPLDUMP > * REPLSTATUS > * EXPORT > * KILL_QUERY > * DESC > * SHOW > Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_delete.q > * read_only_insert.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
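As a rough illustration of the allow-list described above, here is a hypothetical, self-contained simplification of the check a pre-execution hook like EnforceReadOnlyDatabaseHook could perform. The real hook receives a HookContext from HiveServer2 and reads the database's "readonly" property; both are replaced here by plain parameters, so this is a sketch of the gating logic only, not the actual Hive class.

```java
import java.util.List;

// Simplified stand-in for the read-only enforcement check. The allowed
// operation-name prefixes come from the issue description above.
public class ReadOnlyGuard {
    private static final List<String> ALLOWED_PREFIXES = List.of(
        "EXPLAIN", "USE", "SWITCHDATABASE", "REPLDUMP", "REPLSTATUS",
        "EXPORT", "KILL_QUERY", "DESC", "SHOW");

    // Returns true when the operation may proceed. A write against a
    // read-only database would instead cause the hook to throw.
    public static boolean isAllowed(boolean dbIsReadOnly, String operationName) {
        if (!dbIsReadOnly) {
            return true; // nothing to enforce on a writable database
        }
        String op = operationName.toUpperCase();
        return ALLOWED_PREFIXES.stream().anyMatch(op::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(true, "SHOWTABLES")); // true: SHOW prefix
        System.out.println(isAllowed(true, "INSERT"));     // false: writes blocked
        System.out.println(isAllowed(false, "INSERT"));    // true: db is writable
    }
}
```

Prefix matching mirrors how the hook classifies operations by name (e.g. DESC covers both DESC and DESCRIBE).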
[jira] [Work started] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26555 started by Teddy Choi. - > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > h1. Purpose > In failover/fail-back scenarios, a Hive instance needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > EnforceReadOnlyHiveHook class can implement ExecuteWithHookContext interface. > hive.exec.pre.hooks needs to have the class name to initiate an instance. > "hive.enforce.readonly" can be configured to turn it on and off. > h2. Allowed operations prefixes > * USE(or SWITCHDATABASE) > * SELECT > * DESC > * DESCRIBE > * SET > * EXPLAIN > * ROLLBACK > * KILL > * ABORT > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_hook_delete_failure.q > * read_only_hook_insert_failure.q > * read_only_hook_update_failure.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26555) Read-only mode
[ https://issues.apache.org/jira/browse/HIVE-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-26555: - > Read-only mode > -- > > Key: HIVE-26555 > URL: https://issues.apache.org/jira/browse/HIVE-26555 > Project: Hive > Issue Type: New Feature >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > > h1. Purpose > In failover/fail-back scenarios, a Hive instance needs to be read-only, while > other one is writable to keep a single source of truth. > h1. Design > EnforceReadOnlyHiveHook class can implement ExecuteWithHookContext interface. > hive.exec.pre.hooks needs to have the class name to initiate an instance. > "hive.enforce.readonly" can be configured to turn it on and off. > h2. Allowed operations prefixes > * USE(or SWITCHDATABASE) > * SELECT > * DESC > * DESCRIBE > * SET > * EXPLAIN > * ROLLBACK > * KILL > * ABORT > h1. Tests > * read_only_hook.q: USE, SHOW, DESC, DESCRIBE, EXPLAIN, SELECT > * read_only_hook_delete_failure.q > * read_only_hook_insert_failure.q > * read_only_hook_update_failure.q -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606858#comment-17606858 ] Teddy Choi commented on HIVE-25790: --- The Jenkins tests passed. > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790 ] Teddy Choi deleted comment on HIVE-25790: --- was (Author: teddy.choi): I created a pull request. Its third commit is running on the upstream Jenkins. > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-25790: -- Status: Patch Available (was: In Progress) I created a pull request. Its third commit is running on the upstream Jenkins. > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606546#comment-17606546 ] Teddy Choi commented on HIVE-25790: --- I made a pull request; its third commit is running on Jenkins. It copies only the files that differ from the source path to the destination path. For existing directories and files, it skips the full copy but updates the modification time to indicate that the entry was revisited. It is optimized for HDFS-to-HDFS replication scenarios, using checksum, block size, and length comparisons. > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
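The incremental-copy decision described in this comment (skip unchanged files, bump the modification time, compare lengths and checksums) can be sketched as follows. This standalone version is an illustration under stated assumptions, not the actual FileUtils patch: it works on local files instead of Hadoop FileSystem paths, and uses an MD5 content digest as a stand-in for HDFS block checksums.

```java
import java.nio.file.*;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.util.Arrays;

// Sketch of a "copy only what changed" helper. Length is compared first
// because it is cheap; a content digest is computed only when lengths match.
public class IncrementalCopy {
    static byte[] digest(Path p) throws Exception {
        return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(p));
    }

    // True when dst is missing or differs from src and therefore needs a copy.
    public static boolean needsCopy(Path src, Path dst) throws Exception {
        if (!Files.exists(dst)) return true;
        if (Files.size(src) != Files.size(dst)) return true;  // length mismatch
        return !Arrays.equals(digest(src), digest(dst));      // content mismatch
    }

    public static void copyIfChanged(Path src, Path dst) throws Exception {
        if (needsCopy(src, dst)) {
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        } else {
            // Unchanged: skip the copy but update the modification time,
            // so the destination reflects that it was revisited.
            Files.setLastModifiedTime(dst, FileTime.fromMillis(System.currentTimeMillis()));
        }
    }
}
```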
[jira] [Work started] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25790 started by Teddy Choi. - > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-25790) Make managed table copies handle updates (FileUtils)
[ https://issues.apache.org/jira/browse/HIVE-25790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-25790: - Assignee: Teddy Choi (was: Haymant Mangla) > Make managed table copies handle updates (FileUtils) > > > Key: HIVE-25790 > URL: https://issues.apache.org/jira/browse/HIVE-25790 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Teddy Choi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-1626) stop using java.util.Stack
[ https://issues.apache.org/jira/browse/HIVE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566850#comment-17566850 ] Teddy Choi commented on HIVE-1626: -- It's rewritten. A new pull request was created. It introduces an ArrayStack implementation for faster indexed accesses without synchronization. > stop using java.util.Stack > -- > > Key: HIVE-1626 > URL: https://issues.apache.org/jira/browse/HIVE-1626 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-1626.2.patch, HIVE-1626.2.patch, HIVE-1626.3.patch, > HIVE-1626.3.patch, HIVE-1626.3.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > We currently use Stack as part of the generic node walking library. Stack > should not be used for this since its inheritance from Vector incurs > superfluous synchronization overhead. > Most projects end up adding an ArrayStack implementation and using that > instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
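The ArrayStack idea mentioned in the comment can be sketched as below: a LIFO stack backed by an ArrayList, so push/pop/peek avoid the per-call synchronization that java.util.Stack inherits from Vector, while still offering the indexed access a node walker uses to inspect the current path. This is a minimal illustration, not the class introduced by the pull request.

```java
import java.util.ArrayList;

// Unsynchronized stack with O(1) indexed access.
public class ArrayStack<E> {
    private final ArrayList<E> elements = new ArrayList<>();

    public void push(E e)    { elements.add(e); }
    public E pop()           { return elements.remove(elements.size() - 1); }
    public E peek()          { return elements.get(elements.size() - 1); }
    public boolean isEmpty() { return elements.isEmpty(); }
    public int size()        { return elements.size(); }
    // Indexed access from the bottom of the stack, as tree walkers need.
    public E get(int i)      { return elements.get(i); }
}
```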
[jira] [Updated] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21437: -- Attachment: HIVE-21437.3.patch > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21437.1.patch, HIVE-21437.2.patch, > HIVE-21437.3.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select max((((cs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), >
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > java.lang.reflect.Constructor.newInstance(Constructor.java:423), > org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21437: -- Attachment: HIVE-21437.2.patch > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21437.1.patch, HIVE-21437.2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select maxcs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > java.lang.reflect.Constructor.newInstance(Constructor.java:423), > org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21437: -- Attachment: HIVE-21437.1.patch > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21437.1.patch > > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select maxcs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > 
java.lang.reflect.Constructor.newInstance(Constructor.java:423), > org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21437: -- Status: Patch Available (was: Open) > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select maxcs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > 
java.lang.reflect.Constructor.newInstance(Constructor.java:423), > org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21437) Vectorization: Decimal64 division with integer columns
[ https://issues.apache.org/jira/browse/HIVE-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21437: - Assignee: Teddy Choi > Vectorization: Decimal64 division with integer columns > -- > > Key: HIVE-21437 > URL: https://issues.apache.org/jira/browse/HIVE-21437 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 4.0.0 >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Vectorizer fails for > {code} > CREATE temporary TABLE `catalog_Sales`( > `cs_quantity` int, > `cs_wholesale_cost` decimal(7,2), > `cs_list_price` decimal(7,2), > `cs_sales_price` decimal(7,2), > `cs_ext_discount_amt` decimal(7,2), > `cs_ext_sales_price` decimal(7,2), > `cs_ext_wholesale_cost` decimal(7,2), > `cs_ext_list_price` decimal(7,2), > `cs_ext_tax` decimal(7,2), > `cs_coupon_amt` decimal(7,2), > `cs_ext_ship_cost` decimal(7,2), > `cs_net_paid` decimal(7,2), > `cs_net_paid_inc_tax` decimal(7,2), > `cs_net_paid_inc_ship` decimal(7,2), > `cs_net_paid_inc_ship_tax` decimal(7,2), > `cs_net_profit` decimal(7,2)) > ; > explain vectorization detail select maxcs_ext_list_price - > cs_ext_wholesale_cost) - cs_ext_discount_amt) + cs_ext_sales_price) / 2) from > catalog_sales; > {code} > {code} > 'Map Vectorization:' > 'enabled: true' > 'enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true' > 'inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > 'notVectorizedReason: SELECT operator: Could not instantiate > DecimalColDivideDecimalScalar with arguments arguments: [21, 20, 22], > argument classes: [Integer, Integer, Integer], exception: > java.lang.IllegalArgumentException: java.lang.ClassCastException@63b56be0 > stack trace: > sun.reflect.GeneratedConstructorAccessor.newInstance(Unknown > Source), > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45), > java.lang.reflect.Constructor.newInstance(Constructor.java:423), > 
org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.instantiateExpression(VectorizationContext.java:2088), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4662), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4602), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4584), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5171), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:923), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:809), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:776), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:240), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2038), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:1990), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:1963), > ...' > 'vectorized: false' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
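For background on the failure above: a Decimal64 column stores a small-precision decimal(p,s) value as an unscaled long (123.45 at scale 2 becomes 12345), so dividing by an integer cannot simply divide the raw longs; the dividend must be rescaled and the result rounded. The following is a conceptual sketch under that assumption, not Hive's actual vectorized expression code; the helper name and rounding choice are hypothetical.

```java
// Illustrates scaled-long (Decimal64-style) division by an integer.
public class Decimal64Div {
    // Divide a scale-2 unscaled long by an integer, keeping scale 2.
    // One extra guard digit is kept so the result can be rounded half up.
    public static long divideScale2(long unscaled, int divisor) {
        long q = (unscaled * 10) / divisor;          // quotient at scale 3
        return (q + (q >= 0 ? 5 : -5)) / 10;         // round back to scale 2
    }

    public static void main(String[] args) {
        // 123.45 / 2 = 61.725, which rounds to 61.73 at scale 2
        System.out.println(divideScale2(12345, 2));  // prints 6173
    }
}
```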
[jira] [Updated] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Resolution: Fixed Fix Version/s: 4.0.0 Status: Resolved (was: Patch Available) > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-21368.1.patch, HIVE-21368.2.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
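Conceptually, the fix removes a per-row ConvertDecimal64ToDecimal cast from the join's inner loop: decimal(7,2) values held in the fast scaled-long representation can be aggregated with plain long arithmetic and converted to an object form once at the end. A hypothetical standalone sketch of that idea, not Hive's VectorMapJoin code:

```java
import java.math.BigDecimal;

// Aggregating scaled-long decimals without a per-row object conversion.
public class Decimal64Sum {
    // Sum unscaled longs at scale 2; one BigDecimal is built for the result.
    public static BigDecimal sumScale2(long[] unscaled) {
        long total = 0;
        for (long v : unscaled) {
            total += v;                       // stays in scaled-long form
        }
        return BigDecimal.valueOf(total, 2);  // convert once, not per row
    }
}
```

(Overflow handling is omitted; a real implementation must detect it, since decimal(7,2) sums can exceed the declared precision.)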
[jira] [Commented] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792346#comment-16792346 ] Teddy Choi commented on HIVE-21368: --- Committed to master. Thanks. [~gopalv] > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch, HIVE-21368.2.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Attachment: HIVE-21368.2.patch > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch, HIVE-21368.2.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789239#comment-16789239 ] Teddy Choi commented on HIVE-21368: --- [~gopalv], thanks for pointing it out. I made a patch for it. > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Attachment: HIVE-21368.1.patch > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Status: Patch Available (was: Open) > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21368.1.patch > > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787503#comment-16787503 ] Teddy Choi edited comment on HIVE-21368 at 3/8/19 3:57 AM: --- I found a commit that reverts HIVE-20315. [According to Matt|https://issues.apache.org/jira/browse/HIVE-20315?focusedCommentId=16592355=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16592355], "Removed DECIMAL_64 conversion avoidance changes for GROUP BY / JOIN since they caused external test failures". It may take more than a few simple changes. was (Author: teddy.choi): I found a commit that reverts HIVE-20315. [According to Matt|https://issues.apache.org/jira/browse/HIVE-20315?focusedCommentId=16592355=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16592355], the DECIMAL_64 to DECIMAL conversion was intentional, since it caused external test failures. It may involve more tests and take more than a few simple changes. 
> Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787503#comment-16787503 ] Teddy Choi commented on HIVE-21368: --- I found a commit that reverts HIVE-20315. [According to Matt|https://issues.apache.org/jira/browse/HIVE-20315?focusedCommentId=16592355=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16592355], the DECIMAL_64 to DECIMAL conversion was intentional, since it caused external test failures. It may involve more tests and take more than a few simple changes. > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 
'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21368: -- Comment: was deleted (was: I found the following code in Vectorizer.java. It was removed in commit 470ba3e2835ef769f940d013acbe6c05d9208903 by Matt McCline on 2018-08-16, which reverted HIVE-20315. I don't know why it was reverted. {code:java} // For now, we don't support joins on or using DECIMAL_64. VectorExpression[] allBigTableValueExpressions = vContext.getVectorExpressionsUpConvertDecimal64(bigTableExprs); {code}) > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 
'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787498#comment-16787498 ] Teddy Choi commented on HIVE-21368: --- I found the following code in Vectorizer.java. It was removed in commit 470ba3e2835ef769f940d013acbe6c05d9208903 by Matt McCline on 2018-08-16, which reverted HIVE-20315. I don't know why it was reverted. {code:java} // For now, we don't support joins on or using DECIMAL_64. VectorExpression[] allBigTableValueExpressions = vContext.getVectorExpressionsUpConvertDecimal64(bigTableExprs); {code} > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 
'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21368) Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion
[ https://issues.apache.org/jira/browse/HIVE-21368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21368: - Assignee: Teddy Choi > Vectorization: Unnecessary Decimal64 -> HiveDecimal conversion > -- > > Key: HIVE-21368 > URL: https://issues.apache.org/jira/browse/HIVE-21368 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > Joins projecting Decimal64 have a suspicious cast in the inner loop > {code} > ConvertDecimal64ToDecimal(col 14:decimal(7,2)/DECIMAL_64) -> 24:decimal(7,2)' > {code} > {code} > create temporary table foo(x int , y decimal(7,2)); > create temporary table bar(x int , y decimal(7,2)); > set hive.explain.user=false; > explain vectorization detail select sum(foo.y) from foo, bar where foo.x = > bar.x; > {code} > {code} > ' Map Join Operator' > 'condition map:' > ' Inner Join 0 to 1' > 'keys:' > ' 0 _col0 (type: int)' > ' 1 _col0 (type: int)' > 'Map Join Vectorization:' > 'bigTableKeyColumnNums: [0]' > 'bigTableRetainedColumnNums: [3]' > 'bigTableValueColumnNums: [3]' > 'bigTableValueExpressions: > ConvertDecimal64ToDecimal(col 1:decimal(7,2)/DECIMAL_64) -> 3:decimal(7,2)' > 'className: VectorMapJoinInnerBigOnlyLongOperator' > 'native: true' > 'nativeConditionsMet: > hive.mapjoin.optimized.hashtable IS true, > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Small table vectorizes IS true, Fast Hash Table > and No Hybrid Hash Join IS true' > 'projectedOutputColumnNums: [3]' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779908#comment-16779908 ] Teddy Choi commented on HIVE-21294: --- [~gopalv], I fixed the differences in murmur_hash_migration.q.out. TestObjectStore failures seem unrelated. I tested them on my laptop and there were no errors. Will it be okay to push it to master? > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.2.patch, HIVE-21294.3.patch, > HIVE-21294.4.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
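The optimization HIVE-21294 describes can be sketched as follows. This is a hedged illustration, not the actual VectorReduceSinkObjectHashOperator code: the class and method names are hypothetical, and `Objects.hashCode` stands in for Hive's murmur-based object hash. With a single reducer every row must land in partition 0, so computing an object hash per row is wasted work.

```java
import java.util.Objects;

// Hypothetical sketch of partition selection in a reduce sink; not Hive code.
public class SingleReducerShuffle {
    static int partitionFor(Object key, int numReducers) {
        if (numReducers == 1) {
            return 0; // only one possible target: skip the hash entirely
        }
        // Placeholder for Hive's murmur-based object hash.
        return (Objects.hashCode(key) & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("any key", 1)); // prints 0
    }
}
```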
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: HIVE-21294.4.patch > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.2.patch, HIVE-21294.3.patch, > HIVE-21294.4.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: HIVE-21294.3.patch > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.2.patch, HIVE-21294.3.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Description: VectorReduceSinkObjectHashOperator can skip the object hashing entirely if the reducer count = 1. (was: VectorObjectSinkHashOperator can skip the object hashing entirely if the reducer count = 1.) > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: HIVE-21294.2.patch > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21294.2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: (was: HIVE-21294.1.patch) > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > VectorReduceSinkObjectHashOperator can skip the object hashing entirely if > the reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Attachment: HIVE-21294.1.patch > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21294.1.patch > > > VectorObjectSinkHashOperator can skip the object hashing entirely if the > reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21294: -- Status: Patch Available (was: Open) > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21294.1.patch > > > VectorObjectSinkHashOperator can skip the object hashing entirely if the > reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21294: - Assignee: Teddy Choi > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > VectorObjectSinkHashOperator can skip the object hashing entirely if the > reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21294) Vectorization: 1-reducer Shuffle can skip the object hash functions
[ https://issues.apache.org/jira/browse/HIVE-21294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773693#comment-16773693 ] Teddy Choi commented on HIVE-21294: --- I guess you meant VectorReduceSinkObjectHashOperator. > Vectorization: 1-reducer Shuffle can skip the object hash functions > --- > > Key: HIVE-21294 > URL: https://issues.apache.org/jira/browse/HIVE-21294 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > > VectorObjectSinkHashOperator can skip the object hashing entirely if the > reducer count = 1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767946#comment-16767946 ] Teddy Choi commented on HIVE-21257: --- I tested this issue in Hive 3+ and I found that it only occurs in Hive 2. I will close this issue. > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 3.1.1 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
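The behavior the issue above expects can be sketched as follows. This is a hedged illustration, not Hive's StringLength code: `VarcharLength` and `lengthWithMax` are hypothetical names. The idea is that for a varchar(n) or char(n) column the LENGTH result must respect the declared maximum n, so the count of code points is capped at n rather than taken from the raw stored bytes.

```java
// Hypothetical sketch of a max-length-aware LENGTH; not Hive code.
public class VarcharLength {
    static int lengthWithMax(String value, int maxLength) {
        int codePoints = value.codePointCount(0, value.length());
        return Math.min(codePoints, maxLength); // cap at the declared type length
    }

    public static void main(String[] args) {
        // A 10-character value evaluated against a varchar(5) type reports 5.
        System.out.println(lengthWithMax("abcdefghij", 5)); // prints 5
    }
}
```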
[jira] [Resolved] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi resolved HIVE-21257. --- Resolution: Not A Problem > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 3.1.1 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21257: -- Status: Open (was: Patch Available) > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.1, 4.0.0 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21256) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 2
[ https://issues.apache.org/jira/browse/HIVE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21256: -- Status: Patch Available (was: Open) > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 2 > - > > Key: HIVE-21256 > URL: https://issues.apache.org/jira/browse/HIVE-21256 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.4 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21256.2.branch-2.patch > > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21256) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 2
[ https://issues.apache.org/jira/browse/HIVE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21256: -- Attachment: HIVE-21256.2.branch-2.patch > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 2 > - > > Key: HIVE-21256 > URL: https://issues.apache.org/jira/browse/HIVE-21256 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.4 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21256.2.branch-2.patch > > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21257: -- Attachment: (was: HIVE-21256.1.branch-2.patch) > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 3.1.1 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21257: -- Status: Patch Available (was: Open) > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.1, 4.0.0 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21256.1.branch-2.patch > > > After HIVE-19951 is fixed, there still are some cases that vectorized length > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So separate both implementation to keep code clean in Hive 3 > while the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21257: -- Attachment: HIVE-21256.1.branch-2.patch > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 3.1.1 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Attachments: HIVE-21256.1.branch-2.patch > > > After HIVE-19951 is fixed, there are still some cases where the vectorized LENGTH > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So the two implementations are separated to keep the code clean in Hive 3 > while keeping the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21257) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 3+
[ https://issues.apache.org/jira/browse/HIVE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21257: - > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 3+ > -- > > Key: HIVE-21257 > URL: https://issues.apache.org/jira/browse/HIVE-21257 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.1, 4.0.0 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there are still some cases where the vectorized LENGTH > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So the two implementations are separated to keep the code clean in Hive 3 > while keeping the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-21256) Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in Hive 2
[ https://issues.apache.org/jira/browse/HIVE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21256: - > Vectorized LENGTH UDF doesn't respect the max length of VARCHAR or CHAR in > Hive 2 > - > > Key: HIVE-21256 > URL: https://issues.apache.org/jira/browse/HIVE-21256 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.4 >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > > After HIVE-19951 is fixed, there are still some cases where the vectorized LENGTH > UDF doesn't respect the max length of VARCHAR or CHAR. StringLength has an > internal bug. Moreover, it's hard to get input data type details in Hive 2, > unlike Hive 3. So the two implementations are separated to keep the code clean in Hive 3 > while keeping the changes minimal in Hive 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21126) Allow session level queries in LlapBaseInputFormat#getSplits() before actual get_splits() call
[ https://issues.apache.org/jira/browse/HIVE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21126: -- Fix Version/s: 3.2.0 4.0.0 > Allow session level queries in LlapBaseInputFormat#getSplits() before actual > get_splits() call > -- > > Key: HIVE-21126 > URL: https://issues.apache.org/jira/browse/HIVE-21126 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.1.1 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-21126.1.patch, HIVE-21126.2.patch, > HIVE-21126.3.patch > > > Facilitate execution of session-level queries before the {{select get_splits()}} > call. This will allow us to set params like {{tez.grouping.split-count}} > which can be taken into consideration during split calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-21126) Allow session level queries in LlapBaseInputFormat#getSplits() before actual get_splits() call
[ https://issues.apache.org/jira/browse/HIVE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21126: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Allow session level queries in LlapBaseInputFormat#getSplits() before actual > get_splits() call > -- > > Key: HIVE-21126 > URL: https://issues.apache.org/jira/browse/HIVE-21126 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.1.1 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-21126.1.patch, HIVE-21126.2.patch, > HIVE-21126.3.patch > > > Facilitate execution of session-level queries before the {{select get_splits()}} > call. This will allow us to set params like {{tez.grouping.split-count}} > which can be taken into consideration during split calculation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
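The ordering this improvement enables can be pictured with a toy model: session-level statements (e.g. `set tez.grouping.split-count=4`) execute on the session before the `get_splits()` call itself. Everything below is an invented simulation of that ordering, not Hive's actual LlapBaseInputFormat code; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the HIVE-21126 behavior: setup statements are run on the
// session first, then the get_splits() call, so split calculation can
// observe parameters like tez.grouping.split-count.
public class SessionAwareSplitClient {
    private final List<String> executed = new ArrayList<>();

    // Stand-in "session": records each statement in execution order.
    public void execute(String sql) {
        executed.add(sql);
    }

    // Runs the session-level setup statements first, then the
    // get_splits() call, mirroring the ordering the fix enables.
    public List<String> getSplits(List<String> sessionStatements, String query) {
        for (String stmt : sessionStatements) {
            execute(stmt);
        }
        execute("select get_splits(\"" + query + "\", 0)");
        return executed;
    }
}
```

The point is purely the ordering: any parameter set in the first phase is already in effect when the split-generating query runs.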
[jira] [Assigned] (HIVE-21163) ParseUtils.parseQueryAndGetSchema fails on views with global limit
[ https://issues.apache.org/jira/browse/HIVE-21163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-21163: - Assignee: Teddy Choi > ParseUtils.parseQueryAndGetSchema fails on views with global limit > -- > > Key: HIVE-21163 > URL: https://issues.apache.org/jira/browse/HIVE-21163 > Project: Hive > Issue Type: Bug >Reporter: Eric Wohlstadter >Assignee: Teddy Choi >Priority: Major > > {code:java} > hive> USE tpcds_bin_partitioned_orc_1000; > hive> CREATE VIEW profit_view AS SELECT ss_net_profit, d_date FROM > store_sales, date_dim WHERE d_date = ss_sold_date LIMIT 100; > hive> SELECT get_splits("SELECT * from profit_view", 0); > Error: java.io.IOException: > org.apache.hadoop.hive.ql.parse.SemanticException: View profit_view is > corresponding to HiveSortLimit#3447, rather than a HiveProject. > (state=,code=0) > {code} > This works fine if the view doesn't have a global limit. > It also works fine if you define a view without a global limit, and then > apply a limit on top of the view. > {{Calcite.genLogicalPlan}} is expecting a {{HiveProject}} root but when going > through {{ParseUtils.parseQueryAndGetSchema}} the {{HiveSortLimit}} appears > at the root. Perhaps it is simply missing a step to wrap the limit with a > projection? > {code} > Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: View > profit_view is corresponding to HiveSortLimit#2275, rather than a HiveProject. 
> at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4931) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1741) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1689) > at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1043) > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154) > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1448) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genLogicalPlan(CalcitePlanner.java:395) > at > org.apache.hadoop.hive.ql.parse.ParseUtils.parseQueryAndGetSchema(ParseUtils.java:561) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:254) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
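The "wrap the limit with a projection" idea floated in the report can be sketched with stand-in plan nodes. The classes below are invented placeholders for Calcite's HiveProject and HiveSortLimit, not Hive's actual planner code; they only show the shape of the proposed normalization step.

```java
// Toy sketch: if the logical plan's root is a sort/limit node rather
// than a projection, wrap it in an identity projection so callers that
// expect a Project root (as ParseUtils.parseQueryAndGetSchema does)
// can still read the output row type.
public class PlanRootFixer {
    interface RelNode {}

    static class Project implements RelNode {
        final RelNode input;
        Project(RelNode input) { this.input = input; }
    }

    static class SortLimit implements RelNode {
        final RelNode input;
        SortLimit(RelNode input) { this.input = input; }
    }

    static class Scan implements RelNode {}

    // Ensure the returned root is a Project, wrapping only when needed.
    public static RelNode ensureProjectRoot(RelNode root) {
        return (root instanceof Project) ? root : new Project(root);
    }
}
```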
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-21091: -- Attachment: HIVE-21091.3.patch > Arrow serializer sets null at wrong index > - > > Key: HIVE-21091 > URL: https://issues.apache.org/jira/browse/HIVE-21091 > Project: Hive > Issue Type: Bug >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21091.1.patch, HIVE-21091.2.patch, > HIVE-21091.3.patch, HIVE-21091.3.patch > > > Arrow serializer sets null at wrong index -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) 
Vectorizer.java:1865 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
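The underlying HashMap hazard in the trace above is easy to reproduce in isolation: once a key's fields change after insertion, its hashCode no longer matches the bucket it was stored under, so the entry becomes unreachable and a later put() of an "equal" key strands a duplicate. MutableKey below is an invented stand-in for VectorPartitionDesc.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Minimal reproduction of the failure mode behind HIVE-20419:
// mutating a key after it has been used in a HashMap.
public class MutableKeyDemo {
    static class MutableKey {
        String field;
        MutableKey(String field) { this.field = field; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && Objects.equals(field, ((MutableKey) o).field);
        }
        @Override public int hashCode() { return Objects.hashCode(field); }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey("a");
        map.put(key, "v1");
        key.field = "b";                          // mutation after insertion
        System.out.println(map.containsKey(key)); // false: wrong bucket now
        map.put(new MutableKey("b"), "v2");
        System.out.println(map.size());           // 2: old entry is stranded
    }
}
```

The fix direction named in the issue title follows directly: treat VectorPartitionDesc as immutable once it has been used as a map key.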
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Attachment: (was: BUG-116953.4.patch) > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748416#comment-16748416 ] Teddy Choi commented on HIVE-20419: --- Revised and pushed to master. Thanks [~gopalv]. > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > 
Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Fix Version/s: 4.0.0 > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Attachment: (was: HIVE-20419.3.patch) > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Attachment: HIVE-20419.4.patch > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-20419: -- Attachment: BUG-116953.4.patch > Vectorization: Prevent mutation of VectorPartitionDesc after being used in a > hashmap key > > > Key: HIVE-20419 > URL: https://issues.apache.org/jira/browse/HIVE-20419 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal V >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-20419.1.patch, HIVE-20419.2.patch, > HIVE-20419.4.patch > > > This is going into the loop because the VectorPartitionDesc is modified after > it is used in the HashMap key - resulting in a hashcode & equals modification > after it has been placed in the hashmap. > {code} > HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: > 621ms > java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869 <7 > recursive calls> > java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, > Object) HashMap.java:1989 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) > HashMap.java:637 > java.util.HashMap.put(Object, Object) HashMap.java:611 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, > VectorPartitionDesc, Map) Vectorizer.java:1272 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, > boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, > String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) > Vectorizer.java:1654 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, > Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865 > 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, > boolean) Vectorizer.java:1109 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, > Stack, Object[]) Vectorizer.java:961 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, > TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) > TaskGraphWalker.java:180 > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, > HashMap) TaskGraphWalker.java:125 > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) > Vectorizer.java:2442 > org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, > ParseContext, Context) TezCompiler.java:717 > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, > HashSet, HashSet) TaskCompiler.java:258 > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, > SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443 > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) > CalcitePlanner.java:358 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-20419:
------------------------------
    Attachment: HIVE-20419.3.patch
[jira] [Commented] (HIVE-21126) Allow session level queries in LlapBaseInputFormat#getSplits() before actual get_splits() call
[ https://issues.apache.org/jira/browse/HIVE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748402#comment-16748402 ]

Teddy Choi commented on HIVE-21126:
-----------------------------------
+1. Looks good to me.

> Allow session level queries in LlapBaseInputFormat#getSplits() before actual get_splits() call
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21126
>                 URL: https://issues.apache.org/jira/browse/HIVE-21126
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>    Affects Versions: 3.1.1
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21126.1.patch, HIVE-21126.2.patch, HIVE-21126.3.patch
>
> Facilitate execution of session-level queries before the {{select get_splits()}} call. This allows setting parameters like {{tez.grouping.split-count}}, which can then be taken into account during split calculation.
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-20419:
------------------------------
    Attachment: HIVE-20419.2.patch
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-20419:
------------------------------
    Attachment: HIVE-20419.1.patch
[jira] [Updated] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key
[ https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-20419:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Commented] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747698#comment-16747698 ]

Teddy Choi commented on HIVE-21091:
-----------------------------------
After adding unit tests, I found that there was still a bug in null handling for lists. I fixed it, too. Thanks [~bslim] for advising on the unit tests.

> Arrow serializer sets null at wrong index
> -----------------------------------------
>
>                 Key: HIVE-21091
>                 URL: https://issues.apache.org/jira/browse/HIVE-21091
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21091.1.patch, HIVE-21091.2.patch, HIVE-21091.3.patch
>
> Arrow serializer sets null at wrong index
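The general class of bug named in this issue — recording a null flag at an input-row index instead of the output-slot index when a selection maps rows to output positions — can be illustrated with a hypothetical sketch. None of these names come from Hive's actual Arrow serializer; this only demonstrates why the two index spaces must not be confused:

```java
import java.util.Arrays;

// Hypothetical illustration of a wrong-index null bug: when a selection vector
// maps input rows to output slots, the null (validity) flag must be written at
// the *output* position, not the original row number.
public class NullIndexDemo {
    static boolean[] serializeValidity(boolean[] isNull, int[] selected, boolean buggy) {
        boolean[] outNull = new boolean[selected.length];
        for (int out = 0; out < selected.length; out++) {
            int row = selected[out]; // input row feeding output slot `out`
            if (buggy) {
                // Wrong: uses the input row number as the output slot,
                // silently dropping the flag when the row number is out of range.
                if (isNull[row] && row < outNull.length) outNull[row] = true;
            } else {
                // Right: the null flag goes at the output position.
                outNull[out] = isNull[row];
            }
        }
        return outNull;
    }

    public static void main(String[] args) {
        boolean[] isNull = {false, false, true, false}; // row 2 is null
        int[] selected = {2, 3};                        // rows 2 and 3 survive selection
        System.out.println(Arrays.toString(serializeValidity(isNull, selected, false))); // [true, false]
        System.out.println(Arrays.toString(serializeValidity(isNull, selected, true)));  // [false, false] - null lost
    }
}
```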
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21091:
------------------------------
    Attachment: HIVE-21091.3.patch
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21091:
------------------------------
    Attachment: HIVE-21091.2.patch
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21091:
------------------------------
    Attachment: HIVE-21091.1.patch
[jira] [Updated] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21091:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-21091) Arrow serializer sets null at wrong index
[ https://issues.apache.org/jira/browse/HIVE-21091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi reassigned HIVE-21091:
---------------------------------
[jira] [Commented] (HIVE-21041) NPE, ParseException in getting schema from logical plan
[ https://issues.apache.org/jira/browse/HIVE-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724141#comment-16724141 ]

Teddy Choi commented on HIVE-21041:
-----------------------------------
Pushed to master and branch-3. Thanks, [~jcamachorodriguez].

> NPE, ParseException in getting schema from logical plan
> --------------------------------------------------------
>
>                 Key: HIVE-21041
>                 URL: https://issues.apache.org/jira/browse/HIVE-21041
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0, 3.2.0
>
>         Attachments: HIVE-21041.2.patch, HIVE-21041.3.patch
>
> HIVE-20552 makes getting the schema from the logical plan faster, but it throws a ParseException when the query has a column alias, and a NullPointerException when it has subqueries.
[jira] [Updated] (HIVE-21041) NPE, ParseException in getting schema from logical plan
[ https://issues.apache.org/jira/browse/HIVE-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-21041:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)