[jira] [Commented] (HIVE-24322) In case of direct insert, the attempt ID has to be checked when reading the manifest files

2021-06-30 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371967#comment-17371967
 ] 

Aditya Shah commented on HIVE-24322:


[~kuczoram] I am unclear on how picking the biggest task attempt ID ensures that 
that particular attempt completed successfully. With speculative 
execution, the earlier attempt may finish first, and the speculative attempt 
may then be killed while it is still writing commit paths or partition 
specs (multi-stmt IOW with DP). Am I missing something?
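
To make the concern concrete, here is a minimal sketch; it is not Hive's actual code. The `Manifest` class, its `complete` flag, and both selection methods are hypothetical. The sketch assumes only what the issue description states: each task attempt writes a manifest suffixed with its attempt ID, and a killed attempt can leave an empty or partial manifest behind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ManifestSelection {
    /** Hypothetical stand-in for one manifest file written by a task attempt. */
    static final class Manifest {
        final int taskId;
        final int attemptId;
        final boolean complete; // assumption: false if the attempt was killed mid-write

        Manifest(int taskId, int attemptId, boolean complete) {
            this.taskId = taskId;
            this.attemptId = attemptId;
            this.complete = complete;
        }
    }

    /** Per task, keeps the manifest with the highest attempt ID, complete or not. */
    static Map<Integer, Manifest> highestAttempt(List<Manifest> manifests) {
        Map<Integer, Manifest> best = new HashMap<>();
        for (Manifest m : manifests) {
            Manifest current = best.get(m.taskId);
            if (current == null || m.attemptId > current.attemptId) {
                best.put(m.taskId, m);
            }
        }
        return best;
    }

    /** Same selection, restricted to manifests that were written completely. */
    static Map<Integer, Manifest> highestCompleteAttempt(List<Manifest> manifests) {
        List<Manifest> complete = new ArrayList<>();
        for (Manifest m : manifests) {
            if (m.complete) {
                complete.add(m);
            }
        }
        return highestAttempt(complete);
    }

    public static void main(String[] args) {
        // Attempt 0 finished first; speculative attempt 1 was killed while writing.
        List<Manifest> manifests = Arrays.asList(
            new Manifest(0, 0, true),
            new Manifest(0, 1, false));
        System.out.println(highestAttempt(manifests).get(0).attemptId);
        System.out.println(highestCompleteAttempt(manifests).get(0).attemptId);
    }
}
```

In this scenario, selecting purely by the highest attempt ID returns the killed speculative attempt, whereas the restricted selection falls back to the attempt that actually finished. How completeness would be detected in practice is left open here.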

> In case of direct insert, the attempt ID has to be checked when reading the 
> manifest files
> --
>
> Key: HIVE-24322
> URL: https://issues.apache.org/jira/browse/HIVE-24322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In IMPALA-10247 there was an exception from Hive when trying to load the data:
> {noformat}
> 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
> exec.Task: Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>  at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>  at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>  at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
>  ... 29 more
> {noformat}
> The exception occurred because Hive was trying to read an empty 
> manifest file. Manifest files are used with direct insert to determine 
> which files need to be kept and which need to be cleaned up. They are 
> created by the tasks and use the task attempt ID as a postfix. In this 
> particular test, one of the containers ran out of memory, 
> so Tez killed it right after the manifest file was created but 
> before the paths were written into it. This was the manifest 
> file for task attempt 0. Tez then assigned a new container to the task, 
> so a new attempt was made with attemptId=1. This one was successful and 
> wrote the manifest file correctly. But Hive didn't know about this, since 
> this out of memory issue got 

[jira] [Commented] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-13 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156524#comment-17156524
 ] 

Aditya Shah commented on HIVE-23804:


[~ngangam] ping for review of my previous comments and the patch. 

Thanks,

Aditya

> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.3.7
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23804-1.patch, HIVE-23804.patch
>
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-07 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152908#comment-17152908
 ] 

Aditya Shah commented on HIVE-23804:


[~ngangam] some details on the exact issue:

Env:
Hive version : 2.3 + hs2 + Tez
Metastore Db version: 3.1.2

query:
{code:java}
analyze table testTbl compute statistics for columns col1,col3;{code}

Stack trace:
{code:java}
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MTableColumnStatistics@e5ef653" using 
statement "INSERT INTO `TAB_COL_STATS` 
(`CS_ID`,`AVG_COL_LEN`,`COLUMN_NAME`,`COLUMN_TYPE`,`DB_NAME`,`BIG_DECIMAL_HIGH_VALUE`,`BIG_DECIMAL_LOW_VALUE`,`DOUBLE_HIGH_VALUE`,`DOUBLE_LOW_VALUE`,`LAST_ANALYZED`,`LONG_HIGH_VALUE`,`LONG_LOW_VALUE`,`MAX_COL_LEN`,`NUM_DISTINCTS`,`NUM_FALSES`,`NUM_NULLS`,`NUM_TRUES`,`TBL_ID`,`TABLE_NAME`)
 VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)" failed : Field 'CAT_NAME' 
doesn't have a default value
 at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$set_aggr_stats_for_result$set_aggr_stats_for_resultStandardScheme.read(ThriftHiveMetastore.java)
 ~[hive-exec-2.3.4-qubole.jar:2.3.4-qubole]
 at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$set_aggr_stats_for_result$set_aggr_stats_for_resultStandardScheme.read(ThriftHiveMetastore.java)
 ~[hive-exec-2.3.4-qubole.jar:2.3.4-qubole]
 at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$set_aggr_stats_for_result.read(ThriftHiveMetastore.java)
 ~[hive-exec-2.3.4-qubole.jar:2.3.4-qubole]
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) 
~[hive-exec-2.3.4-qubole.jar:2.3.4-qubole]
 at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_aggr_stats_for(ThriftHiveMetastore.java:3559)
 ~[hive-exec-2.3.4-qubole.jar:2.3.4-qubole]
 at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_aggr_stats_for(ThriftHiveMetastore.java:3546)
 ~[hive-exec-2.3.4-qubole.jar:2.3.4-qubole]
 at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:1692)
 ~[hive-exec-2.3.4-qubole.jar:2.3.4-qubole]
 at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.setPartitionColumnStatistics(SessionHiveMetaStoreClient.java:355)
 ~[hive-exec-2.3.4-qubole.jar:2.3.4-qubole]{code}

Ignore the line numbers, as this is an internal Qubole build. There are, however, no 
code changes in this path relative to OSS. 

Thanks,

Aditya

> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.3.7
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23804-1.patch, HIVE-23804.patch
>
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.





[jira] [Updated] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-06 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-23804:
---
Attachment: HIVE-23804-1.patch
Status: Patch Available  (was: Open)

[~ngangam] thanks for reviewing the patch. I have now run the DB install tests 
and made a few corrections to the Oracle DB scripts. To answer your other 
questions:
1. For a user who has already upgraded to 3.1.2, analyze statistics commands 
would not work from a lower version's JVM anyway. But someone who is 
planning to upgrade to 3.1.2, the latest official release so far, 
would expect backward compatibility. We should probably not leave the 3.1/3.0 
versions backward-incompatible while making 4 backward-compatible. 
2. This has already been taken care of by setting it explicitly.
3. I ran the DB install tests and they passed after some changes 
(new patch).
4. For Postgres, I found that the create table commands for these stats tables do not 
have a not null constraint in the fresh schema script, whereas the upgrade scripts do. 
I mentioned this and thought it should be addressed separately.

Thanks,
Aditya

> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.3.7, 2.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23804-1.patch, HIVE-23804.patch
>
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.





[jira] [Updated] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-06 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-23804:
---
Status: Open  (was: Patch Available)

> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.3.7, 2.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23804.patch
>
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.





[jira] [Commented] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-06 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151958#comment-17151958
 ] 

Aditya Shah commented on HIVE-23804:


[~pvary] [~gates] can you please take a look? One additional observation: the 
create table commands for the column stats tables in the Postgres scripts should 
be checked. The upgrade scripts seem to differ in column constraints from the 
fresh schema scripts.

 

Thanks,

Aditya

> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.3.7
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23804.patch
>
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.





[jira] [Updated] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-06 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-23804:
---
Attachment: HIVE-23804.patch
Status: Patch Available  (was: Open)

> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.3.7, 2.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23804.patch
>
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.





[jira] [Assigned] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-06 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-23804:
--


> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.3.7, 2.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.





[jira] [Commented] (HIVE-23803) Initiator misses compactions of table which were just allowed auto compaction after a given interval

2020-07-06 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151916#comment-17151916
 ] 

Aditya Shah commented on HIVE-23803:


[~pvary] yes, correct. [~dkuzmenko] whenever we change any property (no 
auto compaction or the ones Peter mentioned) and expect compaction to be 
enqueued with the new conf, auto compaction may not pick the table up if the 
interval has passed. So, is the user expected to 
trigger a manual compaction, or at least wait for new entries in completed 
transaction components?

> Initiator misses compactions of table which were just allowed auto compaction 
> after a given interval
> 
>
> Key: HIVE-23803
> URL: https://issues.apache.org/jira/browse/HIVE-23803
> Project: Hive
>  Issue Type: Bug
>Reporter: Aditya Shah
>Priority: Major
>
> After HIVE-21917, the Initiator only looks at completed transaction components 
> entries that have a timestamp within the past check interval. But if 
> a table has had `NO_AUTO_COMPACTION` set to true for a while (at 
> least one check interval) and is then toggled to false, auto compaction will not 
> happen for that table until there is a new entry in completed transaction 
> components.
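
The gap described above can be illustrated with a small sketch. This is not the Initiator's real code: the class, the method, and the in-memory map of last completed-transaction timestamps per table are all hypothetical, and the only assumption taken from the description is that candidates are limited to tables with entries inside the last check interval.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InitiatorWindowSketch {
    /**
     * Returns the tables whose latest completed-transaction entry falls
     * within the last check interval (hypothetical model of the filter).
     */
    static Set<String> candidates(Map<String, Long> lastCompletedTxnMillis,
                                  long nowMillis, long checkIntervalMillis) {
        Set<String> result = new TreeSet<>();
        long windowStart = nowMillis - checkIntervalMillis;
        for (Map.Entry<String, Long> e : lastCompletedTxnMillis.entrySet()) {
            if (e.getValue() >= windowStart) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> lastWrite = new HashMap<>();
        // Table written long ago; NO_AUTO_COMPACTION was toggled to false only now.
        lastWrite.put("toggled_table", 1_000L);
        // Table with a recent write.
        lastWrite.put("active_table", 9_500L);
        System.out.println(candidates(lastWrite, 10_000L, 1_000L));
    }
}
```

Under this model the toggled table is never a candidate until something writes to it again, because toggling the property does not itself create a completed-transaction entry.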





[jira] [Commented] (HIVE-23803) Initiator misses compactions of table which were just allowed auto compaction after a given interval

2020-07-06 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151892#comment-17151892
 ] 

Aditya Shah commented on HIVE-23803:


[~pvary] [~dkuzmenko] can you please take a look?

Thanks, 

Aditya

> Initiator misses compactions of table which were just allowed auto compaction 
> after a given interval
> 
>
> Key: HIVE-23803
> URL: https://issues.apache.org/jira/browse/HIVE-23803
> Project: Hive
>  Issue Type: Bug
>Reporter: Aditya Shah
>Priority: Major
>
> After HIVE-21917, the Initiator only looks at completed transaction components 
> entries that have a timestamp within the past check interval. But if 
> a table has had `NO_AUTO_COMPACTION` set to true for a while (at 
> least one check interval) and is then toggled to false, auto compaction will not 
> happen for that table until there is a new entry in completed transaction 
> components.





[jira] [Commented] (HIVE-23724) Hive ACID Lock conflicts not getting resolved correctly.

2020-06-19 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140420#comment-17140420
 ] 

Aditya Shah commented on HIVE-23724:


[~pvary] thanks for going through it. Yes, HIVE-22888 will solve the issue, but 
it seems it was done as an improvement rather than a correctness fix. So, should we make the 
minimal change I suggested for branch-3.1 to ensure correctness, or should I close 
the issue?

> Hive ACID Lock conflicts not getting resolved correctly.
> 
>
> Key: HIVE-23724
> URL: https://issues.apache.org/jira/browse/HIVE-23724
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23724.1.branch-3.1.patch
>
>
> Steps to reproduce:
> 1. `Drop database temp cascade`
> 2. In parallel (after 1. has started but while it is still running), fire a `create table 
> temp.temp_table (a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true')`
> 3. In parallel (after 2. has started but while it is still running), fire an `insert overwrite 
> table temp.temp_table values (1,2)`
> Note: The above can be easily reproduced by a unit test in testDbTxnManager.
> Observation: The exclusive lock for the table in 3. is granted although the exclusive 
> lock on the DB acquired in 1. is still held and the shared read lock on the DB for 
> 2. is waiting.
> Cause of issue: While acquiring a lock, if we choose to ignore a conflict 
> between the desired lock and one of the existing locks, we immediately allow 
> the desired lock to be acquired without checking against all the existing 
> locks. The above-mentioned scenario was one such ignore-conflict condition in 
> 2. and 3. There could be other combinations where this may occur, 
> for example when we request a lock with the same txn ids. Although Hive 
> guarantees that this scenario will not occur, because all lock requests related 
> to a txn are made at the same time and failure of one guarantees failure of all, 
> we will have to be extra careful with it in the future.
> Resolution: Whenever we ignore a conflict, we should keep checking against all 
> the existing locks and only then allow the lock to be acquired.
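
The cause and the proposed resolution can be sketched as two versions of the same loop. This is an illustration only, not the code in Hive's transaction handler: `Verdict`, `PairCheck`, and both method names are hypothetical, and locks are represented as plain strings.

```java
import java.util.Arrays;
import java.util.List;

class LockCheckSketch {
    enum Verdict { CONFLICT, IGNORE, NO_CONFLICT }

    /** Hypothetical pairwise check between the desired lock and one existing lock. */
    interface PairCheck {
        Verdict compare(String desired, String existing);
    }

    /** Problematic shape: the first ignorable conflict grants the lock immediately. */
    static boolean grantWithEarlyExit(String desired, List<String> existing, PairCheck check) {
        for (String e : existing) {
            Verdict v = check.compare(desired, e);
            if (v == Verdict.IGNORE) {
                return true; // returns before the remaining locks are examined
            }
            if (v == Verdict.CONFLICT) {
                return false;
            }
        }
        return true;
    }

    /** Proposed shape: an ignored conflict skips only that lock; the scan continues. */
    static boolean grantAfterFullScan(String desired, List<String> existing, PairCheck check) {
        for (String e : existing) {
            if (check.compare(desired, e) == Verdict.CONFLICT) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Order matters: the waiting shared lock is seen before the held exclusive lock.
        List<String> existing = Arrays.asList("db-shared-read-waiting", "db-exclusive-held");
        PairCheck check = (desired, e) ->
            e.endsWith("waiting") ? Verdict.IGNORE : Verdict.CONFLICT;
        System.out.println(grantWithEarlyExit("table-exclusive", existing, check));
        System.out.println(grantAfterFullScan("table-exclusive", existing, check));
    }
}
```

With the early exit, the table lock is granted as soon as the waiting shared lock is ignored, so the held exclusive DB lock is never examined; the full scan rejects the request.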





[jira] [Updated] (HIVE-23724) Hive ACID Lock conflicts not getting resolved correctly.

2020-06-18 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-23724:
---
Attachment: HIVE-23724.1.branch-3.1.patch
Status: Patch Available  (was: Open)

[~pvary] [~dkuzmenko] can you please check it once. Will add relevant unit test 
if my theory is correct.

Thanks!

> Hive ACID Lock conflicts not getting resolved correctly.
> 
>
> Key: HIVE-23724
> URL: https://issues.apache.org/jira/browse/HIVE-23724
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23724.1.branch-3.1.patch
>
>
> Steps to reproduce:
> 1. `Drop database temp cascade`
> 2. In parallel (after 1. has started but while it is still running), fire a `create table 
> temp.temp_table (a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true')`
> 3. In parallel (after 2. has started but while it is still running), fire an `insert overwrite 
> table temp.temp_table values (1,2)`
> Note: The above can be easily reproduced by a unit test in testDbTxnManager.
> Observation: The exclusive lock for the table in 3. is granted although the exclusive 
> lock on the DB acquired in 1. is still held and the shared read lock on the DB for 
> 2. is waiting.
> Cause of issue: While acquiring a lock, if we choose to ignore a conflict 
> between the desired lock and one of the existing locks, we immediately allow 
> the desired lock to be acquired without checking against all the existing 
> locks. The above-mentioned scenario was one such ignore-conflict condition in 
> 2. and 3. There could be other combinations where this may occur, 
> for example when we request a lock with the same txn ids. Although Hive 
> guarantees that this scenario will not occur, because all lock requests related 
> to a txn are made at the same time and failure of one guarantees failure of all, 
> we will have to be extra careful with it in the future.
> Resolution: Whenever we ignore a conflict, we should keep checking against all 
> the existing locks and only then allow the lock to be acquired.





[jira] [Assigned] (HIVE-23724) Hive ACID Lock conflicts not getting resolved correctly.

2020-06-18 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-23724:
--


> Hive ACID Lock conflicts not getting resolved correctly.
> 
>
> Key: HIVE-23724
> URL: https://issues.apache.org/jira/browse/HIVE-23724
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
>
> Steps to reproduce:
> 1. `Drop database temp cascade`
> 2. In parallel (after 1. has started but while it is still running), fire a `create table 
> temp.temp_table (a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true')`
> 3. In parallel (after 2. has started but while it is still running), fire an `insert overwrite 
> table temp.temp_table values (1,2)`
> Note: The above can be easily reproduced by a unit test in testDbTxnManager.
> Observation: The exclusive lock for the table in 3. is granted although the exclusive 
> lock on the DB acquired in 1. is still held and the shared read lock on the DB for 
> 2. is waiting.
> Cause of issue: While acquiring a lock, if we choose to ignore a conflict 
> between the desired lock and one of the existing locks, we immediately allow 
> the desired lock to be acquired without checking against all the existing 
> locks. The above-mentioned scenario was one such ignore-conflict condition in 
> 2. and 3. There could be other combinations where this may occur, 
> for example when we request a lock with the same txn ids. Although Hive 
> guarantees that this scenario will not occur, because all lock requests related 
> to a txn are made at the same time and failure of one guarantees failure of all, 
> we will have to be extra careful with it in the future.
> Resolution: Whenever we ignore a conflict, we should keep checking against all 
> the existing locks and only then allow the lock to be acquired.





[jira] [Assigned] (HIVE-21164) ACID: explore how we can avoid a move step during inserts/compaction

2020-03-31 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-21164:
--

Assignee: Marta Kuczora  (was: Aditya Shah)

> ACID: explore how we can avoid a move step during inserts/compaction
> 
>
> Key: HIVE-21164
> URL: https://issues.apache.org/jira/browse/HIVE-21164
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Marta Kuczora
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21164.1.patch, HIVE-21164.10.patch, 
> HIVE-21164.11.patch, HIVE-21164.11.patch, HIVE-21164.12.patch, 
> HIVE-21164.13.patch, HIVE-21164.14.patch, HIVE-21164.14.patch, 
> HIVE-21164.15.patch, HIVE-21164.16.patch, HIVE-21164.17.patch, 
> HIVE-21164.18.patch, HIVE-21164.19.patch, HIVE-21164.2.patch, 
> HIVE-21164.20.patch, HIVE-21164.21.patch, HIVE-21164.22.patch, 
> HIVE-21164.3.patch, HIVE-21164.4.patch, HIVE-21164.5.patch, 
> HIVE-21164.6.patch, HIVE-21164.7.patch, HIVE-21164.8.patch, HIVE-21164.9.patch
>
>
> Currently, we write compacted data to a temporary location and then move the 
> files to a final location, which is an expensive operation on some cloud file 
> systems. Since HIVE-20823 is already in, it can control the visibility of 
> compacted data for the readers. Therefore, we can perhaps avoid writing data 
> to a temporary location and directly write compacted data to the intended 
> final path.





[jira] [Assigned] (HIVE-21164) ACID: explore how we can avoid a move step during inserts/compaction

2020-03-31 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-21164:
--

Assignee: Aditya Shah  (was: Marta Kuczora)

> ACID: explore how we can avoid a move step during inserts/compaction
> 
>
> Key: HIVE-21164
> URL: https://issues.apache.org/jira/browse/HIVE-21164
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21164.1.patch, HIVE-21164.10.patch, 
> HIVE-21164.11.patch, HIVE-21164.11.patch, HIVE-21164.12.patch, 
> HIVE-21164.13.patch, HIVE-21164.14.patch, HIVE-21164.14.patch, 
> HIVE-21164.15.patch, HIVE-21164.16.patch, HIVE-21164.17.patch, 
> HIVE-21164.18.patch, HIVE-21164.19.patch, HIVE-21164.2.patch, 
> HIVE-21164.20.patch, HIVE-21164.21.patch, HIVE-21164.22.patch, 
> HIVE-21164.3.patch, HIVE-21164.4.patch, HIVE-21164.5.patch, 
> HIVE-21164.6.patch, HIVE-21164.7.patch, HIVE-21164.8.patch, HIVE-21164.9.patch
>
>
> Currently, we write compacted data to a temporary location and then move the 
> files to a final location, which is an expensive operation on some cloud file 
> systems. Since HIVE-20823 is already in, it can control the visibility of 
> compacted data for the readers. Therefore, we can perhaps avoid writing data 
> to a temporary location and directly write compacted data to the intended 
> final path.





[jira] [Commented] (HIVE-22964) MM table split computation is very slow

2020-03-12 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057719#comment-17057719
 ] 

Aditya Shah commented on HIVE-22964:


[~pvary] sorry to have missed it! The change LGTM. Thanks!

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.1.patch, HIVE-22964.2.patch, 
> HIVE-22964.3.patch, HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Updated] (HIVE-22964) MM table split computation is very slow

2020-03-11 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22964:
---
Attachment: HIVE-22964.2.patch
Status: Patch Available  (was: Open)

Thanks for the review! I have made the recommended changes and am waiting for a 
green run.

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.1.patch, HIVE-22964.2.patch, HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Updated] (HIVE-22964) MM table split computation is very slow

2020-03-11 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22964:
---
Status: Open  (was: Patch Available)

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.1.patch, HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Commented] (HIVE-22964) MM table split computation is very slow

2020-03-10 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055749#comment-17055749
 ] 

Aditya Shah commented on HIVE-22964:


Hi [~pvary],
 * Yes, you are correct: there is no need for UGI here, so I removed it.
 * I used a synchronized list and removed the encompassing class for the lists. 
The overhead comparison was ~71s (without synchronized lists, earlier patch) vs 
78s (with synchronized lists) for 150 partitions with 2 delta directories each.
 * For the rename of confs, I wasn't sure how to go about deprecation. I tried to 
do it the way metastore confs were handled.
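
As a rough illustration of the synchronized-list approach mentioned above, 
worker threads can append to one shared list as sketched below. This is a 
minimal sketch and not the code from the patch: the class name, the method 
name, and the fake "/delta_N" paths are hypothetical and only mirror the 
"2 delta directories per partition" setup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not Hive code.
class SynchronizedListSketch {
  // Each task stands in for the per-partition path processing and appends
  // its results to a single list shared by all worker threads.
  static List<String> collectPaths(List<String> partitions, int numThreads) {
    // synchronizedList makes each individual add() call thread-safe.
    List<String> shared = Collections.synchronizedList(new ArrayList<>());
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String partition : partitions) {
        futures.add(pool.submit(() -> {
          shared.add(partition + "/delta_1");
          shared.add(partition + "/delta_2");
        }));
      }
      // Wait for every task so the list is complete before returning.
      for (Future<?> f : futures) {
        f.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
    return shared;
  }
}
```

The order of entries in the returned list depends on thread scheduling, so 
callers should not rely on it.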

 

Thanks!

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.1.patch, HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Updated] (HIVE-22964) MM table split computation is very slow

2020-03-10 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22964:
---
Status: Open  (was: Patch Available)

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.1.patch, HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Updated] (HIVE-22964) MM table split computation is very slow

2020-03-10 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22964:
---
Attachment: HIVE-22964.1.patch
Status: Patch Available  (was: Open)

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.1.patch, HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Commented] (HIVE-22964) MM table split computation is very slow

2020-03-05 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052059#comment-17052059
 ] 

Aditya Shah commented on HIVE-22964:


Hi [~pvary],
 * I've propagated ugi considering HIVE-13120. Hence a separate class too.
 * MMPathInfo is required as we'll populate two lists shared across threads.

I will address the rest of your comments and upload a patch again. 
Thanks!

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Comment Edited] (HIVE-22964) MM table split computation is very slow

2020-03-05 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052059#comment-17052059
 ] 

Aditya Shah edited comment on HIVE-22964 at 3/5/20, 11:59 AM:
--

Hi [~pvary],
 * I've propagated ugi considering HIVE-13120. Hence a separate class too.
 * MMPathInfo is required as we'll populate two lists shared across threads.

I will correct considering the rest of your comments and upload a patch again. 
Thanks!


was (Author: aditya-shah):
Hi [~pvary],
 * I've propagated ugi considering HIVE-13120. Hence a separate class too.
 * MMPathInfo is required as we'll populate two lists shared across threads.

I cill correct considering the rest of your comments and upload a patch again. 
Thanks!

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Commented] (HIVE-22964) MM table split computation is very slow

2020-03-05 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051983#comment-17051983
 ] 

Aditya Shah commented on HIVE-22964:


Hi [~pvary], thanks for reviewing.

For Conf reuse, I was thinking of reusing "HIVE_ORC_COMPUTE_SPLITS_NUM_THREADS" 
and renaming it to "HIVE_COMPUTE_SPLITS_NUM_THREADS". Should that be fine? 



Also, for the second point, are you referring to shutting down and canceling 
futures in case one of the threads fails? I have done some handling for that 
case. Am I understanding it correctly?
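
To make the question concrete, the kind of handling meant here could look 
roughly like the sketch below. It is only an illustration under assumptions, 
not the code in the attached patch: FailFastPoolSketch and runAll are 
hypothetical names, and the tasks are generic Callables rather than Hive's 
path processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not Hive code.
class FailFastPoolSketch {
  // Runs all tasks on a fixed pool. If any task fails, the remaining
  // futures are cancelled and the failure is rethrown to the caller.
  static <T> List<T> runAll(List<Callable<T>> tasks, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    List<Future<T>> futures = new ArrayList<>();
    try {
      for (Callable<T> task : tasks) {
        futures.add(pool.submit(task));
      }
      List<T> results = new ArrayList<>();
      for (Future<T> f : futures) {
        results.add(f.get()); // throws ExecutionException if the task failed
      }
      return results;
    } catch (ExecutionException e) {
      for (Future<T> f : futures) {
        f.cancel(true); // no effect on tasks that already completed
      }
      throw new IllegalStateException("task failed", e.getCause());
    } catch (InterruptedException e) {
      for (Future<T> f : futures) {
        f.cancel(true);
      }
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    } finally {
      // Interrupts anything still running and releases the worker threads.
      pool.shutdownNow();
    }
  }
}
```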

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Comment Edited] (HIVE-22964) MM table split computation is very slow

2020-03-02 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049932#comment-17049932
 ] 

Aditya Shah edited comment on HIVE-22964 at 3/3/20 6:22 AM:


Adding threadpool to processPathforMMRead

[~gopalv] [~pvary] [~prasanth_j] can you please review. Thanks!


was (Author: aditya-shah):
Adding threadpool to processPathforMMRead

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Updated] (HIVE-22964) MM table split computation is very slow

2020-03-02 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22964:
---
Attachment: HIVE-22964.patch
Status: Patch Available  (was: Open)

Adding threadpool to processPathforMMRead

> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Assigned] (HIVE-22964) MM table split computation is very slow

2020-03-02 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-22964:
--


> MM table split computation is very slow
> ---
>
> Key: HIVE-22964
> URL: https://issues.apache.org/jira/browse/HIVE-22964
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we 
> end up doing listing on the whole table at once. This could be optimized.





[jira] [Commented] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2020-03-02 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049879#comment-17049879
 ] 

Aditya Shah commented on HIVE-21225:


[~gopalv] I further noticed HIVE-22001. It seems we are swallowing the 
FileNotFoundException in the case where we do the listing to populate the 
cache. So we could always have done this in the case of multiple listings as 
well, since the snapshot will be consistent once the valid txn write IDs list 
is made. And given the performance loss I already pointed out, should we have 
avoided this?

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal Vijayaraghavan
>Assignee: Vaibhav Gumashta
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21225.1.patch, HIVE-21225.10.patch, 
> HIVE-21225.11.patch, HIVE-21225.12.patch, HIVE-21225.13.patch, 
> HIVE-21225.14.patch, HIVE-21225.15.patch, HIVE-21225.15.patch, 
> HIVE-21225.16.patch, HIVE-21225.17.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, HIVE-21225.4.patch, HIVE-21225.4.patch, 
> HIVE-21225.5.patch, HIVE-21225.6.patch, HIVE-21225.7.patch, 
> HIVE-21225.7.patch, HIVE-21225.8.patch, HIVE-21225.9.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS api which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.





[jira] [Resolved] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2020-02-16 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah resolved HIVE-21225.

Resolution: Fixed

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal Vijayaraghavan
>Assignee: Vaibhav Gumashta
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21225.1.patch, HIVE-21225.10.patch, 
> HIVE-21225.11.patch, HIVE-21225.12.patch, HIVE-21225.13.patch, 
> HIVE-21225.14.patch, HIVE-21225.15.patch, HIVE-21225.15.patch, 
> HIVE-21225.16.patch, HIVE-21225.17.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, HIVE-21225.4.patch, HIVE-21225.4.patch, 
> HIVE-21225.5.patch, HIVE-21225.6.patch, HIVE-21225.7.patch, 
> HIVE-21225.7.patch, HIVE-21225.8.patch, HIVE-21225.9.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS api which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.





[jira] [Commented] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2020-02-14 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036891#comment-17036891
 ] 

Aditya Shah commented on HIVE-21225:


I still couldn't attach files. The images are hosted here: 

1. [https://imgur.com/tpgP37g]

2. [https://imgur.com/Pradd7e]

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal Vijayaraghavan
>Assignee: Vaibhav Gumashta
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21225.1.patch, HIVE-21225.10.patch, 
> HIVE-21225.11.patch, HIVE-21225.12.patch, HIVE-21225.13.patch, 
> HIVE-21225.14.patch, HIVE-21225.15.patch, HIVE-21225.15.patch, 
> HIVE-21225.16.patch, HIVE-21225.17.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, HIVE-21225.4.patch, HIVE-21225.4.patch, 
> HIVE-21225.5.patch, HIVE-21225.6.patch, HIVE-21225.7.patch, 
> HIVE-21225.7.patch, HIVE-21225.8.patch, HIVE-21225.9.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS api which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.





[jira] [Reopened] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2020-02-14 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reopened HIVE-21225:


I have reopened the issue as I couldn't attach new files. [~gopalv], do let me 
know if I should move the discussion to a new Jira. Thanks!

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal Vijayaraghavan
>Assignee: Vaibhav Gumashta
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21225.1.patch, HIVE-21225.10.patch, 
> HIVE-21225.11.patch, HIVE-21225.12.patch, HIVE-21225.13.patch, 
> HIVE-21225.14.patch, HIVE-21225.15.patch, HIVE-21225.15.patch, 
> HIVE-21225.16.patch, HIVE-21225.17.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, HIVE-21225.4.patch, HIVE-21225.4.patch, 
> HIVE-21225.5.patch, HIVE-21225.6.patch, HIVE-21225.7.patch, 
> HIVE-21225.7.patch, HIVE-21225.8.patch, HIVE-21225.9.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS api which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.





[jira] [Commented] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2020-02-14 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036870#comment-17036870
 ] 

Aditya Shah commented on HIVE-21225:


[~gopalv] 

Thanks for your reply. I'm not using S3Guard, which might explain the 
significant amount of time spent in S3 calls. As for the numbers, I attached a 
profiler and have flame graphs for the cases (row 2, col 2) and (row 2, col 3) 
from the above comment.

As you can see, the listing takes around 96% of the time. Even the 
amazonHttpClient calls were 29k vs 520k. My concerns/doubts are the following 
two:

1) Given the correctness issue you pointed out, is there any plan to backport 
to Hive 3.1.1?
2) Should we have an additional optimization for listing in place (something 
similar to getInputPath's threadpool)?

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal Vijayaraghavan
>Assignee: Vaibhav Gumashta
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21225.1.patch, HIVE-21225.10.patch, 
> HIVE-21225.11.patch, HIVE-21225.12.patch, HIVE-21225.13.patch, 
> HIVE-21225.14.patch, HIVE-21225.15.patch, HIVE-21225.15.patch, 
> HIVE-21225.16.patch, HIVE-21225.17.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, HIVE-21225.4.patch, HIVE-21225.4.patch, 
> HIVE-21225.5.patch, HIVE-21225.6.patch, HIVE-21225.7.patch, 
> HIVE-21225.7.patch, HIVE-21225.8.patch, HIVE-21225.9.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS api which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.





[jira] [Commented] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2020-02-13 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036142#comment-17036142
 ] 

Aditya Shah commented on HIVE-21225:


[~vgumashta]


I had some doubts similar to what [~vgarg] raised before. Introducing caching 
that stores the whole status object of each directory is quite expensive on 
S3. Before this change, getAcidState only did a listStatus, which was very 
fast. The overhead seems very high compared to the benefit, since we use the 
statuses only once per delta directory (after HIVE-21177) to determine 
RawFormat. 


I evaluated two example tables: one non-partitioned, with 3 deltas and around 
900 files in each delta directory, and the other with 100 partitions, 40 
deltas and 45 files each. The split computation times were as follows:

 
||Table||Hive version 3.1.1||With HIVE-21177||With HIVE-21225, HIVE-22537, and HIVE-21177||
|3 deltas, 900 files|798s|1s|367s|
|100 partitions, 40 deltas, 45 files|12952s|70s|942s|

 

Am I missing something here?
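
For context, the single recursive listing that the issue description talks 
about can be pictured roughly as below. This is only a sketch under loud 
assumptions: it uses java.nio.file on a local directory rather than the Hadoop 
FileSystem API, and ListingSnapshotSketch and snapshot are hypothetical names, 
not Hive code. It shows the shape of the idea (walk once, then answer 
per-directory questions from the in-memory result) and says nothing about the 
cost of the equivalent calls on S3.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical sketch; local-filesystem stand-in, not Hive or Hadoop code.
class ListingSnapshotSketch {
  // Walks the tree under root exactly once and groups regular files by
  // their parent directory, so later lookups need no further listing calls.
  static Map<Path, List<Path>> snapshot(Path root) {
    Map<Path, List<Path>> filesByDir = new TreeMap<>();
    try (Stream<Path> walk = Files.walk(root)) {
      walk.filter(Files::isRegularFile)
          .forEach(f -> filesByDir
              .computeIfAbsent(f.getParent(), d -> new ArrayList<>())
              .add(f));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return filesByDir;
  }
}
```

Directories that contain no regular files do not appear as keys in the 
returned map.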

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal Vijayaraghavan
>Assignee: Vaibhav Gumashta
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21225.1.patch, HIVE-21225.10.patch, 
> HIVE-21225.11.patch, HIVE-21225.12.patch, HIVE-21225.13.patch, 
> HIVE-21225.14.patch, HIVE-21225.15.patch, HIVE-21225.15.patch, 
> HIVE-21225.16.patch, HIVE-21225.17.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, HIVE-21225.4.patch, HIVE-21225.4.patch, 
> HIVE-21225.5.patch, HIVE-21225.6.patch, HIVE-21225.7.patch, 
> HIVE-21225.7.patch, HIVE-21225.8.patch, HIVE-21225.9.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS api which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.





[jira] [Comment Edited] (HIVE-22764) Create new command for "optimize" compaction and have basic implementation.

2020-01-23 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021893#comment-17021893
 ] 

Aditya Shah edited comment on HIVE-22764 at 1/23/20 9:46 AM:
-

I have submitted a basic implementation for the new compaction. [~pvary] 
[~lpinter] can you please review the patch as well as the doc: 
[https://docs.google.com/document/d/10zWk7FR6I0CMy57Uykbkcox4HZTMQv2sgLoZrHVeLYU/edit?usp=sharing]

Thanks,

Aditya


was (Author: aditya-shah):
I have submitted a basic implementation for the new compaction. [~pvary] 
[~lpinter] can you please review the patch as well as the 
[[https://docs.google.com/document/d/10zWk7FR6I0CMy57Uykbkcox4HZTMQv2sgLoZrHVeLYU/edit?usp=sharing]]

Thanks,

Aditya

> Create new command for "optimize" compaction and have basic implementation.
> ---
>
> Key: HIVE-22764
> URL: https://issues.apache.org/jira/browse/HIVE-22764
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22764.patch
>
>
> Created new blocking compaction (added compaction type "optimize") by adding 
> a lock request on the compaction's transaction. It works mostly like 
> mmMajorCompaction and writes files w/o row_IDs. I have added an additional 
> table property to provide optimize columns that are used by the compactor to 
> cluster the data. 





[jira] [Comment Edited] (HIVE-22764) Create new command for "optimize" compaction and have basic implementation.

2020-01-23 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021893#comment-17021893
 ] 

Aditya Shah edited comment on HIVE-22764 at 1/23/20 9:46 AM:
-

I have submitted a basic implementation for the new compaction. [~pvary] 
[~lpinter] can you please review the patch as well as the 
[[https://docs.google.com/document/d/10zWk7FR6I0CMy57Uykbkcox4HZTMQv2sgLoZrHVeLYU/edit?usp=sharing]]

Thanks,

Aditya


was (Author: aditya-shah):
I have submitted a basic implementation for the new compaction. [~pvary] 
[~lpinter] can you please review the patch as well as the 
[doc|[http://example.com|https://docs.google.com/document/d/10zWk7FR6I0CMy57Uykbkcox4HZTMQv2sgLoZrHVeLYU/edit?usp=sharing]]

Thanks,

Aditya

> Create new command for "optimize" compaction and have basic implementation.
> ---
>
> Key: HIVE-22764
> URL: https://issues.apache.org/jira/browse/HIVE-22764
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22764.patch
>
>
> Created new blocking compaction (added compaction type "optimize") by adding 
> a lock request on the compaction's transaction. It works mostly like 
> mmMajorCompaction and writes files w/o row_IDs. I have added an additional 
> table property to provide optimize columns that are used by the compactor to 
> cluster the data. 





[jira] [Updated] (HIVE-22764) Create new command for "optimize" compaction and have basic implementation.

2020-01-23 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22764:
---
Attachment: HIVE-22764.patch
Status: Patch Available  (was: Open)

I have submitted a basic implementation for the new compaction. [~pvary] 
[~lpinter] can you please review the patch as well as the 
[doc|https://docs.google.com/document/d/10zWk7FR6I0CMy57Uykbkcox4HZTMQv2sgLoZrHVeLYU/edit?usp=sharing]

Thanks,

Aditya

> Create new command for "optimize" compaction and have basic implementation.
> ---
>
> Key: HIVE-22764
> URL: https://issues.apache.org/jira/browse/HIVE-22764
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22764.patch
>
>
> Created new blocking compaction (added compaction type "optimize") by adding 
> a lock request on the compaction's transaction. It works mostly like 
> mmMajorCompaction and writes files w/o row_IDs. I have added an additional 
> table property to provide optimize columns that are used by the compactor to 
> cluster the data. 





[jira] [Assigned] (HIVE-22764) Create new command for "optimize" compaction and have basic implementation.

2020-01-23 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-22764:
--


> Create new command for "optimize" compaction and have basic implementation.
> ---
>
> Key: HIVE-22764
> URL: https://issues.apache.org/jira/browse/HIVE-22764
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
>
> Created new blocking compaction (added compaction type "optimize") by adding 
> a lock request on the compaction's transaction. It works mostly like 
> mmMajorCompaction and writes files w/o row_IDs. I have added an additional 
> table property to provide optimize columns that are used by the compactor to 
> cluster the data. 





[jira] [Updated] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-16 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22561:
---
Status: Open  (was: Patch Available)

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Blocker
> Fix For: 3.1.0, 3.0.0
>
> Attachments: HIVE-22561.1.branch-3.1.patch, 
> HIVE-22561.branch-3.1.patch, HIVE-22561.patch, Screenshot 2019-11-28 at 
> 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png
>
>
> A map join on a column (which is involved in neither bucketing nor 
> partitioning) causes data loss. 
> Steps to reproduce:
> Env: [hive-dev-box|https://github.com/kgyrtkirk/hive-dev-box] hive 3.1.2.
> Create tables:
>  
> {code:java}
> CREATE TABLE `testj2`(
>   `id` int, 
>   `bn` string, 
>   `cn` string, 
>   `ad` map<string,int>, 
>   `mi` array<int>)
> PARTITIONED BY ( 
>   `br` string)
> CLUSTERED BY ( 
>   bn) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> CREATE TABLE `testj1`(
>   `id` int, 
>   `can` string, 
>   `cn` string, 
>   `ad` map<string,int>, 
>   `av` boolean, 
>   `mi` array<int>)
> PARTITIONED BY ( 
>   `brand` string)
> CLUSTERED BY ( 
>   can) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> {code}
> insert some data in both:
> {code:java}
> insert into testj1 values (100, 'mes_1', 'customer_1',  map('city1', 560077), 
> false, array(5, 10), 'brand_1'),
> (101, 'mes_2', 'customer_2',  map('city2', 560078), true, array(10, 20), 
> 'brand_2'),
> (102, 'mes_3', 'customer_3',  map('city3', 560079), false, array(15, 30), 
> 'brand_3'),
> (103, 'mes_4', 'customer_4',  map('city4', 560080), true, array(20, 40), 
> 'brand_4'),
> (104, 'mes_5', 'customer_5',  map('city5', 560081), false, array(25, 50), 
> 'brand_5');
> insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', 
> 560076),array(0, 0, 0), 'tv'),
> (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'),
> (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'),
> (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'),
> (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv');
> {code}
> Do a join between them:
> {code:java}
> select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 
> on (t1.id = t2.id) order by t1.id;
> {code}
> Observed results:
> !image-2019-11-28-20-46-25-432.png|width=524,height=100!
> In the plan, I can see a map join. Disabling it gives the correct result.
>  
>  





[jira] [Updated] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-16 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22561:
---
Attachment: HIVE-22561.1.branch-3.1.patch
Status: Patch Available  (was: Open)

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Blocker
> Fix For: 3.1.0, 3.0.0
>
> Attachments: HIVE-22561.1.branch-3.1.patch, 
> HIVE-22561.branch-3.1.patch, HIVE-22561.patch, Screenshot 2019-11-28 at 
> 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png
>
>
> A map join on a column (which is involved in neither bucketing nor 
> partitioning) causes data loss. 
> Steps to reproduce:
> Env: [hive-dev-box|https://github.com/kgyrtkirk/hive-dev-box] hive 3.1.2.
> Create tables:
>  
> {code:java}
> CREATE TABLE `testj2`(
>   `id` int, 
>   `bn` string, 
>   `cn` string, 
>   `ad` map<string,int>, 
>   `mi` array<int>)
> PARTITIONED BY ( 
>   `br` string)
> CLUSTERED BY ( 
>   bn) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> CREATE TABLE `testj1`(
>   `id` int, 
>   `can` string, 
>   `cn` string, 
>   `ad` map<string,int>, 
>   `av` boolean, 
>   `mi` array<int>)
> PARTITIONED BY ( 
>   `brand` string)
> CLUSTERED BY ( 
>   can) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> {code}
> insert some data in both:
> {code:java}
> insert into testj1 values (100, 'mes_1', 'customer_1',  map('city1', 560077), 
> false, array(5, 10), 'brand_1'),
> (101, 'mes_2', 'customer_2',  map('city2', 560078), true, array(10, 20), 
> 'brand_2'),
> (102, 'mes_3', 'customer_3',  map('city3', 560079), false, array(15, 30), 
> 'brand_3'),
> (103, 'mes_4', 'customer_4',  map('city4', 560080), true, array(20, 40), 
> 'brand_4'),
> (104, 'mes_5', 'customer_5',  map('city5', 560081), false, array(25, 50), 
> 'brand_5');
> insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', 
> 560076),array(0, 0, 0), 'tv'),
> (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'),
> (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'),
> (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'),
> (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv');
> {code}
> Do a join between them:
> {code:java}
> select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 
> on (t1.id = t2.id) order by t1.id;
> {code}
> Observed results:
> !image-2019-11-28-20-46-25-432.png|width=524,height=100!
> In the plan, I can see a map join. Disabling it gives the correct result.
>  
>  





[jira] [Comment Edited] (HIVE-22636) Data loss on skewjoin for ACID tables.

2019-12-12 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994530#comment-16994530
 ] 

Aditya Shah edited comment on HIVE-22636 at 12/12/19 10:59 AM:
---

[~kgyrtkirk] [~sershe] can you please take a look? I can add a check similar to 
HIVE-16051 in SkewJoinResolver for full ACID too. But if there is a better 
way, we can do that instead.

Thanks,
 Aditya


was (Author: aditya-shah):
[~kgyrtkirk] [~sershe] can you please take a look. I can add a similar check 
similar to HIVE-16051 in SkewJoinResolver for full acid too. But if there is a 
better way, we can do that?

Thanks,
Aditya

> Data loss on skewjoin for ACID tables.
> --
>
> Key: HIVE-22636
> URL: https://issues.apache.org/jira/browse/HIVE-22636
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Aditya Shah
>Priority: Blocker
>
> I am trying to do a skewjoin and write the result into a full ACID table. 
> The results are incorrect. The issue is similar to the one seen for MM tables 
> in HIVE-16051, where the fix was to skip the skewjoin for MM tables. 
> Steps to reproduce:
> Used a qtest similar to HIVE-16051:
> {code:java}
> --! qt:dataset:src1
> --! qt:dataset:src
> -- MASK_LINEAGE
> set hive.mapred.mode=nonstrict;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.skewjoin=true;
> set hive.skewjoin.key=2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties 
> ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE 
> skewjoin_acid SELECT src1.key, src2.value;
> select count(distinct key) from skewjoin_acid;
> drop table skewjoin_acid;
> {code}
> The expected count was 309, but the query returned 173. 
>  





[jira] [Commented] (HIVE-22636) Data loss on skewjoin for ACID tables.

2019-12-12 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994530#comment-16994530
 ] 

Aditya Shah commented on HIVE-22636:


[~kgyrtkirk] [~sershe] can you please take a look? I can add a check similar to 
the one from HIVE-16051 in SkewJoinResolver for full ACID tables too, but if 
there is a better way, we can do that instead.

Thanks,
Aditya

> Data loss on skewjoin for ACID tables.
> --
>
> Key: HIVE-22636
> URL: https://issues.apache.org/jira/browse/HIVE-22636
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Aditya Shah
>Priority: Blocker
>





[jira] [Commented] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-12 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994365#comment-16994365
 ] 

Aditya Shah commented on HIVE-22561:


[~jcamachorodriguez] it seems to me that the profile for branch-3.1 does not 
run even if I submit the patch with that name. Can you please check once and 
let me know if I'm missing something here?

Thanks,
Aditya

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Blocker
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-22561.branch-3.1.patch, HIVE-22561.patch, 
> Screenshot 2019-11-28 at 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png
>
>
> A map join on a column (which is involved in neither bucketing nor partitioning) 
> causes data loss. 
> Steps to reproduce:
> Env: [hive-dev-box|[https://github.com/kgyrtkirk/hive-dev-box]] hive 3.1.2.
> Create tables:
>  
> {code:java}
> CREATE TABLE `testj2`(
>   `id` int, 
>   `bn` string, 
>   `cn` string, 
>   `ad` map<string,int>, 
>   `mi` array<int>)
> PARTITIONED BY ( 
>   `br` string)
> CLUSTERED BY ( 
>   bn) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> CREATE TABLE `testj1`(
>   `id` int, 
>   `can` string, 
>   `cn` string, 
>   `ad` map<string,int>, 
>   `av` boolean, 
>   `mi` array<int>)
> PARTITIONED BY ( 
>   `brand` string)
> CLUSTERED BY ( 
>   can) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> {code}
> Insert some data into both tables:
> {code:java}
> insert into testj1 values (100, 'mes_1', 'customer_1',  map('city1', 560077), 
> false, array(5, 10), 'brand_1'),
> (101, 'mes_2', 'customer_2',  map('city2', 560078), true, array(10, 20), 
> 'brand_2'),
> (102, 'mes_3', 'customer_3',  map('city3', 560079), false, array(15, 30), 
> 'brand_3'),
> (103, 'mes_4', 'customer_4',  map('city4', 560080), true, array(20, 40), 
> 'brand_4'),
> (104, 'mes_5', 'customer_5',  map('city5', 560081), false, array(25, 50), 
> 'brand_5');
> insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', 
> 560076),array(0, 0, 0), 'tv'),
> (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'),
> (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'),
> (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'),
> (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv');
> {code}
> Do a join between them:
> {code:java}
> select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 
> on (t1.id = t2.id) order by t1.id;
> {code}
> Observed results:
> !image-2019-11-28-20-46-25-432.png|width=524,height=100!
> In the plan, I can see a map join. Disabling it gives the correct result.
>  
>  





[jira] [Commented] (HIVE-21917) COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-12-11 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994322#comment-16994322
 ] 

Aditya Shah commented on HIVE-21917:


[~pvary] the follow-up fix is HIVE-22625. Can you please take a look at that 
too? 

Thanks,
Aditya

> COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
> 
>
> Key: HIVE-21917
> URL: https://issues.apache.org/jira/browse/HIVE-21917
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Craig Condit
>Assignee: Denys Kuzmenko
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21917.1.patch, HIVE-21917.2.patch, 
> HIVE-21917.3.patch, HIVE-21917.4.patch, HIVE-21917.5.patch, HIVE-21917.6.patch
>
>
> The Initiator thread in the metastore repeatedly loops over entries in the 
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
> need to be compacted. However, entries are never removed from this table 
> except by a completed Compactor run.
> In a cluster where most tables / partitions are write-once read-many, this 
> results in stale entries in this table never being cleaned up. In a small 
> test cluster, we have observed approximately 45k entries in this table 
> (virtually equal to the number of partitions in the cluster) while < 100 of 
> these tables have delta files at all. Since most of the tables will never get 
> enough writes to trigger a compaction (and in fact have only ever been 
> written to once), the initiator thread keeps trying to evaluate them on every 
> loop.
> On this test cluster, it takes approximately 10 minutes to loop through all 
> the entries and results in severe performance degradation on metastore 
> operations. With the default run timing of 5 minutes, the initiator basically 
> never stops running.
> On a production cluster with 2M partitions, this would be a non-starter.
> The initiator thread should proactively remove entries from 
> COMPLETED_TXN_COMPONENTS when it determines that a compaction is not needed, 
> so that they are not evaluated again on the next loop.
>  





[jira] [Commented] (HIVE-22625) Syntax Error in findPotentialCompactions SQL query for MySql/Postgres

2019-12-11 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993441#comment-16993441
 ] 

Aditya Shah commented on HIVE-22625:


+1 Lgtm

> Syntax Error in findPotentialCompactions SQL query for MySql/Postgres
> -
>
> Key: HIVE-22625
> URL: https://issues.apache.org/jira/browse/HIVE-22625
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-22625.1.patch
>
>
> {code}
> ERROR 1064 (42000): You have an error in your SQL syntax; check the manual 
> that corresponds to your MySQL server version for the right syntax to use 
> near '=> current_timestamp - interval '254' second'
> {code}





[jira] [Commented] (HIVE-21917) COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-12-11 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993366#comment-16993366
 ] 

Aditya Shah commented on HIVE-21917:


[~dkuzmenko] I'm getting an error after this fix. After some analysis, I found 
that the greater-than-or-equal operator in the interval check in the 
"TxnHandler" class is written as "=>" instead of ">=", which is invalid for 
MySQL (and perhaps Postgres too). The error I'm getting is as follows:
{code:java}
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that 
corresponds to your MySQL server version for the right syntax to use near '=> 
current_timestamp - interval '254' second'
{code}
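
The operator mix-up reported above can be reproduced in isolation. The sketch below is only an illustration and rests on assumptions: it uses Python's built-in SQLite rather than MySQL or Postgres, and the two predicates are hypothetical stand-ins for the real TxnHandler query, whose interval arithmetic SQLite does not support. It shows that "=>" is rejected as a syntax error while ">=" parses.

```python
import sqlite3


def is_valid_sql(statement: str) -> bool:
    """Return True if SQLite can parse and run the statement, False on an error."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(statement)
        return True
    except sqlite3.OperationalError:
        # SQLite reports syntax errors as OperationalError.
        return False
    finally:
        conn.close()


# Hypothetical predicates, not the actual query from TxnHandler.
wrong = "SELECT 1 WHERE 5 => 3"  # "=>" is tokenized as "=" followed by ">"
right = "SELECT 1 WHERE 5 >= 3"  # ">=" is the standard comparison operator

if __name__ == "__main__":
    print("=> accepted:", is_valid_sql(wrong))
    print(">= accepted:", is_valid_sql(right))
```

The exact error text differs between databases, so the MySQL message quoted above will not appear here; only the accept/reject behaviour is comparable.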

> COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
> 
>
> Key: HIVE-21917
> URL: https://issues.apache.org/jira/browse/HIVE-21917
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Craig Condit
>Assignee: Denys Kuzmenko
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21917.1.patch, HIVE-21917.2.patch, 
> HIVE-21917.3.patch, HIVE-21917.4.patch, HIVE-21917.5.patch, HIVE-21917.6.patch
>
>





[jira] [Updated] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-09 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22561:
---
Attachment: HIVE-22561.branch-3.1.patch
Status: Patch Available  (was: Open)

Submitting patch with the correct name to run with branch-3.1 profile.

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Blocker
> Fix For: 3.1.0, 3.0.0
>
> Attachments: HIVE-22561.branch-3.1.patch, HIVE-22561.patch, 
> Screenshot 2019-11-28 at 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png
>
>





[jira] [Updated] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-09 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22561:
---
Status: Open  (was: Patch Available)

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Blocker
> Fix For: 3.1.0, 3.0.0
>
> Attachments: HIVE-22561.patch, Screenshot 2019-11-28 at 8.45.17 
> PM.png, image-2019-11-28-20-46-25-432.png
>
>





[jira] [Commented] (HIVE-22567) Data loss when map join is off ,the result is diffrent when the number of reduce tasks is diffrent;

2019-12-09 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991518#comment-16991518
 ] 

Aditya Shah commented on HIVE-22567:


[~zero_never] can you please attach the explain plans for the query? Also, I 
faced a similar issue in 3.1.2 (HIVE-22561). Can you please check whether the 
patch for HIVE-20187 fixes it for you?

> Data loss when map join is off ,the result is diffrent when the number of 
> reduce tasks is diffrent;
> ---
>
> Key: HIVE-22567
> URL: https://issues.apache.org/jira/browse/HIVE-22567
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.1, 3.1.2
> Environment: select count(1) from (
>     select department_code 
>     from tmp.tmp_mon
>     where business_unit_code in (select business_unit_code from 
> tmp.business_unit_config)
>     group by department_code 
> )tmp
>Reporter: Zhang Xiaoyang
>Assignee: Aditya Shah
>Priority: Blocker
>
> I get different results when map join is off and the number of reduce tasks 
> differs.
> tmp.tmp_mon is a big table and tmp.business_unit_config has only 7 
> records.
> With set hive.auto.convert.join=false, the result changes when the number 
> of reduce tasks changes:
> with set mapred.reduce.tasks=1 the result seems right, but with set 
> mapred.reduce.tasks=2 or any other value the result is missing some data.
> What can cause this? 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-09 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22561:
---
   Attachment: HIVE-22561.patch
Fix Version/s: 3.0.0
               3.1.0
       Status: Patch Available  (was: Open)

This issue is a duplicate of HIVE-20187. Since the patch from that Jira was not 
pushed to branch-3.0 and branch-3.1, I'm adding the same patch here so that it 
can be merged. 

 

cc: [~djaiswal] [~gunther] [~jcamachorodriguez] 

 

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Blocker
> Fix For: 3.1.0, 3.0.0
>
> Attachments: HIVE-22561.patch, Screenshot 2019-11-28 at 8.45.17 
> PM.png, image-2019-11-28-20-46-25-432.png
>
>





[jira] [Assigned] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-09 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-22561:
--

Assignee: Aditya Shah

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Blocker
> Attachments: Screenshot 2019-11-28 at 8.45.17 PM.png, 
> image-2019-11-28-20-46-25-432.png
>
>





[jira] [Assigned] (HIVE-22567) Data loss when map join is off ,the result is diffrent when the number of reduce tasks is diffrent;

2019-12-09 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-22567:
--

Assignee: Aditya Shah

> Data loss when map join is off ,the result is diffrent when the number of 
> reduce tasks is diffrent;
> ---
>
> Key: HIVE-22567
> URL: https://issues.apache.org/jira/browse/HIVE-22567
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.1, 3.1.2
> Environment: select count(1) from (
>     select department_code 
>     from tmp.tmp_mon
>     where business_unit_code in (select business_unit_code from 
> tmp.business_unit_config)
>     group by department_code 
> )tmp
>Reporter: Zhang Xiaoyang
>Assignee: Aditya Shah
>Priority: Blocker
>





[jira] [Commented] (HIVE-22582) Avoid reading table as ACID when table name is starting with "delta" , but table is not transactional and BI Split Strategy is used

2019-12-06 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989656#comment-16989656
 ] 

Aditya Shah commented on HIVE-22582:


[~pvary] no failures, can you please merge? Thanks!

> Avoid reading table as ACID when table name is starting with "delta" , but 
> table is not transactional and BI Split Strategy is used
> ---
>
> Key: HIVE-22582
> URL: https://issues.apache.org/jira/browse/HIVE-22582
> Project: Hive
>  Issue Type: Bug
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22582.patch
>
>
> The issue was fixed in HIVE-22473, but that fix missed a check for BI Split Strategy.
> Steps to reproduce: 
> {code:java}
> set hive.exec.orc.split.strategy=BI;
> create table delta_result (a int) stored as orc 
> tblproperties('transactional'='false');
> insert into delta_result select 1;
> select * from delta_result;
> {code}
> Exception Stack Trace:
> {code:java}
> Caused by: java.lang.RuntimeException: ORC split generation failed with 
> exception: String index out of range: -1
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1929)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:2016)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:461)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:430)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:336)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576)
> ... 50 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -1
> at java.lang.String.substring(String.java:1967)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:1128)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils$ParsedDeltaLight.parse(AcidUtils.java:921)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils.getLogicalLength(AcidUtils.java:2084)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:1115)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1905)
> ... 55 more
> {code}
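
The stack trace above ends in String.substring being called from AcidUtils.parsedDelta with index -1. One plausible reading, which is an assumption and not something the thread confirms, is that the directory name delta_result matched the delta prefix and the parser then searched for a second underscore that was not there. The sketch below only illustrates that failure mode with a guarded parser; the function name, the prefix constant and the simplified delta_<min>_<max> layout are hypothetical, and it is not the HIVE-22582 patch.

```python
from typing import Optional, Tuple

DELTA_PREFIX = "delta_"  # assumed prefix of ACID delta directory names


def parse_delta(name: str) -> Optional[Tuple[int, int]]:
    """Parse 'delta_<min>_<max>' into (min, max); return None if the name does not fit."""
    if not name.startswith(DELTA_PREFIX):
        return None
    rest = name[len(DELTA_PREFIX):]
    split = rest.find("_")  # -1 when there is no second underscore
    if split == -1:
        # An unguarded Java rest.substring(0, split) would throw
        # StringIndexOutOfBoundsException here, matching the trace above.
        return None
    low, high = rest[:split], rest[split + 1:]
    if not (low.isdigit() and high.isdigit()):
        # Simplification: optional suffixes are treated as "not a delta".
        return None
    return int(low), int(high)


if __name__ == "__main__":
    print(parse_delta("delta_result"))           # table name, not a delta
    print(parse_delta("delta_0000001_0000002"))  # well-formed delta name
```

Guarding the parser is only one way to avoid the exception; the issue title suggests the actual fix is to skip ACID handling altogether when the table is not transactional.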





[jira] [Assigned] (HIVE-22004) Non-acid to acid conversion doesn't handle random filenames

2019-12-06 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-22004:
--

Assignee: Aditya Shah

> Non-acid to acid conversion doesn't handle random filenames
> ---
>
> Key: HIVE-22004
> URL: https://issues.apache.org/jira/browse/HIVE-22004
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
>
> Right now the supported filename patterns for non-acid to acid table's files 
> (original files) are the only ones created by Hive itself (eg. 00, 
> 00_COPY_1, bucket_0, etc). But at the same time Hive non-acid table 
> supports reading from tables having files with random filenames. We should 
> support the same for acid tables.
> A way to handle this would be to rename such files and though rename is not a 
> costly operation for HDFS, But for non-acid tables with the location on a 
> blobstore like s3 and having random filenames will have costly added steps to 
> convert to acid.
> Current scenario: What we do now for original files is assign them a logical 
> bucket id and for unrecognized patterns we assign -1 and ignore those files.
> Proposed alternatives:
> 1) For all files with arbitrary names, assume logical bucket id 0 and let 
> them belong to the same bucket, similar to what we do for multiple files 
> with the same bucket id (_copy_N). 
> 2) Sort all files with arbitrary names lexicographically and assign them 
> bucket ids sequentially, similar to the handling of multiple files for a 
> non-bucketed table, where we extract the bucket id simply from the filename.
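
The two proposed alternatives can be illustrated with a minimal sketch. This is not Hive code: the class name, method names and sample filenames below are hypothetical, and the sketch only shows how logical bucket ids could be assigned to files whose names match no known pattern.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Hive.
public class RandomFileBuckets {

    // Alternative 1: every unrecognized file is assigned logical bucket id 0.
    static Map<String, Integer> allToBucketZero(List<String> fileNames) {
        Map<String, Integer> bucketIds = new LinkedHashMap<>();
        for (String name : new TreeSet<>(fileNames)) {
            bucketIds.put(name, 0);
        }
        return bucketIds;
    }

    // Alternative 2: sort the names lexicographically and hand out
    // bucket ids 0, 1, 2, ... in that order.
    static Map<String, Integer> sequentialBuckets(List<String> fileNames) {
        Map<String, Integer> bucketIds = new LinkedHashMap<>();
        int nextBucket = 0;
        for (String name : new TreeSet<>(fileNames)) {
            bucketIds.put(name, nextBucket++);
        }
        return bucketIds;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("part-b.orc", "data.orc", "part-a.orc");
        System.out.println(allToBucketZero(files));
        System.out.println(sequentialBuckets(files));
    }
}
```

Alternative 2 depends on the sort order being stable across readers, so the same file always maps to the same bucket id as long as no files are added or removed.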





[jira] [Commented] (HIVE-22582) Avoid reading table as ACID when table name is starting with "delta" , but table is not transactional and BI Split Strategy is used

2019-12-05 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988677#comment-16988677
 ] 

Aditya Shah commented on HIVE-22582:


[~lpinter] [~szita] Can you please review? Thanks!

> Avoid reading table as ACID when table name is starting with "delta" , but 
> table is not transactional and BI Split Strategy is used
> ---
>
> Key: HIVE-22582
> URL: https://issues.apache.org/jira/browse/HIVE-22582
> Project: Hive
>  Issue Type: Bug
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22582.patch
>
>
> The issue was fixed in HIVE-22473, but that fix missed a check for the BI split strategy.
> Steps to reproduce: 
> {code:java}
> set hive.exec.orc.split.strategy=BI;
> create table delta_result (a int) stored as orc 
> tblproperties('transactional'='false');
> insert into delta_result select 1;
> select * from delta_result;
> {code}
> Exception Stack Trace:
> {code:java}
> Caused by: java.lang.RuntimeException: ORC split generation failed with 
> exception: String index out of range: -1
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1929)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:2016)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:461)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:430)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:336)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576)
> ... 50 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -1
> at java.lang.String.substring(String.java:1967)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:1128)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils$ParsedDeltaLight.parse(AcidUtils.java:921)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils.getLogicalLength(AcidUtils.java:2084)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:1115)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1905)
> ... 55 more
> {code}
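
For readers wondering why a directory named delta_result produces "String index out of range: -1", here is a minimal sketch of the kind of parsing that would fail this way. It is a simplified stand-in and an assumption about the failure mode, not the actual AcidUtils code: the class and method names are hypothetical, and it assumes delta directories are named delta_<min>_<max>.

```java
// Hypothetical, simplified stand-in for delta directory name parsing.
public class DeltaNameParse {
    static final String DELTA_PREFIX = "delta_";

    // Expects names of the form delta_<minWriteId>_<maxWriteId>.
    static long parseMinWriteId(String dirName) {
        String rest = dirName.substring(DELTA_PREFIX.length());
        // indexOf returns -1 when the remainder has no underscore,
        // e.g. "result" for a table directory named "delta_result".
        int split = rest.indexOf('_');
        // substring(0, -1) throws StringIndexOutOfBoundsException.
        return Long.parseLong(rest.substring(0, split));
    }

    public static void main(String[] args) {
        System.out.println(parseMinWriteId("delta_0000001_0000001"));
        try {
            parseMinWriteId("delta_result");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("parse failed: " + e.getMessage());
        }
    }
}
```

The exact exception message depends on the Java version, so the sketch prints whatever the runtime reports.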





[jira] [Updated] (HIVE-22582) Avoid reading table as ACID when table name is starting with "delta" , but table is not transactional and BI Split Strategy is used

2019-12-05 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22582:
---
Attachment: HIVE-22582.patch
Status: Patch Available  (was: Open)

> Avoid reading table as ACID when table name is starting with "delta" , but 
> table is not transactional and BI Split Strategy is used
> ---
>
> Key: HIVE-22582
> URL: https://issues.apache.org/jira/browse/HIVE-22582
> Project: Hive
>  Issue Type: Bug
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22582.patch
>
>
> The issue was fixed in HIVE-22473, but that fix missed a check for the BI split strategy.
> Steps to reproduce: 
> {code:java}
> set hive.exec.orc.split.strategy=BI;
> create table delta_result (a int) stored as orc 
> tblproperties('transactional'='false');
> insert into delta_result select 1;
> select * from delta_result;
> {code}
> Exception Stack Trace:
> {code:java}
> Caused by: java.lang.RuntimeException: ORC split generation failed with 
> exception: String index out of range: -1
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1929)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:2016)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:461)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:430)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:336)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576)
> ... 50 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -1
> at java.lang.String.substring(String.java:1967)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:1128)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils$ParsedDeltaLight.parse(AcidUtils.java:921)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils.getLogicalLength(AcidUtils.java:2084)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:1115)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1905)
> ... 55 more
> {code}





[jira] [Assigned] (HIVE-22582) Avoid reading table as ACID when table name is starting with "delta" , but table is not transactional and BI Split Strategy is used

2019-12-05 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-22582:
--

Assignee: Aditya Shah

> Avoid reading table as ACID when table name is starting with "delta" , but 
> table is not transactional and BI Split Strategy is used
> ---
>
> Key: HIVE-22582
> URL: https://issues.apache.org/jira/browse/HIVE-22582
> Project: Hive
>  Issue Type: Bug
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
>
> The issue was fixed in HIVE-22473, but that fix missed a check for the BI split strategy.
> Steps to reproduce: 
> {code:java}
> set hive.exec.orc.split.strategy=BI;
> create table delta_result (a int) stored as orc 
> tblproperties('transactional'='false');
> insert into delta_result select 1;
> select * from delta_result;
> {code}
> Exception Stack Trace:
> {code:java}
> Caused by: java.lang.RuntimeException: ORC split generation failed with 
> exception: String index out of range: -1
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1929)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:2016)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:461)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:430)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:336)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576)
> ... 50 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -1
> at java.lang.String.substring(String.java:1967)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:1128)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils$ParsedDeltaLight.parse(AcidUtils.java:921)
> at 
> org.apache.hadoop.hive.ql.io.AcidUtils.getLogicalLength(AcidUtils.java:2084)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:1115)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1905)
> ... 55 more
> {code}





[jira] [Commented] (HIVE-22473) Avoid reading table as ACID when table name is starting with "delta", but table is not transactional

2019-12-04 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988522#comment-16988522
 ] 

Aditya Shah commented on HIVE-22473:


[~lpinter], I faced the same issue after the fix. I think we should add a 
similar check that the table is acid in BISplitStrategy's getSplits function?
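
A minimal sketch of the kind of guard suggested here, under the assumption that the caller knows whether the table is transactional. The class and method names are hypothetical and do not correspond to the actual OrcInputFormat or AcidUtils code.

```java
// Hypothetical guard; names are illustrative, not Hive's API.
public class BiSplitGuard {

    // Only treat the parent directory as an ACID delta when the table is
    // transactional; a plain table that merely happens to be named
    // "delta_..." is then never parsed as a delta directory.
    static boolean shouldParseAsDelta(boolean tableIsTransactional, String parentDirName) {
        return tableIsTransactional && parentDirName.startsWith("delta_");
    }

    public static void main(String[] args) {
        System.out.println(shouldParseAsDelta(false, "delta_result"));
        System.out.println(shouldParseAsDelta(true, "delta_0000001_0000001"));
    }
}
```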

> Avoid reading table as ACID when table name is starting with "delta", but 
> table is not transactional
> 
>
> Key: HIVE-22473
> URL: https://issues.apache.org/jira/browse/HIVE-22473
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22473.01.patch, HIVE-22473.02.patch, 
> HIVE-22473.03.patch, HIVE-22473.04.patch
>
>
> {code:sql}
> create table delta_result (a int) stored as orc 
> tblproperties('transactional'='false');
> insert into delta_result select 1;
> select * from delta_result;
> {code}
>  The above query will result in the following exception:
> 2019-11-08T13:49:05,780  WARN [HiveServer2-Handler-Pool: Thread-7906] thrift.ThriftCLIService: Error fetching results:
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:481) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:331) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:946) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:567) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:801) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_211]
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_211]
>  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
> Caused by: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>  at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:638) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2142) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:241) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:476) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  ... 13 more
> Caused by: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1929) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:2016)
>  

[jira] [Commented] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-11-28 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984499#comment-16984499
 ] 

Aditya Shah commented on HIVE-22561:


[~djaiswal] [~prasanth_j] [~jcamachorodriguez] Can you please take a look at 
this? I tried debugging a bit. Some of the observations I made were:
 # The mapjoin operator does not populate the hashtable (hybrid as well as 
normal) completely for each task.
 # The results vary with the number of buckets. 

Is the hashtable distributed in some way according to buckets?

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Priority: Blocker
> Attachments: Screenshot 2019-11-28 at 8.45.17 PM.png, 
> image-2019-11-28-20-46-25-432.png
>
>
> A map join on a column (which is involved in neither bucketing nor 
> partitioning) causes data loss. 
> Steps to reproduce:
> Env: hive-dev-box (https://github.com/kgyrtkirk/hive-dev-box), Hive 3.1.2.
> Create tables:
>  
> {code:java}
> CREATE TABLE `testj2`(
>   `id` int, 
>   `bn` string, 
>   `cn` string, 
>   `ad` map, 
>   `mi` array)
> PARTITIONED BY ( 
>   `br` string)
> CLUSTERED BY ( 
>   bn) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> CREATE TABLE `testj1`(
>   `id` int, 
>   `can` string, 
>   `cn` string, 
>   `ad` map, 
>   `av` boolean, 
>   `mi` array)
> PARTITIONED BY ( 
>   `brand` string)
> CLUSTERED BY ( 
>   can) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> {code}
> insert some data in both:
> {code:java}
> insert into testj1 values (100, 'mes_1', 'customer_1',  map('city1', 560077), 
> false, array(5, 10), 'brand_1'),
> (101, 'mes_2', 'customer_2',  map('city2', 560078), true, array(10, 20), 
> 'brand_2'),
> (102, 'mes_3', 'customer_3',  map('city3', 560079), false, array(15, 30), 
> 'brand_3'),
> (103, 'mes_4', 'customer_4',  map('city4', 560080), true, array(20, 40), 
> 'brand_4'),
> (104, 'mes_5', 'customer_5',  map('city5', 560081), false, array(25, 50), 
> 'brand_5');
> insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', 
> 560076),array(0, 0, 0), 'tv'),
> (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'),
> (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'),
> (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'),
> (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv');
> {code}
> Do a join between them:
> {code:java}
> select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 
> on (t1.id = t2.id) order by t1.id;
> {code}
> Observed results:
> !image-2019-11-28-20-46-25-432.png|width=524,height=100!
> In the plan, I can see a map join. Disabling it gives the correct result.
>  
>  





[jira] [Updated] (HIVE-22407) Hive metastore upgrade scripts have incorrect (or outdated) comment syntax

2019-10-31 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22407:
---
Attachment: HIVE-22407.branch-3.patch

> Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
> --
>
> Key: HIVE-22407
> URL: https://issues.apache.org/jira/browse/HIVE-22407
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0, 3.1.3
>
> Attachments: HIVE-22407.branch-3.1.patch, HIVE-22407.branch-3.patch, 
> HIVE-22407.patch
>
>
> MySQL requires a single-line comment that starts with `--` to have at least 
> one space after the `--`. Because of this, the current upgrade scripts in the 
> standalone-metastore throw an exception. 
> ref: [https://dev.mysql.com/doc/refman/5.7/en/ansi-diff-comments.html]  
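
As an illustration of the kind of change involved, the sketch below rewrites comment lines that start with `--` immediately followed by text so that they carry the required space. It is not the attached patch: the class name and regular expression are assumptions, and it only handles comments at the start of a line, not trailing comments or `--` inside string literals.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of Hive or of the attached patch.
public class SqlCommentFixer {

    // A line that starts (after optional indentation) with "--" directly
    // followed by a character that is neither whitespace nor another dash.
    private static final Pattern BAD_COMMENT =
        Pattern.compile("(?m)^([ \\t]*--)(?=[^\\s-])");

    // Insert a single space after the leading "--" of each offending line.
    static String fix(String script) {
        return BAD_COMMENT.matcher(script).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        String script = "--Table for upgrades\n-- already fine\nSELECT 1;";
        System.out.println(fix(script));
    }
}
```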





[jira] [Commented] (HIVE-22407) Hive metastore upgrade scripts have incorrect (or outdated) comment syntax

2019-10-31 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963816#comment-16963816
 ] 

Aditya Shah commented on HIVE-22407:


[~pvary] It's a duplicate of the one for branch-3.1 itself.

Thanks!

> Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
> --
>
> Key: HIVE-22407
> URL: https://issues.apache.org/jira/browse/HIVE-22407
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0, 3.1.3
>
> Attachments: HIVE-22407.branch-3.1.patch, HIVE-22407.branch-3.patch, 
> HIVE-22407.patch
>
>
> MySQL requires a single-line comment that starts with `--` to have at least 
> one space after the `--`. Because of this, the current upgrade scripts in the 
> standalone-metastore throw an exception. 
> ref: [https://dev.mysql.com/doc/refman/5.7/en/ansi-diff-comments.html]  





[jira] [Commented] (HIVE-22407) Hive metastore upgrade scripts have incorrect (or outdated) comment syntax

2019-10-30 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963673#comment-16963673
 ] 

Aditya Shah commented on HIVE-22407:


[~pvary] I've attached a patch for branch-3.1 as well. The same could be pushed 
to branch 3.0. Can you please review and commit?

Thanks!

> Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
> --
>
> Key: HIVE-22407
> URL: https://issues.apache.org/jira/browse/HIVE-22407
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22407.branch-3.1.patch, HIVE-22407.patch
>
>
> MySQL requires a single-line comment that starts with `--` to have at least 
> one space after the `--`. Because of this, the current upgrade scripts in the 
> standalone-metastore throw an exception. 
> ref: [https://dev.mysql.com/doc/refman/5.7/en/ansi-diff-comments.html]  





[jira] [Updated] (HIVE-22407) Hive metastore upgrade scripts have incorrect (or outdated) comment syntax

2019-10-30 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22407:
---
Attachment: HIVE-22407.branch-3.1.patch

> Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
> --
>
> Key: HIVE-22407
> URL: https://issues.apache.org/jira/browse/HIVE-22407
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22407.branch-3.1.patch, HIVE-22407.patch
>
>
> MySQL requires a single-line comment that starts with `--` to have at least 
> one space after the `--`. Because of this, the current upgrade scripts in the 
> standalone-metastore throw an exception. 
> ref: [https://dev.mysql.com/doc/refman/5.7/en/ansi-diff-comments.html]  





[jira] [Commented] (HIVE-22407) Hive metastore upgrade scripts have incorrect (or outdated) comment syntax

2019-10-28 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961677#comment-16961677
 ] 

Aditya Shah commented on HIVE-22407:


[~gates] [~pvary] can you please take a look? Thanks!

> Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
> --
>
> Key: HIVE-22407
> URL: https://issues.apache.org/jira/browse/HIVE-22407
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22407.patch
>
>
> MySQL requires a single-line comment that starts with `--` to have at least 
> one space after the `--`. Because of this, the current upgrade scripts in the 
> standalone-metastore throw an exception. 
> ref: [https://dev.mysql.com/doc/refman/5.7/en/ansi-diff-comments.html]  





[jira] [Updated] (HIVE-22407) Hive metastore upgrade scripts have incorrect (or outdated) comment syntax

2019-10-26 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22407:
---
Attachment: HIVE-22407.patch
Status: Patch Available  (was: Open)

> Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
> --
>
> Key: HIVE-22407
> URL: https://issues.apache.org/jira/browse/HIVE-22407
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22407.patch
>
>
> MySQL requires a single-line comment that starts with `--` to have at least 
> one space after the `--`. Because of this, the current upgrade scripts in the 
> standalone-metastore throw an exception. 
> ref: [https://dev.mysql.com/doc/refman/5.7/en/ansi-diff-comments.html]  





[jira] [Assigned] (HIVE-22407) Hive metastore upgrade scripts have incorrect (or outdated) comment syntax

2019-10-26 Thread Aditya Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-22407:
--


> Hive metastore upgrade scripts have incorrect (or outdated) comment syntax
> --
>
> Key: HIVE-22407
> URL: https://issues.apache.org/jira/browse/HIVE-22407
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
>
> MySQL requires a single-line comment that starts with `--` to have at least 
> one space after the `--`. Because of this, the current upgrade scripts in the 
> standalone-metastore throw an exception. 
> ref: [https://dev.mysql.com/doc/refman/5.7/en/ansi-diff-comments.html]  





[jira] [Commented] (HIVE-22198) Execute unoin-all with childs Join in parallel

2019-09-22 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935564#comment-16935564
 ] 

Aditya Shah commented on HIVE-22198:


[~luguangming], can we repair parents only in the case of the conditional task 
instead of doing the same for all tasks? 

> Execute unoin-all with childs Join in parallel
> --
>
> Key: HIVE-22198
> URL: https://issues.apache.org/jira/browse/HIVE-22198
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Major
> Attachments: HIVE-22198.patch, image-2019-09-20-11-38-37-433.png, 
> image-2019-09-20-11-39-30-347.png, test-parallel.sql
>
>
> With parallel set to true, skewjoin set to false and auto convert join set 
> to false, run a union all. There is no error message, but some result data 
> is missing. For details, see the attachment [^test-parallel.sql]
> create table tab1(tid int, com string) row format delimited fields terminated 
> by '\t' stored as textfile;
>  create table tab2(tid int, com string) row format delimited fields 
> terminated by '\t' stored as textfile;
>  create table tab3(tid int, com string) row format delimited fields 
> terminated by '\t' stored as textfile;
>  create table tab4(tid int, com string) row format delimited fields 
> terminated by '\t' stored as textfile;
> insert into tab1 values(1,'abc'),(2,'bcd'),(3,'cde');
>  insert into tab2 values(1,'abc'),(2,'bcd'),(3,'cde');
>  insert into tab3 values(1,'abc'),(2,'bcd'),(3,'cde');
>  insert into tab4 values(1,'abc'),(2,'bcd'),(3,'cde');
> set hive.auto.convert.join=false;
>  set hive.optimize.skewjoin=true;
>  set hive.exec.parallel=true;
> SELECT sum(1) as a 
>  FROM tab1 t1 
>  INNER JOIN tab2 t2 
>  ON t1.com = t2.com
>  UNION ALL
>  SELECT sum(1) as a 
>  FROM tab3 t3 
>  INNER JOIN tab4 t4 
>  ON t3.com = t4.com;
> create table test_parallel stored as orcfile as 
>  SELECT sum(1) as a 
>  FROM tab1 t1 
>  INNER JOIN tab2 t2 
>  ON t1.com = t2.com
>  UNION ALL
>  SELECT sum(1) as a 
>  FROM tab3 t3 
>  INNER JOIN tab4 t4 
>  ON t3.com = t4.com;
> select * from test_parallel;
> The result should contain two rows, but only one is returned.





[jira] [Commented] (HIVE-22067) Null pointer exception for update query on a partitioned acid table

2019-08-01 Thread Aditya Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898017#comment-16898017
 ] 

Aditya Shah commented on HIVE-22067:


The test failures are unrelated. [~gopalv] [~vgumashta] [~vgarg] can you please 
review the change?

 

> Null pointer exception for update query on a partitioned acid table
> ---
>
> Key: HIVE-22067
> URL: https://issues.apache.org/jira/browse/HIVE-22067
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22067.1.patch, HIVE-22067.patch
>
>
> In case of an acid table, the final paths (array) of the filesink operator are 
> populated using the bucket id as the index. This leaves null entries in the 
> final paths when we don't write to some of the buckets. As a result, 
> committing the paths in closeOp fails with an NPE.
> Observed for the following query:
> {code:java}
> CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
> stored as orc;
> CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
> INSERT INTO TABLE test_src_delete values 
> (1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
> set tez.grouping.split-count=5;
> INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
> Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
> update test_bckt_part set a=99 where b=23;
> {code}
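
The failure mode described above can be sketched in isolation. The code below is not FileSinkOperator code and not the attached patch: the names are hypothetical, and skipping null entries is just one possible way to avoid the NPE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a bucket-indexed array with gaps.
public class SparseFinalPaths {

    // Populate the array using the bucket id as the index; buckets that
    // were never written to keep their default null entry.
    static String[] populate(int numBuckets, int[] writtenBuckets) {
        String[] finalPaths = new String[numBuckets];
        for (int bucket : writtenBuckets) {
            finalPaths[bucket] = "bucket_" + bucket; // placeholder path
        }
        return finalPaths;
    }

    // Commit step that skips the null entries instead of dereferencing them.
    static List<String> commit(String[] finalPaths) {
        List<String> committed = new ArrayList<>();
        for (String path : finalPaths) {
            if (path == null) {
                continue; // bucket not written by this task
            }
            // Dereferencing path is where a null entry would raise an NPE.
            committed.add(path.trim());
        }
        return committed;
    }

    public static void main(String[] args) {
        String[] paths = populate(5, new int[] {1, 3});
        System.out.println(commit(paths));
    }
}
```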
>   



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (HIVE-22067) Null pointer exception for update query on a partitioned acid table

2019-08-01 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22067:
---
Attachment: HIVE-22067.1.patch
Status: Patch Available  (was: Open)

> Null pointer exception for update query on a partitioned acid table
> ---
>
> Key: HIVE-22067
> URL: https://issues.apache.org/jira/browse/HIVE-22067
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22067.1.patch, HIVE-22067.patch
>
>
> In case of an acid table, the final paths (array) of the filesink operator are 
> populated using the bucket id as the index. This leaves null entries in the 
> final paths when we don't write to some of the buckets. As a result, 
> committing the paths in closeOp fails with an NPE.
> Observed for the following query:
> {code:java}
> CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
> stored as orc;
> CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
> INSERT INTO TABLE test_src_delete values 
> (1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
> set tez.grouping.split-count=5;
> INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
> Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
> update test_bckt_part set a=99 where b=23;
> {code}
>   





[jira] [Updated] (HIVE-22067) Null pointer exception for update query on a partitioned acid table

2019-08-01 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22067:
---
Status: Open  (was: Patch Available)

> Null pointer exception for update query on a partitioned acid table
> ---
>
> Key: HIVE-22067
> URL: https://issues.apache.org/jira/browse/HIVE-22067
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22067.patch
>
>
> In case of an acid table, the final paths (array) of the filesink operator are 
> populated using the bucket id as the index. This leaves null entries in the 
> final paths when we don't write to some of the buckets. As a result, 
> committing the paths in closeOp fails with an NPE.
> Observed for the following query:
> {code:java}
> CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
> stored as orc;
> CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
> INSERT INTO TABLE test_src_delete values 
> (1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
> set tez.grouping.split-count=5;
> INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
> Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
> update test_bckt_part set a=99 where b=23;
> {code}
>   





[jira] [Updated] (HIVE-22067) Null pointer exception for update query on a partitioned acid table

2019-08-01 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22067:
---
Attachment: HIVE-22067.patch
Status: Patch Available  (was: Open)

> Null pointer exception for update query on a partitioned acid table
> ---
>
> Key: HIVE-22067
> URL: https://issues.apache.org/jira/browse/HIVE-22067
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22067.patch
>
>
> In case of an acid table, the final paths (array) of the filesink operator are 
> populated using the bucket id as the index. This leaves null entries in the 
> final paths when we don't write to some of the buckets. As a result, 
> committing the paths in closeOp fails with an NPE.
> Observed for the following query:
> {code:java}
> CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
> stored as orc;
> CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
> INSERT INTO TABLE test_src_delete values 
> (1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
> set tez.grouping.split-count=5;
> INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
> Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
> update test_bckt_part set a=99 where b=23;
> {code}
>   





[jira] [Assigned] (HIVE-22067) Null pointer exception for update query on a partitioned acid table

2019-08-01 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-22067:
--

Assignee: Aditya Shah

> Null pointer exception for update query on a partitioned acid table
> ---
>
> Key: HIVE-22067
> URL: https://issues.apache.org/jira/browse/HIVE-22067
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-22067.patch
>
>
> In case of an acid table, the final paths (array) of the filesink operator is 
> populated by using bucket id as the index. This causes the final paths to 
> have null entries when we don't write to some of the buckets. Thus, finally 
> while committing the paths in closeOp this results in an NPE.
> Observed for the following query:
> {code:java}
> CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
> stored as orc;
> CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
> INSERT INTO TABLE test_src_delete values 
> (1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
> set tez.grouping.split-count=5;
> INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
> Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
> update test_bckt_part set a=99 where b=23;
> {code}
>   





[jira] [Updated] (HIVE-22067) Null pointer exception for update query on a partitioned acid table

2019-08-01 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22067:
---
Affects Version/s: 3.1.1

> Null pointer exception for update query on a partitioned acid table
> ---
>
> Key: HIVE-22067
> URL: https://issues.apache.org/jira/browse/HIVE-22067
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Priority: Major
>
> In case of an acid table, the final paths (array) of the filesink operator is 
> populated by using bucket id as the index. This causes the final paths to 
> have null entries when we don't write to some of the buckets. Thus, finally 
> while committing the paths in closeOp this results in an NPE.
> Observed for the following query:
>  
> {code:java}
> CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
> stored as orc;
> CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
> INSERT INTO TABLE test_src_delete values 
> (1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
> set tez.grouping.split-count=5;
> INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
> Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
> update test_bckt_part set a=99 where b=23;
> {code}
>  
>  
>  
>  





[jira] [Updated] (HIVE-22067) Null pointer exception for update query on a partitioned acid table

2019-08-01 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22067:
---
Description: 
In case of an acid table, the final paths (array) of the filesink operator is 
populated by using bucket id as the index. This causes the final paths to have 
null entries when we don't write to some of the buckets. Thus, finally while 
committing the paths in closeOp this results in an NPE.

Observed for the following query:
{code:java}
CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
stored as orc;
CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
INSERT INTO TABLE test_src_delete values 
(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
set tez.grouping.split-count=5;
INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
update test_bckt_part set a=99 where b=23;
{code}
  

  was:
In case of an acid table, the final paths (array) of the filesink operator is 
populated by using bucket id as the index. This causes the final paths to have 
null entries when we don't write to some of the buckets. Thus, finally while 
committing the paths in closeOp this results in an NPE.

Observed for the following query:

 
{code:java}
CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
stored as orc;
CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
INSERT INTO TABLE test_src_delete values 
(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
set tez.grouping.split-count=5;
INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
update test_bckt_part set a=99 where b=23;
{code}
 

 

 

 


> Null pointer exception for update query on a partitioned acid table
> ---
>
> Key: HIVE-22067
> URL: https://issues.apache.org/jira/browse/HIVE-22067
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Priority: Major
>
> In case of an acid table, the final paths (array) of the filesink operator is 
> populated by using bucket id as the index. This causes the final paths to 
> have null entries when we don't write to some of the buckets. Thus, finally 
> while committing the paths in closeOp this results in an NPE.
> Observed for the following query:
> {code:java}
> CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
> stored as orc;
> CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
> INSERT INTO TABLE test_src_delete values 
> (1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
> set tez.grouping.split-count=5;
> INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
> Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
> update test_bckt_part set a=99 where b=23;
> {code}
>   





[jira] [Updated] (HIVE-22067) Null pointer exception for update query on a partitioned acid table

2019-08-01 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-22067:
---
Component/s: Transactions

> Null pointer exception for update query on a partitioned acid table
> ---
>
> Key: HIVE-22067
> URL: https://issues.apache.org/jira/browse/HIVE-22067
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Priority: Major
>
> In case of an acid table, the final paths (array) of the filesink operator is 
> populated by using bucket id as the index. This causes the final paths to 
> have null entries when we don't write to some of the buckets. Thus, finally 
> while committing the paths in closeOp this results in an NPE.
> Observed for the following query:
>  
> {code:java}
> CREATE TABLE if not exists test_bckt_part(a int) partitioned by (b int)
> stored as orc;
> CREATE TABLE test_src_delete (a int, b int) CLUSTERED BY (b) into 5 BUCKETS;
> INSERT INTO TABLE test_src_delete values 
> (1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23),(1,2),(3,4),(5,2),(7,8),(9,10),(11,2),(34,53),(95,23);
> set tez.grouping.split-count=5;
> INSERT OVERWRITE TABLE test_bckt_part SELECT * FROM test_src_delete;
> Alter table test_bckt_part SET TBLPROPERTIES ('transactional'='true');
> update test_bckt_part set a=99 where b=23;
> {code}
>  
>  
>  
>  





[jira] [Commented] (HIVE-22004) Non-acid to acid conversion doesn't handle random filenames

2019-07-18 Thread Aditya Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-22004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888515#comment-16888515
 ] 

Aditya Shah commented on HIVE-22004:


[~owen.omalley] [~ekoifman] [~vgumashta] [~vgarg] Can you please take a look 
and guide me on this?

> Non-acid to acid conversion doesn't handle random filenames
> ---
>
> Key: HIVE-22004
> URL: https://issues.apache.org/jira/browse/HIVE-22004
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Aditya Shah
>Priority: Major
>
> Right now, the only filename patterns supported for the files of a table 
> being converted from non-acid to acid (original files) are the ones created 
> by Hive itself (e.g. 00, 00_COPY_1, bucket_0, etc). At the same time, Hive 
> non-acid tables support reading files with arbitrary filenames. We should 
> support the same for acid tables.
> One way to handle this would be to rename such files. Rename is not a costly 
> operation on HDFS, but for non-acid tables located on a blobstore like S3, 
> renaming files with random filenames adds costly steps to the conversion to 
> acid.
> Current scenario: original files are assigned a logical bucket id; files 
> with unrecognized name patterns are assigned -1 and ignored.
> Proposed alternatives:
> 1) Assume logical bucket id 0 for all the random files and let them belong 
> to the same bucket, similar to what we do for multiple files with the same 
> bucket id (_copy_N).
> 2) Sort all the random files lexicographically and assign them bucket ids 
> sequentially, similar to the handling of multiple files for a non-bucketed 
> table, where we extract the bucket id simply from the filenames.
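
The two proposed alternatives can be sketched roughly as follows. This is an illustrative Java model only, not Hive's conversion code: the class RandomFileBuckets and both method names are hypothetical, and the input is assumed to be the list of filenames that match no recognized Hive pattern.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of Hive: maps unrecognized filenames to logical bucket ids.
final class RandomFileBuckets {

    // Alternative 1: every unrecognized file joins logical bucket 0.
    static Map<String, Integer> allToBucketZero(List<String> fileNames) {
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (String name : new TreeSet<>(fileNames)) {
            assignment.put(name, 0);
        }
        return assignment;
    }

    // Alternative 2: sort lexicographically, then number the files 0, 1, 2, ...
    static Map<String, Integer> sequential(List<String> fileNames) {
        Map<String, Integer> assignment = new LinkedHashMap<>();
        int nextBucketId = 0;
        for (String name : new TreeSet<>(fileNames)) {
            assignment.put(name, nextBucketId++);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> files = List.of("part-b.orc", "data.orc", "part-a.orc");
        System.out.println(allToBucketZero(files));
        System.out.println(sequential(files));
    }
}
```

Both variants sort the names first so that the assignment is deterministic across listings. With alternative 2 the ids depend on the full set of files, so adding a file later could shift the ids of existing ones; the description above does not address that case.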





[jira] [Updated] (HIVE-21821) Backport HIVE-21739 to branch-3.1

2019-06-04 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21821:
---
Attachment: HIVE-21821.branch-3.1.1.patch
Status: Patch Available  (was: Open)

Thanks, [~alangates], for the review.

The CTLG_ID inserted in the test was 1 and hence clashed with the default 
CTLG. Corrected the test in the new patch.

> Backport HIVE-21739 to branch-3.1
> -
>
> Key: HIVE-21821
> URL: https://issues.apache.org/jira/browse/HIVE-21821
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-21821.branch-3.1.1.patch, 
> HIVE-21821.branch-3.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-21821) Backport HIVE-21739 to branch-3.1

2019-06-04 Thread Aditya Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855355#comment-16855355
 ] 

Aditya Shah edited comment on HIVE-21821 at 6/4/19 6:01 AM:


Thanks, [~alangates], for the review.

The CTLG_ID inserted in the test was 1 and hence clashed with the default 
CTLG. Corrected the test in the new patch.


was (Author: aditya-shah):
+Thanks, [~alangates] for review.
+

++The CTLG_ID being inserted was 1 in test and hence clashing with the default 
CTLG. Corrected the test with the new patch.

> Backport HIVE-21739 to branch-3.1
> -
>
> Key: HIVE-21821
> URL: https://issues.apache.org/jira/browse/HIVE-21821
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-21821.branch-3.1.1.patch, 
> HIVE-21821.branch-3.1.patch
>
>






[jira] [Updated] (HIVE-21821) Backport HIVE-21739 to branch-3.1

2019-06-03 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21821:
---
Affects Version/s: 3.1.1

> Backport HIVE-21739 to branch-3.1
> -
>
> Key: HIVE-21821
> URL: https://issues.apache.org/jira/browse/HIVE-21821
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-21821.branch-3.1.patch
>
>






[jira] [Updated] (HIVE-21821) Backport HIVE-21739 to branch-3.1

2019-06-03 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21821:
---
Attachment: HIVE-21821.branch-3.1.patch
Status: Patch Available  (was: Open)

> Backport HIVE-21739 to branch-3.1
> -
>
> Key: HIVE-21821
> URL: https://issues.apache.org/jira/browse/HIVE-21821
> Project: Hive
>  Issue Type: Bug
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-21821.branch-3.1.patch
>
>






[jira] [Assigned] (HIVE-21821) Backport HIVE-21739 to branch-3.1

2019-06-03 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-21821:
--


> Backport HIVE-21739 to branch-3.1
> -
>
> Key: HIVE-21821
> URL: https://issues.apache.org/jira/browse/HIVE-21821
> Project: Hive
>  Issue Type: Bug
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-27 Thread Aditya Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849323#comment-16849323
 ] 

Aditya Shah commented on HIVE-21739:


[~alangates] [~pvary] ping for review. 

Thanks

 

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0, 2.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.4.patch, HIVE-21739.patch
>
>
> Since the addition of foreign key constraint between Database ('DBS') table 
> and catalogs ('CTLGS') table in HIVE-18755 we are unable to run a simple 
> create database command with an older version of Metastore Server. This is 
> due to older versions having JDO schema as per older schema of 'DBS' which 
> did not have an additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}
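
The mechanism behind this error can be modelled with a small Java sketch. It is not metastore code: the class CatalogFkModel is hypothetical, the catalog name "hive" is assumed to be the only row in 'CTLGS', and the column default is only one conceivable remedy, which this thread does not confirm as the approach taken by the patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of the DBS -> CTLGS foreign key, not metastore code.
final class CatalogFkModel {

    private final Set<String> catalogs = new HashSet<>(); // models CTLGS.NAME
    private final String columnDefault;                   // models a default on DBS.CTLG_NAME

    CatalogFkModel(String columnDefault) {
        this.columnDefault = columnDefault;
        catalogs.add("hive"); // assumed name of the default catalog
    }

    // A null ctlgName models an older server whose JDO mapping has no CTLG_NAME field.
    // Returns false when the insert would violate the foreign key.
    boolean insertDatabase(String dbName, String ctlgName) {
        String effective = (ctlgName != null) ? ctlgName : columnDefault;
        return catalogs.contains(effective);
    }

    public static void main(String[] args) {
        System.out.println(new CatalogFkModel(null).insertDatabase("db1", null));
        System.out.println(new CatalogFkModel("hive").insertDatabase("db1", null));
    }
}
```

In this model an old-style insert is rejected when the column has no usable default and accepted once the default names an existing catalog.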





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-27 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Description: 
Since the addition of foreign key constraint between Database ('DBS') table and 
catalogs ('CTLGS') table in HIVE-18755 we are unable to run a simple create 
database command with an older version of Metastore Server. This is due to 
older versions having JDO schema as per older schema of 'DBS' which did not 
have an additional 'CTLG_NAME' column.

The error is as follows: 
{code:java}
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:Exception thrown flushing changes to datastore)

java.sql.BatchUpdateException: Cannot add or update a child row: a foreign key 
constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN KEY 
("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
{code}

  was:
Since the addition of foreign key constraint between Database ('DBS') table and 
catalogs ('CTLGS') table in HIVE-18755 we are able to run a simple create 
database command with an older version of Metastore Server. This is due to 
older versions having JDO schema as per older schema of 'DBS' which did not 
have an additional 'CTLG_NAME' column.

The error is as follows: 
{code:java}
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:Exception thrown flushing changes to datastore)

java.sql.BatchUpdateException: Cannot add or update a child row: a foreign key 
constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN KEY 
("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
{code}


> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0, 2.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.4.patch, HIVE-21739.patch
>
>
> Since the addition of foreign key constraint between Database ('DBS') table 
> and catalogs ('CTLGS') table in HIVE-18755 we are unable to run a simple 
> create database command with an older version of Metastore Server. This is 
> due to older versions having JDO schema as per older schema of 'DBS' which 
> did not have an additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-21 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Status: Open  (was: Patch Available)

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.4.patch, HIVE-21739.patch
>
>
> Since the addition of foreign key constraint between Database ('DBS') table 
> and catalogs ('CTLGS') table in HIVE-18755 we are unable to run a simple create 
> database command with an older version of Metastore Server. This is due to 
> older versions having JDO schema as per older schema of 'DBS' which did not 
> have an additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-21 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Attachment: HIVE-21739.4.patch
Status: Patch Available  (was: Open)

Some corrections after DBInstall tests

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.4.patch, HIVE-21739.patch
>
>
> Since the addition of foreign key constraint between Database ('DBS') table 
> and catalogs ('CTLGS') table in HIVE-18755 we are unable to run a simple create 
> database command with an older version of Metastore Server. This is due to 
> older versions having JDO schema as per older schema of 'DBS' which did not 
> have an additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21758) DBInstall tests broken on master and branch-3.1

2019-05-21 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21758:
---
Attachment: HIVE-21758.patch
Status: Patch Available  (was: Open)

# Added a check for the DROP INDEX introduced in HIVE-21462.
# Changed the Oracle database docker image to one that is available. However, 
the last one was reported and taken down by Oracle, and this one could be too. 
ref. for followup: [https://github.com/oracle/docker-images/issues/1156] and 
[https://github.com/wnameless/docker-oracle-xe-11g/issues/118]
# Observed a few NumberFormatException errors in the logs. These are taken 
care of by upgrading sqlline from 1.3.0 to 1.4.0, but that upgrade is not 
included in the patch.

cc [~alangates]

> DBInstall tests broken on master and branch-3.1
> ---
>
> Key: HIVE-21758
> URL: https://issues.apache.org/jira/browse/HIVE-21758
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Tests
>Affects Versions: 3.1.1
>Reporter: Alan Gates
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-21758.patch
>
>
> The Oracle and SqlServer install and upgrade tests in standalone-metastore 
> fail in master and branch-3.1.  In the Oracle case it appears the docker 
> container that was used no longer exists.  For SqlServer the cause of the 
> failures is not immediately clear.





[jira] [Assigned] (HIVE-21758) DBInstall tests broken on master and branch-3.1

2019-05-21 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-21758:
--

Assignee: Aditya Shah  (was: Alan Gates)

> DBInstall tests broken on master and branch-3.1
> ---
>
> Key: HIVE-21758
> URL: https://issues.apache.org/jira/browse/HIVE-21758
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Tests
>Affects Versions: 3.1.1
>Reporter: Alan Gates
>Assignee: Aditya Shah
>Priority: Major
>
> The Oracle and SqlServer install and upgrade tests in standalone-metastore 
> fail in master and branch-3.1.  In the Oracle case it appears the docker 
> container that was used no longer exists.  For SqlServer the cause of the 
> failures is not immediately clear.





[jira] [Commented] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-20 Thread Aditya Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844504#comment-16844504
 ] 

Aditya Shah commented on HIVE-21739:


[~alangates] To verify the changes, I was trying to fix (HIVE-21751) and run 
the tests present in the testutils module. I also noticed the DBInstall tests 
in standalone-metastore (broken, HIVE-21758). Do the standalone-metastore 
DBInstall tests suffice for testing schema changes, or are the testutils tests 
also performed?

Thanks, 

Aditya

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0, 2.1.1
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.patch
>
>
> Since the addition of foreign key constraint between Database ('DBS') table 
> and catalogs ('CTLGS') table in HIVE-18755 we are unable to run a simple create 
> database command with an older version of Metastore Server. This is due to 
> older versions having JDO schema as per older schema of 'DBS' which did not 
> have an additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-20 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Status: Open  (was: Patch Available)

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.patch
>
>
> Since the addition of foreign key constraint between Database ('DBS') table 
> and catalogs ('CTLGS') table in HIVE-18755 we are unable to run a simple create 
> database command with an older version of Metastore Server. This is due to 
> older versions having JDO schema as per older schema of 'DBS' which did not 
> have an additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-20 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Attachment: HIVE-21739.3.patch
Status: Patch Available  (was: Open)

Fixing unit tests

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, 
> HIVE-21739.3.patch, HIVE-21739.patch
>
>
> Since the addition of foreign key constraint between Database ('DBS') table 
> and catalogs ('CTLGS') table in HIVE-18755 we are unable to run a simple create 
> database command with an older version of Metastore Server. This is due to 
> older versions having JDO schema as per older schema of 'DBS' which did not 
> have an additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-17 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Attachment: HIVE-21739.2.patch
Status: Patch Available  (was: Open)

Submitting a patch making changes for all the databases. Thanks, [~alangates] 
and [~pvary] for the review. I couldn't find the job for Hive-Hms-tests. I will 
try it locally and post an update soon.

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.2.patch, HIVE-21739.patch
>
>
> Since the addition of foreign key constraint between Database ('DBS') table 
> and catalogs ('CTLGS') table in HIVE-18755 we are unable to run a simple create 
> database command with an older version of Metastore Server. This is due to 
> older versions having JDO schema as per older schema of 'DBS' which did not 
> have an additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-17 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Status: Open  (was: Patch Available)

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.patch
>
>
> Since the addition of a foreign key constraint between the database ('DBS') table 
> and the catalogs ('CTLGS') table in HIVE-18755, we are unable to run a simple create 
> database command with an older version of Metastore Server. This is because 
> older versions use a JDO schema based on the older 'DBS' schema, which did not 
> have the additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-16 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Attachment: HIVE-21739.1.patch
Status: Patch Available  (was: Open)

The failures are unrelated. Triggering the tests again. 

cc [~alangates] [~pvary], could you please review?

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.1.patch, HIVE-21739.patch
>
>
> Since the addition of a foreign key constraint between the database ('DBS') table 
> and the catalogs ('CTLGS') table in HIVE-18755, we are unable to run a simple create 
> database command with an older version of Metastore Server. This is because 
> older versions use a JDO schema based on the older 'DBS' schema, which did not 
> have the additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-16 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Status: Open  (was: Patch Available)

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.patch
>
>
> Since the addition of a foreign key constraint between the database ('DBS') table 
> and the catalogs ('CTLGS') table in HIVE-18755, we are unable to run a simple create 
> database command with an older version of Metastore Server. This is because 
> older versions use a JDO schema based on the older 'DBS' schema, which did not 
> have the additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Updated] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-16 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21739:
---
Fix Version/s: 4.0.0
   Attachment: HIVE-21739.patch
   Status: Patch Available  (was: Open)

I've added "hive" as the default value of the 'CTLG_NAME' column of the 'DBS' table and 
added a default in 'CTLGS'. This also makes a schema upgraded from 2.3 or 
earlier consistent with a fresh schema.
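
The effect of a column default on a foreign key check can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than any real metastore backend, and the table layout is a heavily simplified assumption: only the 'DBS', 'CTLGS', 'CTLG_NAME' and 'NAME' identifiers come from this issue, while the 'DB_ID' column and the helper names make_metastore and legacy_create_database are hypothetical. An empty-string default stands in for "no usable default"; the actual failure mode on a given database may differ.

```python
import sqlite3


def make_metastore(ctlg_default):
    """Build a toy in-memory schema (an assumption, not the real metastore DDL)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE CTLGS (NAME TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO CTLGS (NAME) VALUES ('hive')")
    conn.execute(
        "CREATE TABLE DBS ("
        " DB_ID INTEGER PRIMARY KEY,"
        " NAME TEXT NOT NULL,"
        f" CTLG_NAME TEXT NOT NULL DEFAULT '{ctlg_default}'"
        " REFERENCES CTLGS (NAME))"
    )
    return conn


def legacy_create_database(conn, name):
    """Insert the way a pre-catalog client would: without naming CTLG_NAME."""
    try:
        conn.execute("INSERT INTO DBS (NAME) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # The defaulted CTLG_NAME value has no matching row in CTLGS.
        return False


# A default that matches an existing catalog lets the legacy insert through.
print(legacy_create_database(make_metastore("hive"), "db1"))
# A default that matches no catalog trips the foreign key constraint.
print(legacy_create_database(make_metastore(""), "db1"))
```

In this toy setup the insert succeeds only when the defaulted value refers to an existing row in 'CTLGS', which is the property the patch relies on.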

> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21739.patch
>
>
> Since the addition of a foreign key constraint between the database ('DBS') table 
> and the catalogs ('CTLGS') table in HIVE-18755, we are unable to run a simple create 
> database command with an older version of Metastore Server. This is because 
> older versions use a JDO schema based on the older 'DBS' schema, which did not 
> have the additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Assigned] (HIVE-21739) Make metastore DB backward compatible with pre-catalog versions of hive.

2019-05-16 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah reassigned HIVE-21739:
--


> Make metastore DB backward compatible with pre-catalog versions of hive.
> 
>
> Key: HIVE-21739
> URL: https://issues.apache.org/jira/browse/HIVE-21739
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 1.2.0
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
>
> Since the addition of a foreign key constraint between the database ('DBS') table 
> and the catalogs ('CTLGS') table in HIVE-18755, we are unable to run a simple create 
> database command with an older version of Metastore Server. This is because 
> older versions use a JDO schema based on the older 'DBS' schema, which did not 
> have the additional 'CTLG_NAME' column.
> The error is as follows: 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Exception thrown flushing changes to datastore)
> 
> java.sql.BatchUpdateException: Cannot add or update a child row: a foreign 
> key constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN 
> KEY ("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
> {code}





[jira] [Commented] (HIVE-18685) Add catalogs to Hive

2019-05-10 Thread Aditya Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836949#comment-16836949
 ] 

Aditya Shah commented on HIVE-18685:


[~alangates] I wanted to verify the backward compatibility of the new schema. 
As you mentioned, each database is now associated with a catalog, and therefore 
a new default catalog "hive" was created. But shouldn't the default value of 
"CTLG_NAME" in "DBS" also be set to "hive"? When using pre-catalog versions of 
Hive to create a database, the foreign key constraint causes the command to 
fail with the following error:
{code:java}
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:Exception thrown flushing changes to datastore)

java.sql.BatchUpdateException: Cannot add or update a child row: a foreign key 
constraint fails ("metastore_1238"."DBS", CONSTRAINT "CTLG_FK1" FOREIGN KEY 
("CTLG_NAME") REFERENCES "CTLGS" ("NAME"))
{code}
 

> Add catalogs to Hive
> 
>
> Key: HIVE-18685
> URL: https://issues.apache.org/jira/browse/HIVE-18685
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Parser, Security, SQL
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Attachments: HMS Catalog Design Doc.pdf
>
>
> SQL supports two levels of namespaces, called in the spec catalogs and 
> schemas (with schema being equivalent to Hive's database).  I propose to add 
> the upper level of catalog.  The attached design doc covers the use cases, 
> requirements, and brief discussion of how it will be implemented in a 
> backwards compatible way.





[jira] [Updated] (HIVE-21650) QOutProcessor should provide configurable partial masks for qtests

2019-04-26 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21650:
---
Attachment: HIVE-21650.1.patch
Status: Patch Available  (was: Open)

Changed the file name to trigger a test run.

> QOutProcessor should provide configurable partial masks for qtests
> --
>
> Key: HIVE-21650
> URL: https://issues.apache.org/jira/browse/HIVE-21650
> Project: Hive
>  Issue Type: Improvement
>  Components: Test, Testing Infrastructure
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21650-1.patch, HIVE-21650.1.patch, HIVE-21650.patch
>
>
> QOutProcessor masks a large amount of output in q.out files whenever it sees 
> any of the target mask patterns. This prevents us from testing many cases, 
> for example the directories being created for an ACID table. 
> Internal configurations that let us provide additional partial masks 
> to cover such cases would therefore help us improve our tests.
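
The difference between masking everything and masking only part of a line can be sketched as follows. This is an illustrative Python sketch, not QOutProcessor's actual code: the function names, the regular expressions, the sample path and the "### HOST ###" replacement text are all assumptions made for the example, and the whole-line behaviour is assumed here only for contrast.

```python
import re

# Placeholder written in place of a fully masked line (assumed for this sketch).
MASK = "#### A masked pattern was here ####"


def mask_whole_line(line, patterns):
    """Replace the entire line if any pattern matches anywhere in it."""
    if any(re.search(p, line) for p in patterns):
        return MASK
    return line


def mask_partial(line, partial_patterns):
    """Replace only the matched portion, keeping the rest of the line testable."""
    for pattern, replacement in partial_patterns:
        line = re.sub(pattern, replacement, line)
    return line


# Hypothetical output line containing an ACID delta directory.
line = "hdfs://localhost:9000/warehouse/t/delta_0000001_0000001_0000/bucket_00000"

# Whole-line masking hides the directory layout entirely.
print(mask_whole_line(line, [r"hdfs://"]))
# Partial masking hides only the host-specific prefix.
print(mask_partial(line, [(r"hdfs://[^/]+", "hdfs://### HOST ###")]))
```

With the partial mask, the delta directory name stays visible in the expected output, so a test could still assert on it.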





[jira] [Updated] (HIVE-21650) QOutProcessor should provide configurable partial masks for qtests

2019-04-26 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21650:
---
Status: Open  (was: Patch Available)

> QOutProcessor should provide configurable partial masks for qtests
> --
>
> Key: HIVE-21650
> URL: https://issues.apache.org/jira/browse/HIVE-21650
> Project: Hive
>  Issue Type: Improvement
>  Components: Test, Testing Infrastructure
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21650-1.patch, HIVE-21650.patch
>
>
> QOutProcessor masks a large amount of output in q.out files whenever it sees 
> any of the target mask patterns. This prevents us from testing many cases, 
> for example the directories being created for an ACID table. 
> Internal configurations that let us provide additional partial masks 
> to cover such cases would therefore help us improve our tests.





[jira] [Commented] (HIVE-21650) QOutProcessor should provide configurable partial masks for qtests

2019-04-25 Thread Aditya Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826248#comment-16826248
 ] 

Aditya Shah commented on HIVE-21650:


[~jcamachorodriguez] [~abstractdog] can you please review the patch?

> QOutProcessor should provide configurable partial masks for qtests
> --
>
> Key: HIVE-21650
> URL: https://issues.apache.org/jira/browse/HIVE-21650
> Project: Hive
>  Issue Type: Improvement
>  Components: Test, Testing Infrastructure
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21650-1.patch, HIVE-21650.patch
>
>
> QOutProcessor masks a large amount of output in q.out files whenever it sees 
> any of the target mask patterns. This prevents us from testing many cases, 
> for example the directories being created for an ACID table. 
> Internal configurations that let us provide additional partial masks 
> to cover such cases would therefore help us improve our tests.





[jira] [Updated] (HIVE-21650) QOutProcessor should provide configurable partial masks for qtests

2019-04-25 Thread Aditya Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Shah updated HIVE-21650:
---
Status: Open  (was: Patch Available)

> QOutProcessor should provide configurable partial masks for qtests
> --
>
> Key: HIVE-21650
> URL: https://issues.apache.org/jira/browse/HIVE-21650
> Project: Hive
>  Issue Type: Improvement
>  Components: Test, Testing Infrastructure
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21650.patch
>
>
> QOutProcessor masks a large amount of output in q.out files whenever it sees 
> any of the target mask patterns. This prevents us from testing many cases, 
> for example the directories being created for an ACID table. 
> Internal configurations that let us provide additional partial masks 
> to cover such cases would therefore help us improve our tests.




