[jira] [Assigned] (HIVE-20542) Incremental REPL DUMP progress information log message is incorrect.

2018-09-12 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-20542:
-

Assignee: Ashutosh Bapat  (was: Sankar Hariappan)

> Incremental REPL DUMP progress information log message is incorrect.
> 
>
> Key: HIVE-20542
> URL: https://issues.apache.org/jira/browse/HIVE-20542
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: DR, Replication
>
> Incremental REPL DUMP logs its progress information as
> "eventsDumpProgress":"49/0".
> The denominator should actually be the estimated number of events, but it
> always comes out as 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20542) Incremental REPL DUMP progress information log message is incorrect.

2018-09-20 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621963#comment-16621963
 ] 

Ashutosh Bapat commented on HIVE-20542:
---

To reproduce the issue, run
mvn test -DskipSparkTests -Dtest=TestReplicationScenarios#testIncrementalLoad
and check itests/hive-unit/target/tmp/log/hive.log for messages with
eventsDumpProgress.

> Incremental REPL DUMP progress information log message is incorrect.
> 
>
> Key: HIVE-20542
> URL: https://issues.apache.org/jira/browse/HIVE-20542
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: DR, Replication
>
> Incremental REPL DUMP logs its progress information as
> "eventsDumpProgress":"49/0".
> The denominator should actually be the estimated number of events, but it
> always comes out as 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20542) Incremental REPL DUMP progress information log message is incorrect.

2018-09-20 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621975#comment-16621975
 ] 

Ashutosh Bapat commented on HIVE-20542:
---

There are two issues:
1. For every event, DbNotificationListener adds an event in the metastore using 
the addNotificationLog() method. When the event has no database associated with 
it, e.g. an open transaction event, this method inserts a row with dbname set 
to the string 'null' instead of NULL. The fix is to not list the DB_NAME column 
when inserting such events. The dbname ends up as 'null' because a null String 
object is used in concatenation; this seems to be an unintentional side effect.
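
A quick illustration of why the literal string 'null' shows up (plain Java string semantics, not the actual Hive code):
{code:java}
String dbName = null;
// Java string concatenation converts a null reference to the text "null",
// so the INSERT ends up storing the string 'null' rather than SQL NULL.
String values = "(" + 42 + ", '" + dbName + "')";  // -> "(42, 'null')"
{code}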

2. When reporting the estimated number of events in 
ObjectStore::getNotificationEventsCount(), the query has equality conditions on 
the db_name and catname columns. These conditions filter out all the events 
with NULL dbname and NULL catname, i.e. the events related to transactions. 
These events are important for replication, since they need to be replicated to 
provide consistent data to readers running in parallel with the load operation. 
The fix is to add an "OR ... IS NULL" clause on DB_NAME and CAT_NAME.
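
A minimal sketch of the predicate change (simplified; the actual query construction in ObjectStore differs):
{code:java}
// Before: the equality conditions drop events whose DB_NAME/CAT_NAME are
// NULL, e.g. transaction events.
String oldWhere = "\"DB_NAME\" = ? AND \"CAT_NAME\" = ?";
// After: count those events too by accepting NULLs explicitly.
String newWhere = "(\"DB_NAME\" = ? OR \"DB_NAME\" IS NULL)"
                + " AND (\"CAT_NAME\" = ? OR \"CAT_NAME\" IS NULL)";
{code}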

The attached patch fixes both these issues.

> Incremental REPL DUMP progress information log message is incorrect.
> 
>
> Key: HIVE-20542
> URL: https://issues.apache.org/jira/browse/HIVE-20542
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: DR, Replication
>
> Incremental REPL DUMP logs its progress information as
> "eventsDumpProgress":"49/0".
> The denominator should actually be the estimated number of events, but it
> always comes out as 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20542) Incremental REPL DUMP progress information log message is incorrect.

2018-09-20 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-20542:
--
Attachment: HIVE-20542.01.patch

> Incremental REPL DUMP progress information log message is incorrect.
> 
>
> Key: HIVE-20542
> URL: https://issues.apache.org/jira/browse/HIVE-20542
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: DR, Replication
> Attachments: HIVE-20542.01.patch
>
>
> Incremental REPL DUMP logs its progress information as
> "eventsDumpProgress":"49/0".
> The denominator should actually be the estimated number of events, but it
> always comes out as 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20542) Incremental REPL DUMP progress information log message is incorrect.

2018-09-20 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-20542:
--
Attachment: (was: HIVE-20542.01.patch)

> Incremental REPL DUMP progress information log message is incorrect.
> 
>
> Key: HIVE-20542
> URL: https://issues.apache.org/jira/browse/HIVE-20542
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: DR, Replication
>
> Incremental REPL DUMP logs its progress information as
> "eventsDumpProgress":"49/0".
> The denominator should actually be the estimated number of events, but it
> always comes out as 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20542) Incremental REPL DUMP progress information log message is incorrect.

2018-09-20 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-20542:
--
Attachment: HIVE-20542.01.patch
Status: Patch Available  (was: Open)

> Incremental REPL DUMP progress information log message is incorrect.
> 
>
> Key: HIVE-20542
> URL: https://issues.apache.org/jira/browse/HIVE-20542
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: DR, Replication
> Attachments: HIVE-20542.01.patch
>
>
> Incremental REPL DUMP logs its progress information as
> "eventsDumpProgress":"49/0".
> The denominator should actually be the estimated number of events, but it
> always comes out as 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20644) Avoid exposing sensitive information through an error message

2018-09-26 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-20644:
-


> Avoid exposing sensitive information through an error message
> 
>
> Key: HIVE-20644
> URL: https://issues.apache.org/jira/browse/HIVE-20644
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Minor
>
> The HiveException raised from the following methods exposes the data row
> that caused the runtime exception.
>  # ReduceRecordSource::GroupIterator::next() - around line 372
>  # MapOperator::process() - around line 567
>  # ExecReducer::reduce() - around line 243
> In all these cases, a string representation of the row is constructed on the
> fly and included in the error message.
> VectorMapOperator::process() - around line 973 raises the same exception, but
> it does not expose the row, since the row contents are not included in the
> error message.
> While trying to reproduce the above error, I also found that the arguments to
> a UDF get exposed in log messages from FunctionRegistry::invoke() around line
> 1114. This too can leak sensitive information through an error message.
> This way some sensitive information is leaked to a user through an exception
> message. That information may not be available to the user otherwise. Hence
> it's a kind of security breach or violation of access control.
> The contents of the row or the arguments to a function may be useful for
> debugging, so it's worth adding those to the logs. Hence the proposal here is
> to log a separate message, at level DEBUG or INFO, containing the string
> representation of the row. Users can configure their logging so that
> DEBUG/INFO messages do not go to the client but are still available in the
> Hive server logs for debugging. The actual exception message will not contain
> any sensitive data like row data or argument data.
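
A minimal sketch of the proposed split (illustrative only; LOG, process() and toErrorMessage() are assumed helpers, not the actual Hive code):
{code:java}
try {
  process(row);
} catch (RuntimeException e) {
  // Row contents go only to the server-side log, at DEBUG level.
  if (LOG.isDebugEnabled()) {
    LOG.debug("Error while processing row: " + toErrorMessage(row), e);
  }
  // The client-facing message stays generic; no row data leaks.
  throw new HiveException("Hive Runtime Error while processing row", e);
}
{code}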



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20644) Avoid exposing sensitive information through a Hive Runtime exception

2018-09-26 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-20644:
--
Summary: Avoid exposing sensitive information through a Hive Runtime 
exception  (was: Avoid exposing sensitive information through an error message)

> Avoid exposing sensitive information through a Hive Runtime exception
> 
>
> Key: HIVE-20644
> URL: https://issues.apache.org/jira/browse/HIVE-20644
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Minor
>
> The HiveException raised from the following methods exposes the data row
> that caused the runtime exception.
>  # ReduceRecordSource::GroupIterator::next() - around line 372
>  # MapOperator::process() - around line 567
>  # ExecReducer::reduce() - around line 243
> In all these cases, a string representation of the row is constructed on the
> fly and included in the error message.
> VectorMapOperator::process() - around line 973 raises the same exception, but
> it does not expose the row, since the row contents are not included in the
> error message.
> While trying to reproduce the above error, I also found that the arguments to
> a UDF get exposed in log messages from FunctionRegistry::invoke() around line
> 1114. This too can leak sensitive information through an error message.
> This way some sensitive information is leaked to a user through an exception
> message. That information may not be available to the user otherwise. Hence
> it's a kind of security breach or violation of access control.
> The contents of the row or the arguments to a function may be useful for
> debugging, so it's worth adding those to the logs. Hence the proposal here is
> to log a separate message, at level DEBUG or INFO, containing the string
> representation of the row. Users can configure their logging so that
> DEBUG/INFO messages do not go to the client but are still available in the
> Hive server logs for debugging. The actual exception message will not contain
> any sensitive data like row data or argument data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20644) Avoid exposing sensitive information through a Hive Runtime exception

2018-09-27 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-20644:
--
Attachment: HIVE-20644.01

> Avoid exposing sensitive information through a Hive Runtime exception
> 
>
> Key: HIVE-20644
> URL: https://issues.apache.org/jira/browse/HIVE-20644
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-20644.01
>
>
> The HiveException raised from the following methods exposes the data row
> that caused the runtime exception.
>  # ReduceRecordSource::GroupIterator::next() - around line 372
>  # MapOperator::process() - around line 567
>  # ExecReducer::reduce() - around line 243
> In all these cases, a string representation of the row is constructed on the
> fly and included in the error message.
> VectorMapOperator::process() - around line 973 raises the same exception, but
> it does not expose the row, since the row contents are not included in the
> error message.
> While trying to reproduce the above error, I also found that the arguments to
> a UDF get exposed in log messages from FunctionRegistry::invoke() around line
> 1114. This too can leak sensitive information through an error message.
> This way some sensitive information is leaked to a user through an exception
> message. That information may not be available to the user otherwise. Hence
> it's a kind of security breach or violation of access control.
> The contents of the row or the arguments to a function may be useful for
> debugging, so it's worth adding those to the logs. Hence the proposal here is
> to log a separate message, at level DEBUG or INFO, containing the string
> representation of the row. Users can configure their logging so that
> DEBUG/INFO messages do not go to the client but are still available in the
> Hive server logs for debugging. The actual exception message will not contain
> any sensitive data like row data or argument data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20644) Avoid exposing sensitive information through a Hive Runtime exception

2018-09-27 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-20644:
--
Attachment: (was: HIVE-20644.01)

> Avoid exposing sensitive information through a Hive Runtime exception
> 
>
> Key: HIVE-20644
> URL: https://issues.apache.org/jira/browse/HIVE-20644
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: pull-request-available
>
> The HiveException raised from the following methods exposes the data row
> that caused the runtime exception.
>  # ReduceRecordSource::GroupIterator::next() - around line 372
>  # MapOperator::process() - around line 567
>  # ExecReducer::reduce() - around line 243
> In all these cases, a string representation of the row is constructed on the
> fly and included in the error message.
> VectorMapOperator::process() - around line 973 raises the same exception, but
> it does not expose the row, since the row contents are not included in the
> error message.
> While trying to reproduce the above error, I also found that the arguments to
> a UDF get exposed in log messages from FunctionRegistry::invoke() around line
> 1114. This too can leak sensitive information through an error message.
> This way some sensitive information is leaked to a user through an exception
> message. That information may not be available to the user otherwise. Hence
> it's a kind of security breach or violation of access control.
> The contents of the row or the arguments to a function may be useful for
> debugging, so it's worth adding those to the logs. Hence the proposal here is
> to log a separate message, at level DEBUG or INFO, containing the string
> representation of the row. Users can configure their logging so that
> DEBUG/INFO messages do not go to the client but are still available in the
> Hive server logs for debugging. The actual exception message will not contain
> any sensitive data like row data or argument data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20644) Avoid exposing sensitive information through a Hive Runtime exception

2018-09-27 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-20644:
--
Fix Version/s: 3.1.0
Affects Version/s: 3.1.0
   Attachment: HIVE-20644.01
   Status: Patch Available  (was: Open)

> Avoid exposing sensitive information through a Hive Runtime exception
> 
>
> Key: HIVE-20644
> URL: https://issues.apache.org/jira/browse/HIVE-20644
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.1.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-20644.01
>
>
> The HiveException raised from the following methods exposes the data row
> that caused the runtime exception.
>  # ReduceRecordSource::GroupIterator::next() - around line 372
>  # MapOperator::process() - around line 567
>  # ExecReducer::reduce() - around line 243
> In all these cases, a string representation of the row is constructed on the
> fly and included in the error message.
> VectorMapOperator::process() - around line 973 raises the same exception, but
> it does not expose the row, since the row contents are not included in the
> error message.
> While trying to reproduce the above error, I also found that the arguments to
> a UDF get exposed in log messages from FunctionRegistry::invoke() around line
> 1114. This too can leak sensitive information through an error message.
> This way some sensitive information is leaked to a user through an exception
> message. That information may not be available to the user otherwise. Hence
> it's a kind of security breach or violation of access control.
> The contents of the row or the arguments to a function may be useful for
> debugging, so it's worth adding those to the logs. Hence the proposal here is
> to log a separate message, at level DEBUG or INFO, containing the string
> representation of the row. Users can configure their logging so that
> DEBUG/INFO messages do not go to the client but are still available in the
> Hive server logs for debugging. The actual exception message will not contain
> any sensitive data like row data or argument data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21114) Create read-only transactions

2019-10-18 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954328#comment-16954328
 ] 

Ashutosh Bapat commented on HIVE-21114:
---

 

[~anishek] brought me to this one.

I quickly went through the patch, but haven't looked at it in detail. It looks
like we will allocate a transaction id for a "read-only" transaction as well,
but annotate it as "read-only". Whether a transaction (really a statement) is
read-only is determined by the parser/semantic analyzer. According to the SQL
standard, queries can have side effects, meaning an apparent SELECT query might
change the db (through a subquery with UPDATE, or a procedure/function invoked
by a function running DML, etc.). So relying on the parser to decide whether a
query is read-only may not be the right approach: the parser/semantic analyzer
can deem a "read-only" statement a "write" to be on the safer side, or worse,
the other way round, which would be buggy. Also, letting a user specify whether
a transaction is read-only won't generally help in a multi-statement scenario,
since the user may want to stay on the safer side and accommodate writes in the
transaction. A strategy that regards a transaction as read-only until its first
write will work here. The idea is to separate the snapshot from the transaction
id: use the former for reading data, and fetch the latter to track writes when
the first write happens. So if a query never allocated a write id, it never
needed a transaction id and thus never wrote anything. If we go that route, we
will not create any transaction ids for read-only transactions and won't have
corresponding events. Furthermore, that might help us take write ids out of the
equation altogether.

During my investigation, I also found another problem. Since the snapshot and
transaction id are tied together right now, a transaction would never update
its snapshot from one statement to the next in a multi-statement transaction.
This means that a true "read committed" multi-statement transaction won't be
possible. Fixing that also requires separating the snapshot and transaction id.

Please let me know if I am missing something.
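
A minimal sketch of the "read-only until first write" idea (hypothetical names, not Hive's actual API):
{code:java}
class LazyTxn {
  private final long snapshotId;  // taken up front; serves all reads
  private Long txnId;             // allocated only on the first write

  LazyTxn(long snapshotId) { this.snapshotId = snapshotId; }

  long snapshotForRead() { return snapshotId; }

  long txnIdForWrite() {
    if (txnId == null) {
      txnId = allocateTxnId();    // read-only txns never reach this point
    }
    return txnId;
  }

  private long allocateTxnId() {
    return System.nanoTime();     // placeholder for a real txn-id allocator
  }
}
{code}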

> Create read-only transactions
> -
>
> Key: HIVE-21114
> URL: https://issues.apache.org/jira/browse/HIVE-21114
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Eugene Koifman
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-21114.1.patch, HIVE-21114.2.patch, 
> HIVE-21114.3.patch, HIVE-21114.4.patch, HIVE-21114.5.patch, 
> HIVE-21114.6.patch, HIVE-21114.7.patch
>
>
> With HIVE-21036 we have a way to indicate that a txn is read only.
> We should (at least in auto-commit mode) determine if the single stmt is a 
> read and mark the txn accordingly.  
> Then we can optimize {{TxnHandler.commitTxn()}} so that it doesn't do any 
> checks in write_set etc.
> {{TxnHandler.commitTxn()}} already starts with {{lockTransactionRecord(stmt, 
> txnid, TXN_OPEN)}} so it can read the txn type in the same SQL stmt.
> HiveOperation only has QUERY, which includes Insert and Select, so this 
> requires figuring out how to determine if a query is a SELECT.  By the time 
> {{Driver.openTransaction();}} is called, we have already parsed the query so 
> there should be a way to know if the statement only reads.
> For multi-stmt txns (once these are supported) we should allow the user to 
> indicate that a txn is read-only and then not allow any statements that can 
> make modifications in this txn.  This should be a different jira.
> cc [~ikryvenko]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-21114) Create read-only transactions

2019-10-18 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954328#comment-16954328
 ] 

Ashutosh Bapat edited comment on HIVE-21114 at 10/18/19 7:04 AM:
-

 

[~anishek] brought me to this one.

I quickly went through the patch, but haven't looked at it in detail. It looks
like we will allocate a transaction id for a "read-only" transaction as well,
but annotate it as "read-only". Whether a transaction (really a statement) is
read-only is determined by the parser/semantic analyzer. According to the SQL
standard, queries can have side effects, meaning an apparent SELECT query might
change the db (through a subquery with UPDATE, or a procedure/function invoked
by a function running DML, etc.). So relying on the parser to decide whether a
query is read-only may not be the right approach: the parser/semantic analyzer
can deem a "read-only" statement a "write" to be on the safer side, or worse,
the other way round, which would be buggy. Also, letting a user specify whether
a transaction is read-only won't generally help in a multi-statement scenario,
since the user may want to stay on the safer side and accommodate writes in the
transaction. A strategy that regards a transaction as read-only until its first
write will work here. The idea is to separate the snapshot from the transaction
id: use the former for reading data, and fetch the latter to track writes when
the first write happens. So if a query never allocated a write id, it never
needed a transaction id and thus never wrote anything. If we go that route, we
will not create any transaction ids for read-only transactions and won't have
corresponding events. Furthermore, that might help us take write ids out of the
equation altogether.

During my investigation, I also found another problem. Since the snapshot and
transaction id are tied together right now, a transaction would never update
its snapshot from one statement to the next in a multi-statement transaction.
This means that a true "read committed" multi-statement transaction won't be
possible. Fixing that also requires separating the snapshot and transaction id.

Transactions at serializable isolation are different, though. They are
considered "write" even if they are "read-only", since they block writes to the
dataset already read. That might be another issue to look at.

Please let me know if I am missing something.


was (Author: ashutosh.bapat):
 

[~anishek] brought me to this one.

I quickly went through the patch, but haven't looked at it in detail. It looks
like we will allocate a transaction id for a "read-only" transaction as well,
but annotate it as "read-only". Whether a transaction (really a statement) is
read-only is determined by the parser/semantic analyzer. According to the SQL
standard, queries can have side effects, meaning an apparent SELECT query might
change the db (through a subquery with UPDATE, or a procedure/function invoked
by a function running DML, etc.). So relying on the parser to decide whether a
query is read-only may not be the right approach: the parser/semantic analyzer
can deem a "read-only" statement a "write" to be on the safer side, or worse,
the other way round, which would be buggy. Also, letting a user specify whether
a transaction is read-only won't generally help in a multi-statement scenario,
since the user may want to stay on the safer side and accommodate writes in the
transaction. A strategy that regards a transaction as read-only until its first
write will work here. The idea is to separate the snapshot from the transaction
id: use the former for reading data, and fetch the latter to track writes when
the first write happens. So if a query never allocated a write id, it never
needed a transaction id and thus never wrote anything. If we go that route, we
will not create any transaction ids for read-only transactions and won't have
corresponding events. Furthermore, that might help us take write ids out of the
equation altogether.

During my investigation, I also found another problem. Since the snapshot and
transaction id are tied together right now, a transaction would never update
its snapshot from one statement to the next in a multi-statement transaction.
This means that a true "read committed" multi-statement transaction won't be
possible. Fixing that also requires separating the snapshot and transaction id.

Please let me know if I am missing something.

> Create read-only transactions
> -
>
> Key: HIVE-21114
> URL: https://issues.apache.org/jira/browse/HIVE-21114
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Eugene Koifman
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachmen

[jira] [Commented] (HIVE-21114) Create read-only transactions

2019-10-24 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958796#comment-16958796
 ] 

Ashutosh Bapat commented on HIVE-21114:
---

[~dkuzmenko], I don't have a Review Board account. Maybe you could create a pull 
request instead, which is what I am familiar with.

Anyway, here are some comments.

The changes in isValidTxnListState() and getTransactionalTableList() mostly look 
like refactoring changes. Are they essential to the rest of the patch?

+ assert queryTxnType != TxnType.READ_ONLY || getWrittenTables(plan).isEmpty()
+     : String.format("Inferred transaction type '%s' doesn't conform to the actual query string '%s'",
+         queryTxnType, queryState.getQueryString());

The assertion may not be useful in production code. Can we please convert it into 
an exception? It would be better to throw an exception from 
allocate_table_write_ids() or TxnHandler#allocateTableWriteIds() if the request 
is to allocate a writeId for a read-only transaction.

+ new int[]{HiveParser.TOK_INSERT_INTO},
+ new int[]{HiveParser.TOK_INSERT, HiveParser.TOK_TAB})

Shouldn't we also worry about TOK_UPDATE* and TOK_DELETE*?

getTxnType() should also check at least for the existence of UDFs, and deem the 
query non-read-only if one exists. Given the cryptic code of this function, 
please add an "English" :) prologue explaining the logic in the function.

Can you please add some tests for READ_ONLY txntype in TestDbTxnManager.java?

It looks like TestParseUtils should be pretty extensive, especially since we 
always want to err on the safer side. That's the fear I have: we may not be able 
to cover all the corner cases and possibilities. You need to add at least 
materialized views, combinations of partitioned and non-partitioned tables, etc.

What happens if a query involves a view and the view has UPDATE in its 
definition?

 

> Create read-only transactions
> -
>
> Key: HIVE-21114
> URL: https://issues.apache.org/jira/browse/HIVE-21114
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Eugene Koifman
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-21114.1.patch, HIVE-21114.2.patch, 
> HIVE-21114.3.patch, HIVE-21114.4.patch, HIVE-21114.5.patch, 
> HIVE-21114.6.patch, HIVE-21114.7.patch
>
>
> With HIVE-21036 we have a way to indicate that a txn is read only.
> We should (at least in auto-commit mode) determine if the single stmt is a 
> read and mark the txn accordingly.  
> Then we can optimize {{TxnHandler.commitTxn()}} so that it doesn't do any 
> checks in write_set etc.
> {{TxnHandler.commitTxn()}} already starts with {{lockTransactionRecord(stmt, 
> txnid, TXN_OPEN)}} so it can read the txn type in the same SQL stmt.
> HiveOperation only has QUERY, which includes Insert and Select, so this 
> requires figuring out how to determine if a query is a SELECT.  By the time 
> {{Driver.openTransaction();}} is called, we have already parsed the query so 
> there should be a way to know if the statement only reads.
> For multi-stmt txns (once these are supported) we should allow the user to 
> indicate that a txn is read-only and then not allow any statements that can 
> make modifications in this txn.  This should be a different jira.
> cc [~ikryvenko]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21114) Create read-only transactions

2019-10-29 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962750#comment-16962750
 ] 

Ashutosh Bapat commented on HIVE-21114:
---

I would request a small change in the error report from allocateWriteIds(). The 
current error message, "This should never happen for txnIds: " + txnIds, was 
fine because the condition should never happen and there was only one such 
condition, i.e. a non-open transaction. But now there are two conditions, i.e. a 
non-open txn and a read-only txn. We should be a bit more elaborate in conveying 
these two conditions. Better even if we could tell which txns are in which 
condition.
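
For example, a hypothetical wording that distinguishes the two conditions (message text is only a suggestion, not the actual patch):
{code:java}
throw new MetaException("Cannot allocate write ids: some of txnIds " + txnIds
    + " are either no longer open or are read-only");
{code}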

Otherwise, [~dkuzmenko], the changes look ok to me.

> Create read-only transactions
> -
>
> Key: HIVE-21114
> URL: https://issues.apache.org/jira/browse/HIVE-21114
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Eugene Koifman
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-21114.1.patch, HIVE-21114.2.patch, 
> HIVE-21114.3.patch, HIVE-21114.4.patch, HIVE-21114.5.patch, 
> HIVE-21114.6.patch, HIVE-21114.7.patch, HIVE-21114.8.patch
>
>
> With HIVE-21036 we have a way to indicate that a txn is read only.
> We should (at least in auto-commit mode) determine if the single stmt is a 
> read and mark the txn accordingly.  
> Then we can optimize {{TxnHandler.commitTxn()}} so that it doesn't do any 
> checks in write_set etc.
> {{TxnHandler.commitTxn()}} already starts with {{lockTransactionRecord(stmt, 
> txnid, TXN_OPEN)}} so it can read the txn type in the same SQL stmt.
> HiveOperation only has QUERY, which includes Insert and Select, so this 
> requires figuring out how to determine if a query is a SELECT.  By the time 
> {{Driver.openTransaction();}} is called, we have already parsed the query so 
> there should be a way to know if the statement only reads.
> For multi-stmt txns (once these are supported) we should allow the user to 
> indicate that a txn is read-only and then not allow any statements that can 
> make modifications in this txn.  This should be a different jira.
> cc [~ikryvenko]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-19 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-22512:
-


> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the
> column-level privileges. The latter function retrieves the individual column
> objects by firing one query per column privilege object, thus swamping the
> backend db with these queries when PrivilegeSynchronizer is run.
> PrivilegeSynchronizer synchronizes privileges of all the databases, tables
> and columns, so the backend db can get swamped really badly when there are
> thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the
> columns the PM has retrieved are wasted anyway.
> Fix this by using direct SQL to fetch column privileges.
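
A rough illustration of the change described above (metastore table/column names are indicative; the actual patch builds its query inside ObjectStore):
{code:java}
// N+1 pattern being removed: one metastore query per column privilege object.
// Direct-SQL replacement: fetch all column grants for a table in one query.
String directSql =
    "SELECT \"COLUMN_NAME\", \"PRINCIPAL_NAME\", \"PRINCIPAL_TYPE\", \"TBL_COL_PRIV\""
  + " FROM \"TBL_COL_PRIVS\" WHERE \"TBL_ID\" = ?";
{code}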



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-19 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Attachment: HIVE-22512.01.patch
Status: Patch Available  (was: Open)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
> Attachments: HIVE-22512.01.patch
>
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the
> column-level privileges. The latter function retrieves the individual column
> objects by firing one query per column privilege object, thus swamping the
> backend db with these queries when PrivilegeSynchronizer is run.
> PrivilegeSynchronizer synchronizes privileges of all the databases, tables
> and columns, so the backend db can get swamped really badly when there are
> thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the
> columns the PM has retrieved are wasted anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22428) Remove superfluous "Failed to get database" WARN Logging in ObjectStore

2019-11-19 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978105#comment-16978105
 ] 

Ashutosh Bapat commented on HIVE-22428:
---

[~belugabehr], [~mgergely],

In this commit, you have modified debugLog() to pass (new Exception()) to 
LOG.debug() instead of the earlier getStackTrace(). (new Exception()) prints the 
whole stack trace, which can be mistaken for some error condition/exception. 
getStackTrace() printed just a few frames, rationalising the output, which made 
it easy not to mistake it for an actual exception.

I was looking at the debug output of one of my tests and had this confusion. It 
took me a bit of time to realise that it wasn't a real exception. Here's how it 
looked; I thought it was a NullPointerException or something like that.
{code:java}
2019-11-19T08:13:31,392 DEBUG [PrivilegeSynchronizer] metastore.ObjectStore: Commit transaction: count = 1, isactive true
java.lang.Exception: null
 at org.apache.hadoop.hive.metastore.ObjectStore.debugLog(ObjectStore.java:9671) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:475) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.commit(ObjectStore.java:3707) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3608) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.metastore.ObjectStore.getTableAllColumnGrants(ObjectStore.java:6521) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.metastore.ObjectStore.refreshPrivileges(ObjectStore.java:6455) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) ~[?:?]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_222]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
 at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at com.sun.proxy.$Proxy40.refreshPrivileges(Unknown Source) [?:?]
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.refresh_privileges(HiveMetaStore.java:7136) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) ~[?:?]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_222]
{code}
Is it possible to revert to the old method, or at least sanitize the output so 
that it doesn't look like a real exception?
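
A minimal sketch of the two styles being compared (illustrative; getStackTrace() stands in for the earlier few-frames helper):
{code:java}
private void debugLog(String msg) {
  if (LOG.isDebugEnabled()) {
    LOG.debug(msg, new Exception());  // logs a full, exception-looking stack trace
    // vs. the earlier style, which printed only a few rationalised frames:
    // LOG.debug(msg + " " + getStackTrace());
  }
}
{code}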

> Remove superfluous "Failed to get database" WARN Logging in ObjectStore
> ---
>
> Key: HIVE-22428
> URL: https://issues.apache.org/jira/browse/HIVE-22428
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22428.1.patch
>
>
> In my testing, I get lots of logs like this:
> {code:none}
>   Line 26319: 2019-10-28T21:09:52,134  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26327: 2019-10-28T21:09:52,135  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26504: 2019-10-28T21:09:52,600  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.tstatsfast, returning 
> NoSuchObjectException
>   Line 26519: 2019-10-28T21:09:52,606  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.tstatsfast, returning 
> NoSuchObjectException
>   Line 26695: 2019-10-28T21:09:52,922  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.createDb, returning 
> NoSuchObjectException
>   Line 26703: 2019-10-28T21:09:52,923  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.createDb, returning 
> NoSuchObjectException
>   Line 26763: 2019-10-28T21:09:52,936  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26778: 2019-10-28T21:09:52,939  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26963: 2019-10-28T21:09:53,273  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.db1, returning 
> NoSuchObjectException
>   Line 26978: 2019-10-28T21:09:53,276  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to ge

[jira] [Commented] (HIVE-22428) Remove superfluous "Failed to get database" WARN Logging in ObjectStore

2019-11-20 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978389#comment-16978389
 ] 

Ashutosh Bapat commented on HIVE-22428:
---

Looks ok. A slight suggestion: "Thread Stack Trace for debugging (Not an 
Error)". Somehow indicate that this is debug output.

> Remove superfluous "Failed to get database" WARN Logging in ObjectStore
> ---
>
> Key: HIVE-22428
> URL: https://issues.apache.org/jira/browse/HIVE-22428
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22428.1.patch
>
>
> In my testing, I get lots of logs like this:
> {code:none}
>   Line 26319: 2019-10-28T21:09:52,134  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26327: 2019-10-28T21:09:52,135  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26504: 2019-10-28T21:09:52,600  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.tstatsfast, returning 
> NoSuchObjectException
>   Line 26519: 2019-10-28T21:09:52,606  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.tstatsfast, returning 
> NoSuchObjectException
>   Line 26695: 2019-10-28T21:09:52,922  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.createDb, returning 
> NoSuchObjectException
>   Line 26703: 2019-10-28T21:09:52,923  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.createDb, returning 
> NoSuchObjectException
>   Line 26763: 2019-10-28T21:09:52,936  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26778: 2019-10-28T21:09:52,939  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26963: 2019-10-28T21:09:53,273  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.db1, returning 
> NoSuchObjectException
>   Line 26978: 2019-10-28T21:09:53,276  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.db2, returning 
> NoSuchObjectException
>   Line 26986: 2019-10-28T21:09:53,277  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.db1, returning 
> NoSuchObjectException
>   Line 27018: 2019-10-28T21:09:53,300  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.db2, returning 
> NoSuchObjectException
> {code}
> This is a superfluous log message.  It might be pretty common for a database 
> to not exist if, for example, a user fat-fingers the name of the database.  
> The code also has the bad habit of log-and-throw.  Just log or throw, not 
> both.
> Since I'm looking at this class, touch up some of the other logging as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-20 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Status: In Progress  (was: Patch Available)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the
> column-level privileges. The latter function retrieves the individual column
> objects by firing one query per column privilege object, thus swamping the
> backend db with these queries when PrivilegeSynchronizer is run.
> PrivilegeSynchronizer synchronizes privileges of all the databases, tables
> and columns, so the backend db can get swamped really badly when there are
> thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the
> columns the PM has retrieved are wasted anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-20 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Attachment: HIVE-22512.02.patch
Status: Patch Available  (was: In Progress)

Checkstyle comments fixed.

Some of the checkstyle comments flag pre-existing issues, which I haven't fixed.

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the
> column-level privileges. The latter function retrieves the individual column
> objects by firing one query per column privilege object, thus swamping the
> backend db with these queries when PrivilegeSynchronizer is run.
> PrivilegeSynchronizer synchronizes privileges of all the databases, tables
> and columns, so the backend db can get swamped really badly when there are
> thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the
> columns the PM has retrieved are wasted anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-21 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Attachment: HIVE-22512.03.patch
Status: Patch Available  (was: In Progress)

Patch with Mahesh's comments addressed.

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch, 
> HIVE-22512.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the
> column-level privileges. The latter function retrieves the individual column
> objects by firing one query per column privilege object, thus swamping the
> backend db with these queries when PrivilegeSynchronizer is run.
> PrivilegeSynchronizer synchronizes privileges of all the databases, tables
> and columns, so the backend db can get swamped really badly when there are
> thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the
> columns the PM has retrieved are wasted anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-21 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Status: In Progress  (was: Patch Available)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch, 
> HIVE-22512.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the
> column-level privileges. The latter function retrieves the individual column
> objects by firing one query per column privilege object, thus swamping the
> backend db with these queries when PrivilegeSynchronizer is run.
> PrivilegeSynchronizer synchronizes privileges of all the databases, tables
> and columns, so the backend db can get swamped really badly when there are
> thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the
> columns the PM has retrieved are wasted anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22428) Remove superfluous "Failed to get database" WARN Logging in ObjectStore

2019-11-24 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981295#comment-16981295
 ] 

Ashutosh Bapat commented on HIVE-22428:
---

[~belugabehr],

That's true; however, that DEBUG is printed on the previous line and not on the 
line where "Thread Stack Trace ..." is printed. Since output from different 
threads can be intermingled, two consecutive lines usually cannot be assumed to 
be associated with each other.

> Remove superfluous "Failed to get database" WARN Logging in ObjectStore
> ---
>
> Key: HIVE-22428
> URL: https://issues.apache.org/jira/browse/HIVE-22428
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22428.1.patch
>
>
> In my testing, I get lots of logs like this:
> {code:none}
>   Line 26319: 2019-10-28T21:09:52,134  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26327: 2019-10-28T21:09:52,135  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26504: 2019-10-28T21:09:52,600  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.tstatsfast, returning 
> NoSuchObjectException
>   Line 26519: 2019-10-28T21:09:52,606  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.tstatsfast, returning 
> NoSuchObjectException
>   Line 26695: 2019-10-28T21:09:52,922  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.createDb, returning 
> NoSuchObjectException
>   Line 26703: 2019-10-28T21:09:52,923  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.createDb, returning 
> NoSuchObjectException
>   Line 26763: 2019-10-28T21:09:52,936  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26778: 2019-10-28T21:09:52,939  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.compdb, returning 
> NoSuchObjectException
>   Line 26963: 2019-10-28T21:09:53,273  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.db1, returning 
> NoSuchObjectException
>   Line 26978: 2019-10-28T21:09:53,276  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.db2, returning 
> NoSuchObjectException
>   Line 26986: 2019-10-28T21:09:53,277  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.db1, returning 
> NoSuchObjectException
>   Line 27018: 2019-10-28T21:09:53,300  WARN [pool-6-thread-5] 
> metastore.ObjectStore: Failed to get database hive.db2, returning 
> NoSuchObjectException
> {code}
> This is a superfluous log message.  It might be pretty common for a database 
> to not exist if, for example, a user fat-fingers the name of the database.  
> The code also has the bad habit of log-and-throw.  Just log or throw, not 
> both.
> Since I'm looking at this class, touch up some of the other logging as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-24 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Attachment: (was: HIVE-22512.03.patch)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the
> column-level privileges. The latter function retrieves the individual column
> objects by firing one query per column privilege object, thus swamping the
> backend db with these queries when PrivilegeSynchronizer is run.
> PrivilegeSynchronizer synchronizes privileges of all the databases, tables
> and columns, so the backend db can get swamped really badly when there are
> thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the
> columns the PM has retrieved are wasted anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-24 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Status: Open  (was: Patch Available)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the
> column-level privileges. The latter function retrieves the individual column
> objects by firing one query per column privilege object, thus swamping the
> backend db with these queries when PrivilegeSynchronizer is run.
> PrivilegeSynchronizer synchronizes privileges of all the databases, tables
> and columns, so the backend db can get swamped really badly when there are
> thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the
> columns the PM has retrieved are wasted anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-24 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Attachment: HIVE-22512.03.patch
Status: Patch Available  (was: Open)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch, 
> HIVE-22512.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the column 
> level privileges. The latter function retrieves the individual column objects 
> by firing one query per column privilege object, thus causing the backend db 
> to be swamped by these queries when PrivilegeSynchronizer is run. 
> PrivilegeSynchronizer synchronizes the privileges of all the databases, 
> tables and columns, so the backend db can get swamped really badly when there 
> are thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the 
> columns the PM has retrieved go to waste anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-25 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Status: Open  (was: Patch Available)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch, 
> HIVE-22512.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the column 
> level privileges. The latter function retrieves the individual column objects 
> by firing one query per column privilege object, thus causing the backend db 
> to be swamped by these queries when PrivilegeSynchronizer is run. 
> PrivilegeSynchronizer synchronizes the privileges of all the databases, 
> tables and columns, so the backend db can get swamped really badly when there 
> are thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the 
> columns the PM has retrieved go to waste anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-25 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Attachment: HIVE-22512.03.patch
Status: Patch Available  (was: Open)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch, 
> HIVE-22512.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the column 
> level privileges. The latter function retrieves the individual column objects 
> by firing one query per column privilege object, thus causing the backend db 
> to be swamped by these queries when PrivilegeSynchronizer is run. 
> PrivilegeSynchronizer synchronizes the privileges of all the databases, 
> tables and columns, so the backend db can get swamped really badly when there 
> are thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the 
> columns the PM has retrieved go to waste anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-25 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Attachment: (was: HIVE-22512.03.patch)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch, 
> HIVE-22512.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the column 
> level privileges. The latter function retrieves the individual column objects 
> by firing one query per column privilege object, thus causing the backend db 
> to be swamped by these queries when PrivilegeSynchronizer is run. 
> PrivilegeSynchronizer synchronizes the privileges of all the databases, 
> tables and columns, so the backend db can get swamped really badly when there 
> are thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the 
> columns the PM has retrieved go to waste anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-25 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Status: Open  (was: Patch Available)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch, 
> HIVE-22512.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the column 
> level privileges. The latter function retrieves the individual column objects 
> by firing one query per column privilege object, thus causing the backend db 
> to be swamped by these queries when PrivilegeSynchronizer is run. 
> PrivilegeSynchronizer synchronizes the privileges of all the databases, 
> tables and columns, so the backend db can get swamped really badly when there 
> are thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the 
> columns the PM has retrieved go to waste anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-25 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Attachment: HIVE-22512.03.patch
Status: Patch Available  (was: Open)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch, 
> HIVE-22512.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the column 
> level privileges. The latter function retrieves the individual column objects 
> by firing one query per column privilege object, thus causing the backend db 
> to be swamped by these queries when PrivilegeSynchronizer is run. 
> PrivilegeSynchronizer synchronizes the privileges of all the databases, 
> tables and columns, so the backend db can get swamped really badly when there 
> are thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the 
> columns the PM has retrieved go to waste anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-25 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-22512:
--
Attachment: (was: HIVE-22512.03.patch)

> Use direct SQL to fetch column privileges in refreshPrivileges
> --
>
> Key: HIVE-22512
> URL: https://issues.apache.org/jira/browse/HIVE-22512
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22512.01.patch, HIVE-22512.02.patch, 
> HIVE-22512.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> refreshPrivileges() calls listTableAllColumnGrants() to fetch the column 
> level privileges. The latter function retrieves the individual column objects 
> by firing one query per column privilege object, thus causing the backend db 
> to be swamped by these queries when PrivilegeSynchronizer is run. 
> PrivilegeSynchronizer synchronizes the privileges of all the databases, 
> tables and columns, so the backend db can get swamped really badly when there 
> are thousands of tables with hundreds of columns.
> The output of listTableAllColumnGrants() is not used completely, so all the 
> columns the PM has retrieved go to waste anyway.
> Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22559) Maintain ownership of parent directories of an external table directory after replication

2019-11-28 Thread Ashutosh Bapat (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-22559:
-


> Maintain ownership of parent directories of an external table directory after 
> replication
> -
>
> Key: HIVE-22559
> URL: https://issues.apache.org/jira/browse/HIVE-22559
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Anishek Agarwal
>Priority: Major
>
> For replicating an external table we specify a base directory on the target 
> (say /base_ext for example). The path of an external table directory on the 
> source (say /xyz/abc/ext_t1) is prefixed with the base directory on the 
> target (/base_ext in our example) when replicating the external table data. 
> Thus the path of the external table on the target becomes 
> /base_ext/xyz/abc/ext_t1. In this path only the ownership of the ext_t1 
> directory is preserved; the ownership of the xyz and abc directories is set 
> to the user executing REPL LOAD. Instead we should preserve the ownership of 
> xyz and abc as well.
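> A minimal sketch of the idea (the helper below is hypothetical, not the 
> actual patch): after copying, walk from the table directory up to the base 
> directory and copy the owner/group of each source directory to the 
> corresponding target directory.
> {code:java}
> // Sketch, using Hadoop FileSystem APIs; error handling omitted.
> void preserveParentOwnership(FileSystem srcFs, FileSystem tgtFs,
>     Path srcTableDir, Path tgtBase) throws IOException {
>   // e.g. srcTableDir = /xyz/abc/ext_t1, tgtBase = /base_ext
>   for (Path src = srcTableDir.getParent(); src != null && !src.isRoot();
>       src = src.getParent()) {
>     FileStatus stat = srcFs.getFileStatus(src);
>     // /xyz/abc on the source maps to /base_ext/xyz/abc on the target.
>     Path tgt = new Path(tgtBase, src.toUri().getPath().substring(1));
>     tgtFs.setOwner(tgt, stat.getOwner(), stat.getGroup());
>   }
> }
> {code}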



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22529) Make Debugging Stacktrace More Explicit

2019-12-06 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989670#comment-16989670
 ] 

Ashutosh Bapat commented on HIVE-22529:
---

Sorry for the delay. The patch looks good to me. I see that it's already 
committed. Thanks for working on it, [~belugabehr] and [~vgarg].

> Make Debugging Stacktrace More Explicit
> ---
>
> Key: HIVE-22529
> URL: https://issues.apache.org/jira/browse/HIVE-22529
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-22529.1.patch, HIVE-22529.1.patch
>
>
> In some places, the following DEBUG logging was introduced:
> {code:java}
> LOG.debug("Message", new Exception());
> {code}
> The purpose of this is to log the stack trace of the Thread calling this 
> debug logging method.  However, the resulting log message includes the 
> following:
> {code:none}
> 2019-11-19T08:13:31,392 DEBUG [Thread] Logger: Message
> java.lang.Exception: null
>  at 
> {code}
> To the observer, it appears that there was perhaps some sort of NPE.  Add a 
> message to the Exception being generated to make it more clear that this 
> "Exception" is for debugging purposes and not an actual error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22546) Postgres schema not using quoted identifiers for certain tables

2019-12-18 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999759#comment-16999759
 ] 

Ashutosh Bapat commented on HIVE-22546:
---

[~pvary], [~zchovan], this change is causing the following stack trace when I 
run Hive with PostgreSQL as the backend db for the metastore.

{noformat}
0: jdbc:hive2://localhost:1> create database dumpdb with ('repl.source.for'='1,2,3');
Error: Error while compiling statement: FAILED: ParseException line 1:28 missing KW_DBPROPERTIES at '(' near '' (state=42000,code=4)
0: jdbc:hive2://localhost:1> create database dumpdb with dbproperties ('repl.source.for'='1,2,3');
ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Error communicating with the metastore)
org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the metastore
 at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:541)
 at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:687)
 at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:653)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:969)

... stack trace clipped

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:Unable to update transaction database
org.postgresql.util.PSQLException: ERROR: relation "materialization_rebuild_locks" does not exist  Position: 13
 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
 at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
 at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
 at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
 at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
 at ...
{noformat}

This happens because the table names in all the queries in TxnHandler.java 
(including the one at line 1312, which causes this stack trace) are not quoted. 
I think we need to go through all these queries and quote the table names and 
column names there. Just this change won't suffice.
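For example (illustrative, not the actual TxnHandler code):

{code:java}
// Unquoted: PostgreSQL folds the identifier to lowercase before resolving it,
// so it no longer matches a table created with a quoted, case-preserved name.
String unquoted = "SELECT * FROM MATERIALIZATION_REBUILD_LOCKS";

// Quoted: the identifier is taken literally and resolves against the schema.
String quoted = "SELECT * FROM \"MATERIALIZATION_REBUILD_LOCKS\"";
{code}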

I have opened HIVE-22663 for the same. I have left the assignee automatic since 
I am not sure who should work on this.

> Postgres schema not using quoted identifiers for certain tables
> ---
>
> Key: HIVE-22546
> URL: https://issues.apache.org/jira/browse/HIVE-22546
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 4.0.0
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22546.2.patch, HIVE-22546.3.patch, 
> HIVE-22546.3.patch, HIVE-22546.4.patch, HIVE-22546.5.patch, 
> HIVE-22546.6.patch, HIVE-22546.patch
>
>
> In the latest postgresql schema 
> (standalone-metastore/metastore-server/src/main/sql/postgres/hive-schema-4.0.0.postgres.sql)
>  the following tables have lowercase table and column names:
> {code:java}
> aux_table 
> compaction_queue 
> completed_compactions 
> completed_txn_components 
> hive_locks 
> materialization_rebuild_locks 
> min_history_level 
> next_compaction_queue_id 
> next_lock_id 
> next_txn_id 
> next_write_id 
> repl_txn_map 
> runtime_stats 
> txn_components 
> txn_to_write_id 
> txns 
> write_set{code}
> As these tables are referenced from the Hive sys database, the queries to 
> these tables will fail with a "Table not found" error.
> The problem is that the table and column names are not enclosed in quotes, so 
> postgres will turn these identifiers into lowercase.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22663) Quote all table and column names or do not quote any

2020-01-08 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010541#comment-17010541
 ] 

Ashutosh Bapat commented on HIVE-22663:
---

I looked at the patch. The patch is huge, so I didn't go into the details. Can 
you please create a PR so that it's easy to review and to provide comments on 
particular changes where necessary?

I have only one comment right now. Please handle the table names in a fashion 
similar to MetaStoreDirectSql.java. In this case, we might want to go a step 
further and handle column names in the same fashion.

> Quote all table and column names or do not quote any
> 
>
> Key: HIVE-22663
> URL: https://issues.apache.org/jira/browse/HIVE-22663
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Zoltan Chovan
>Priority: Major
> Attachments: HIVE-22663.patch
>
>
> The change in HIVE-22546 is causing the following stack trace when I run 
> Hive with PostgreSQL as the backend db for the metastore.
> {noformat}
> 0: jdbc:hive2://localhost:1> create database dumpdb with ('repl.source.for'='1,2,3');
> Error: Error while compiling statement: FAILED: ParseException line 1:28 missing KW_DBPROPERTIES at '(' near '' (state=42000,code=4)
> 0: jdbc:hive2://localhost:1> create database dumpdb with dbproperties ('repl.source.for'='1,2,3');
> ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Error communicating with the metastore)
> org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the metastore
>  at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:541)
>  at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:687)
>  at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:653)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:969)
>
> ... stack trace clipped
>
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: MetaException(message:Unable to update transaction database
> org.postgresql.util.PSQLException: ERROR: relation "materialization_rebuild_locks" does not exist  Position: 13
>  at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
>  at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
>  at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
>  at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
>  at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
>  at ...
> {noformat}
> This happens because the table names in all the queries in TxnHandler.java 
> (including the one at line 1312, which causes this stack trace) are not 
> quoted. All the table names and column names should be quoted there. Just the 
> change in HIVE-22546 won't suffice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2020-01-08 Thread Ashutosh Bapat (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010595#comment-17010595
 ] 

Ashutosh Bapat commented on HIVE-21213:
---

Reviewed the changes. They look fine to me.

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch, HIVE-21213.04.patch, HIVE-21213.05.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The current implementation of compaction uses the txn id in the directory 
> name. This is used to isolate queries from reading the directory until 
> compaction has finished, and to avoid reusing markers left by an earlier 
> compaction. In case of replication, during bootstrap, a directory is copied 
> as-is with the same name from the source to the destination cluster. But a 
> directory created by compaction with a txn id cannot be copied as-is, since 
> the txn list at the target may differ from the source: a txn id which is 
> valid at the source may belong to an aborted txn at the target. So conversion 
> logic is required to create a new directory with a valid txn id at the target 
> and dump the data to the newly created directory.
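> A rough sketch of such a conversion (the directory name format shown is 
> illustrative):
> {code:java}
> // Rename a compaction-created directory so its txn suffix refers to a txn
> // that is valid on the target. Assumes names like "base_0000017_v0000021",
> // where "_v<txnId>" is the compactor's visibility txn id.
> String convertCompactedDirName(String srcName, long targetTxnId) {
>   int v = srcName.lastIndexOf("_v");
>   if (v < 0) {
>     return srcName; // not created by compaction; copy as-is
>   }
>   // Keep the write-id part, substitute a txn id allocated on the target.
>   return String.format("%s_v%07d", srcName.substring(0, v), targetTxnId);
> }
> {code}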



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21745) Change in join order causes query parse to fail

2019-05-21 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844873#comment-16844873
 ] 

Ashutosh Bapat commented on HIVE-21745:
---

I verified that this is not reproducible on the HEAD. Attached is repro.sql 
which has the same commands as the description plus the version string of hive 
at the beginning. The select commands run without any error. Please see the 
output in repro.out.

> Change in join order causes query parse to fail
> ---
>
> Key: HIVE-21745
> URL: https://issues.apache.org/jira/browse/HIVE-21745
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Andre Araujo
>Priority: Major
>
> I ran into the following case, where a query fails to parse if the join order 
> is changed:
> {code}
> create database if not exists test;
> drop table if exists test.table1;
> create table test.table1 (
>   id string,
>   col_a string
> )
> stored as textfile;
> drop table if exists test.table2;
> create table test.table2 (
>   id string
> )
> stored as textfile;
> drop table if exists test.table3;
> create table test.table3 (
>   col_a string,
>   col_b string
> )
> stored as textfile;
> drop table if exists test.table4;
> create table test.table4 (
>   id string
> )
> stored as textfile;
> -- This fails with: Invalid table alias or column reference 't3': (possible 
> column names are: id, col_a)
> select
>   1
> from
>   test.table1 as t1
>   left join test.table2 as t2 on t2.id = t1.id
>   left join test.table3 as t3 on t1.col_a = t3.col_a
>   left join test.table4 as t4 on t1.id = t4.id and t3.col_b = 'X'
> ;
> -- This works
> select
>   1
> from
>   test.table1 as t1
>   left join test.table3 as t3 on t1.col_a = t3.col_a
>   left join test.table4 as t4 on t1.id = t4.id and t3.col_b = 'X'
>   left join test.table2 as t2 on t2.id = t1.id
> ;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-21745) Change in join order causes query parse to fail

2019-05-21 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844873#comment-16844873
 ] 

Ashutosh Bapat edited comment on HIVE-21745 at 5/21/19 2:17 PM:


I verified that this is not reproducible on the HEAD. Here's the output of the 
commands when run using beeline with silent=true:

{noformat}
+------------------------------------------------------------+
|                            _c0                             |
+------------------------------------------------------------+
| 4.0.0-SNAPSHOT r86a7eb7730b224f038ff48286cf5d9009ba422c5   |
+------------------------------------------------------------+
+------+
| _c0  |
+------+
+------+
+------+
| _c0  |
+------+
+------+
{noformat}

Neither select query throws an exception.


was (Author: ashutosh.bapat):
I verified that this is not reproducible on the HEAD. Attached is repro.sql 
which has the same commands as the description plus the version string of hive 
at the beginning. The select commands run without any error. Please see the 
output in repro.out.

> Change in join order causes query parse to fail
> ---
>
> Key: HIVE-21745
> URL: https://issues.apache.org/jira/browse/HIVE-21745
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Andre Araujo
>Priority: Major
>
> I ran into the following case, where a query fails to parse if the join order 
> is changed:
> {code}
> create database if not exists test;
> drop table if exists test.table1;
> create table test.table1 (
>   id string,
>   col_a string
> )
> stored as textfile;
> drop table if exists test.table2;
> create table test.table2 (
>   id string
> )
> stored as textfile;
> drop table if exists test.table3;
> create table test.table3 (
>   col_a string,
>   col_b string
> )
> stored as textfile;
> drop table if exists test.table4;
> create table test.table4 (
>   id string
> )
> stored as textfile;
> -- This fails with: Invalid table alias or column reference 't3': (possible 
> column names are: id, col_a)
> select
>   1
> from
>   test.table1 as t1
>   left join test.table2 as t2 on t2.id = t1.id
>   left join test.table3 as t3 on t1.col_a = t3.col_a
>   left join test.table4 as t4 on t1.id = t4.id and t3.col_b = 'X'
> ;
> -- This works
> select
>   1
> from
>   test.table1 as t1
>   left join test.table3 as t3 on t1.col_a = t3.col_a
>   left join test.table4 as t4 on t1.id = t4.id and t3.col_b = 'X'
>   left join test.table2 as t2 on t2.id = t1.id
> ;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21776) Add test for incremental replication of a UDF with jar on HDFS

2019-05-22 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-21776:
-


> Add test for incremental replication of a UDF with jar on HDFS
> --
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Test
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
> Fix For: 4.0.0
>
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Add test for incremental replication of a UDF with jar on HDFS

2019-05-22 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Attachment: HIVE-21776.01.patch
Status: Patch Available  (was: Open)

Patch with the test. It also includes a correction to the log output of 
CreateFunctionHandler. [~sankarh], can you please review it?

> Add test for incremental replication of a UDF with jar on HDFS
> --
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Test
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch
>
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2019-05-23 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Summary: Replication fails to replicate a UDF with jar on HDFS during 
incremental  (was: Add test for incremental replication of a UDF with jar on 
HDFS)

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Test
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch
>
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2019-05-23 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Issue Type: Bug  (was: Test)

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch
>
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2019-05-23 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Status: In Progress  (was: Patch Available)

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2019-05-23 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Attachment: HIVE-21776.02.patch
Status: Patch Available  (was: In Progress)

The patch has the fix for the reported issue and also fixes the checkstyle 
errors from the previous ptest run.

When a function is dumped during incremental replication, ReplChangeManager may 
not be initialized. In that case the dumped jar URL does not have the checksum 
and cmroot appended to it, so ReplCopyTask interprets it as a normal file 
instead of a function jar resource and doesn't copy it. Function creation then 
fails on the target. The fix is to initialize ReplChangeManager before dumping 
the URL.
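Schematically (the encoded form below only illustrates the idea, not the exact 
ReplChangeManager encoding):

{code:java}
// Plain path: ReplCopyTask treats this as a normal file and skips the jar copy.
String plainUrl   = "hdfs://source:8020/udfs/my_udf.jar";

// With ReplChangeManager initialized, the dumped URL carries the checksum and
// cmroot, which is what ReplCopyTask keys on to recognize a function jar.
String encodedUrl = "hdfs://source:8020/udfs/my_udf.jar#<checksum>#<cmroot>";
{code}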

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch, HIVE-21776.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2019-05-23 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Status: In Progress  (was: Patch Available)

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch, HIVE-21776.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2019-05-23 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Attachment: HIVE-21776.03.patch
Status: Patch Available  (was: In Progress)

Fixes the build failure in the previous ptest run.

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch, HIVE-21776.02.patch, 
> HIVE-21776.03.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21783) Avoid authentication for connection from the same domain

2019-05-23 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-21783:
-


> Avoid authentication for connection from the same domain
> 
>
> Key: HIVE-21783
> URL: https://issues.apache.org/jira/browse/HIVE-21783
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>
> When a connection comes from the same domain do not authenticate the user. 
> This is similar to NONE authentication but only for the connection from the 
> same domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2019-05-23 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Status: In Progress  (was: Patch Available)

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch, HIVE-21776.02.patch, 
> HIVE-21776.03.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2019-05-23 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Attachment: HIVE-21776.04.patch
Status: Patch Available  (was: In Progress)

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch, HIVE-21776.02.patch, 
> HIVE-21776.03.patch, HIVE-21776.04.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> TestReplicationScenariosAcrossInstances has a test for bootstrap replication 
> of a UDF with its jar on HDFS, but no such test for incremental replication. 
> Add one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21776) Replication fails to replicate a UDF with jar on HDFS during incremental

2019-05-24 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21776:
--
Description: When a UDF with its jar on HDFS is replicated, we add the jar 
path to the dump. The dumped jar URL has the checksum and cmroot added to it. 
During load, we load the jar on the target. ReplCopyTask handles the jar paths 
separately from the paths in _files, and it uses the presence of the checksum 
and cmroot for that decision (those two are not present in _files URLs). If 
ReplChangeManager is not initialized during dump, the dumped jar URL does not 
contain the checksum and cmroot, and thus ReplCopyTask fails to copy the UDF 
jar to the target. This fails the repl load since the function cannot be 
created. The fix is to always initialize ReplChangeManager.  (was: 
TestReplicationScenariosAcrossInstances has a test for bootstrap replication of 
a UDF with its jar on HDFS, but no such test for incremental replication. Add 
one.)

> Replication fails to replicate a UDF with jar on HDFS during incremental
> 
>
> Key: HIVE-21776
> URL: https://issues.apache.org/jira/browse/HIVE-21776
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21776.01.patch, HIVE-21776.02.patch, 
> HIVE-21776.03.patch, HIVE-21776.04.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When a UDF with its jar on HDFS is replicated, we add the jar path to the 
> dump. The dumped jar URL has the checksum and cmroot added to it. During 
> load, we load the jar on the target. ReplCopyTask handles the jar paths 
> separately from the paths in _files, and it uses the presence of the checksum 
> and cmroot for that decision (those two are not present in _files URLs). If 
> ReplChangeManager is not initialized during dump, the dumped jar URL does not 
> contain the checksum and cmroot, and thus ReplCopyTask fails to copy the UDF 
> jar to the target. This fails the repl load since the function cannot be 
> created. The fix is to always initialize ReplChangeManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21801) Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary transport

2019-05-29 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-21801:
-


> Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary 
> transport
> -
>
> Key: HIVE-21801
> URL: https://issues.apache.org/jira/browse/HIVE-21801
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>
> Even though tests using miniHS2 set the config hive.server2.transport.mode 
> to http, miniHS2 is created with binary transport.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21783) Avoid authentication for connection from the same domain

2019-05-29 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850622#comment-16850622
 ] 

Ashutosh Bapat commented on HIVE-21783:
---

Found HIVE-21801 when testing http mode for HIVE-21783.

> Avoid authentication for connection from the same domain
> 
>
> Key: HIVE-21783
> URL: https://issues.apache.org/jira/browse/HIVE-21783
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a connection comes from the same domain do not authenticate the user. 
> This is similar to NONE authentication but only for the connection from the 
> same domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21801) Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary transport

2019-05-29 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21801:
--
Attachment: HIVE-21801.01.patch
Status: Patch Available  (was: Open)

> Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary 
> transport
> -
>
> Key: HIVE-21801
> URL: https://issues.apache.org/jira/browse/HIVE-21801
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
> Attachments: HIVE-21801.01.patch
>
>
> Even though tests using miniHS2 set the config hive.server2.transport.mode 
> to http, miniHS2 is created with binary transport.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21783) Avoid authentication for connection from the same domain

2019-05-30 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21783:
--
Attachment: HIVE-21801.01.patch
Status: Patch Available  (was: Open)

WIP patch. Need to understand how to get the client user name when 
authentication is skipped for the same domain.
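One possible shape of the same-domain check (purely a sketch; the actual patch 
may do this differently):

{code:java}
// Treat the client as "same domain" when its canonical host name shares the
// domain suffix (everything after the first label) with the server's.
static boolean isSameDomain(InetAddress client, InetAddress server) {
  String c = client.getCanonicalHostName();
  String s = server.getCanonicalHostName();
  int ci = c.indexOf('.');
  int si = s.indexOf('.');
  return ci > 0 && si > 0 && c.substring(ci).equalsIgnoreCase(s.substring(si));
}
{code}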

> Avoid authentication for connection from the same domain
> 
>
> Key: HIVE-21783
> URL: https://issues.apache.org/jira/browse/HIVE-21783
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21801.01.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When a connection comes from the same domain do not authenticate the user. 
> This is similar to NONE authentication but only for the connection from the 
> same domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21783) Avoid authentication for connection from the same domain

2019-05-30 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21783:
--
Status: Open  (was: Patch Available)

> Avoid authentication for connection from the same domain
> 
>
> Key: HIVE-21783
> URL: https://issues.apache.org/jira/browse/HIVE-21783
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21801.01.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When a connection comes from the same domain do not authenticate the user. 
> This is similar to NONE authentication but only for the connection from the 
> same domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21783) Avoid authentication for connection from the same domain

2019-05-30 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21783:
--
Attachment: HIVE-21783.01.patch
Status: Patch Available  (was: Open)

> Avoid authentication for connection from the same domain
> 
>
> Key: HIVE-21783
> URL: https://issues.apache.org/jira/browse/HIVE-21783
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21783.01.patch, HIVE-21801.01.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When a connection comes from the same domain do not authenticate the user. 
> This is similar to NONE authentication but only for the connection from the 
> same domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21788) Support replication from hadoop-2 (hive 3.0 and below) on-prem cluster to hadoop-3 (hive 4 and above) cloud cluster

2019-06-03 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854709#comment-16854709
 ] 

Ashutosh Bapat commented on HIVE-21788:
---

 
{noformat}
Collection redactedProperties =
- jobConf.getStringCollection(MRJobConfig.MR_JOB_REDACTED_PROPERTIES);
+ jobConf.getStringCollection("mapreduce.job.redacted-properties");
 
 // Hide sensitive configuration values from MR HistoryUI by telling MR to 
redact the following list.
- jobConf.set(MRJobConfig.MR_JOB_REDACTED_PROPERTIES,
+ jobConf.set("mapreduce.job.redacted-properties",
 StringUtils.join(redactedProperties, COMMA));
 }{noformat}
 

Why do we need those changes? Aren't these constants defined when Hadoop-2 is 
used? This comment is
applicable to all the places where this change is repeated.

 
{noformat}
+ if (conf.get("mapreduce.framework.name") != null
+ && conf.get("mapreduce.framework.name").equals("yarn")) {{noformat}
{noformat}
+ jConf.set("yarn.scheduler.capacity.root.queues", "default");
+ jConf.set("yarn.scheduler.capacity.root.default.capacity", "100");
{noformat}
 
{noformat}
+ public int getJobTrackerPort() throws UnsupportedOperationException {
+ String address = conf.get("yarn.resourcemanager.address");{noformat}
 

 
{noformat}
+
+ if (!isLlap) { // Conf for non-llap
+ conf.setBoolean("hive.llap.io.enabled", false);
+ } else { // Conf for llap
+ conf.set("hive.llap.execution.mode", "only");{noformat}
 

 
{noformat}
+ conf.setInt("hive.tez.container.size", 128);{noformat}
Can we use ConfVars or some such static declaration instead of a hard-coded 
constant? This comment applies to all the places where we are using hard-coded 
strings for config. The problem with hard-coded configs is that if we change 
the config in the future we won't be able to catch all the places where it is 
used and change them all.
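i.e. something along these lines (illustrative):

{code:java}
// Hard-coded: a rename of the property is invisible to the compiler.
conf.setInt("hive.tez.container.size", 128);

// Via the static declaration: every usage is discoverable from ConfVars.
conf.setInt(HiveConf.ConfVars.HIVETEZCONTAINERSIZE.varname, 128);
{code}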

Can you please create PR?

 

> Support replication from hadoop-2 (hive 3.0 and below) on-prem cluster to 
> hadoop-3 (hive 4 and above) cloud cluster
> 
>
> Key: HIVE-21788
> URL: https://issues.apache.org/jira/browse/HIVE-21788
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2, repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21788.01.patch
>
>
> In case of replication to the cloud, both dump and load are executed on the 
> source cluster. This push-based replication is done to avoid computation on 
> the target cloud cluster. If strict managed table is not set to true on the 
> source cluster, the tables will be non-acid. So during replication to a 
> cluster with strict managed tables, migration logic like that of the upgrade 
> tool has to be applied to the replicated data. This migration logic is 
> implemented only in hive 4.0, so a hive 4.0 instance has to be started on the 
> source cluster. If the source cluster has a hadoop-2 installation, hive4 has 
> to be built with hadoop-2, and the necessary changes are required in the pom 
> files and the shim files.
> 1. Change the pom.xml files to accept a profile for hadoop-2. If the hadoop-2 
> profile is set, the hadoop version should be set accordingly to hadoop-2.
> 2. In shims, create a new file for hadoop-2. Based on the profile, the 
> respective file will be included in the build.
> 3. Change artifactId hadoop-hdfs-client to hadoop-client, as in hadoop-2 the 
> jars are stored under the hadoop-client folder.
>  
>  
> Command to enable the hadoop-2 dependency: mvn clean install package 
> -DskipTests  -Pdist -pl '!standalone-metastore, !llap-common, !llap-client, 
> !llap-ext-client, !llap-tez, !llap-server, !hbase-handler, !service, !hplsql, 
> !kryo-registrator' -Phadoop-2.7
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21783) Avoid authentication for connection from the same domain

2019-06-04 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21783:
--
Status: In Progress  (was: Patch Available)

> Avoid authentication for connection from the same domain
> 
>
> Key: HIVE-21783
> URL: https://issues.apache.org/jira/browse/HIVE-21783
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21783.01.patch, HIVE-21801.01.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When a connection comes from the same domain do not authenticate the user. 
> This is similar to NONE authentication but only for the connection from the 
> same domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21783) Avoid authentication for connection from the same domain

2019-06-04 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21783:
--
Attachment: (was: HIVE-21801.01.patch)

> Avoid authentication for connection from the same domain
> 
>
> Key: HIVE-21783
> URL: https://issues.apache.org/jira/browse/HIVE-21783
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21783.01.patch, HIVE-21783.02.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When a connection comes from the same domain do not authenticate the user. 
> This is similar to NONE authentication but only for the connection from the 
> same domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21783) Avoid authentication for connection from the same domain

2019-06-04 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21783:
--
Attachment: HIVE-21783.02.patch
Status: Patch Available  (was: In Progress)

Patch addressing comments from [~draese] and [~prasanth_j]. Also added tests.

> Avoid authentication for connection from the same domain
> 
>
> Key: HIVE-21783
> URL: https://issues.apache.org/jira/browse/HIVE-21783
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21783.01.patch, HIVE-21783.02.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When a connection comes from the same domain do not authenticate the user. 
> This is similar to NONE authentication but only for the connection from the 
> same domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-21648) Enable TestReplAcidTablesWithJsonMessage and TestReplicationScenariosAcidTables back

2019-06-04 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat resolved HIVE-21648.
---
Resolution: Fixed

> Enable TestReplAcidTablesWithJsonMessage and 
> TestReplicationScenariosAcidTables back
> 
>
> Key: HIVE-21648
> URL: https://issues.apache.org/jira/browse/HIVE-21648
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Jesus Camacho Rodriguez
>Assignee: Ashutosh Bapat
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21783) Avoid authentication for connection from the same domain

2019-06-05 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857350#comment-16857350
 ] 

Ashutosh Bapat commented on HIVE-21783:
---

[~prasanth_j], [~odraese], thanks for the +1s and for approving the patch. I am 
not a committer, so I will need someone to commit the patch. Can you please 
commit it?

> Avoid authentication for connection from the same domain
> 
>
> Key: HIVE-21783
> URL: https://issues.apache.org/jira/browse/HIVE-21783
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21783.01.patch, HIVE-21783.02.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When a connection comes from the same domain do not authenticate the user. 
> This is similar to NONE authentication but only for the connection from the 
> same domain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-06 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-21841:
-


> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include generation and execution of compaction tasks, 
> auto-discovery of partitions for external tables, the repl thread, etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21788) Support replication from hadoop-2 (hive 3.0 and below) on-prem cluster to hadoop-3 (hive 4 and above) cloud cluster

2019-06-07 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858416#comment-16858416
 ] 

Ashutosh Bapat commented on HIVE-21788:
---

Ok. Thanks. I don't have any more comments.

> Support replication from hadoop-2 (hive 3.0 and below) on-prem cluster to 
> hadoop-3 (hive 4 and above) cloud cluster
> 
>
> Key: HIVE-21788
> URL: https://issues.apache.org/jira/browse/HIVE-21788
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2, repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21788.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In case of replication to the cloud, both dump and load are executed in the 
> source cluster. This push-based replication is done to avoid computation at 
> the target cloud cluster. If strict managed tables are not enabled in the 
> source cluster, the tables will be non-ACID. So during replication to a 
> cluster with strict managed tables, migration logic same as the upgrade tool 
> has to be applied on the replicated data. This migration logic is 
> implemented only in Hive 4.0, so a Hive 4.0 instance must be started at the 
> source cluster. If the source cluster has a hadoop-2 installation, Hive 4 
> has to be built with hadoop-2, and the necessary changes are required in the 
> pom files and the shim files.
> 1. Change the pom.xml files to accept a profile for hadoop-2. If the 
> hadoop-2 profile is set, the hadoop version should be set accordingly to 
> hadoop-2.
> 2. In the shims, create a new file for hadoop-2. Based on the profile, the 
> respective file will be included in the build.
> 3. Change the artifactId hadoop-hdfs-client to hadoop-client, as in hadoop-2 
> the jars are stored under the hadoop-client folder.
>  
>  
> Command to enable the hadoop-2 dependency: mvn clean install package 
> -DskipTests  -Pdist -pl '!standalone-metastore, !llap-common, !llap-client, 
> !llap-ext-client, !llap-tez, !llap-server, !hbase-handler, !service, !hplsql, 
> !kryo-registrator' -Phadoop-2.7
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-12 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Fix Version/s: 4.0.0
Affects Version/s: 4.0.0
   Attachment: HIVE-21841.01.patch
 Target Version/s: 4.0.0
   Status: Patch Available  (was: Open)

In environments like Kubernetes, where the URLs are stable, we can designate an 
HMS running at a given URL as the leader to run all housekeeping tasks. The 
attached patch adds a config to specify the leader URL. An HMS which binds to 
this URL starts the housekeeping threads.
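
A minimal sketch of that check (the property name below is a hypothetical placeholder, not the config key the patch defines):

{code}
// Hypothetical sketch: an HMS compares its own bind address against the
// configured leader URL; only a match starts the housekeeping threads.
// The property name "metastore.housekeeping.leader.url" is an assumption.
import java.util.Properties;

public final class LeaderCheck {
  public static boolean isLeader(Properties conf, String myHost, int myPort) {
    String leaderUrl = conf.getProperty("metastore.housekeeping.leader.url", "");
    return leaderUrl.equals(myHost + ":" + myPort);
  }
}
{code}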

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21801) Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary transport

2019-06-12 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861910#comment-16861910
 ] 

Ashutosh Bapat commented on HIVE-21801:
---

The fix for this is included in HIVE-21783's latest patch.

> Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary 
> transport
> -
>
> Key: HIVE-21801
> URL: https://issues.apache.org/jira/browse/HIVE-21801
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
> Attachments: HIVE-21801.01.patch
>
>
> Even though tests using miniHS2 set the config hive.server2.transport.mode 
> to http, miniHS2 is created with binary transport.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-13 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Attachment: HIVE-21841.02.patch
Status: Patch Available  (was: In Progress)

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-13 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Status: In Progress  (was: Patch Available)

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-14 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864079#comment-16864079
 ] 

Ashutosh Bapat commented on HIVE-21841:
---

[~asomani], that's correct. However, each of those HMSes is running 
housekeeping threads, which will be avoided by this patch. Further, some 
housekeeping tasks run periodically, and running the same task back-to-back 
from multiple HMSes unnecessarily wastes resources. Right now this can be 
avoided only by configuring each task/thread separately; with this feature 
there's only one config to set.

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-17 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Status: In Progress  (was: Patch Available)

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-17 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Attachment: HIVE-21841.04.patch
Status: Patch Available  (was: In Progress)

Patch with [~maheshk114]'s comments addressed and checkstyle warnings fixed. PR 
updated.

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-18 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Status: In Progress  (was: Patch Available)

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-18 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Attachment: HIVE-21841.05.patch
Status: Patch Available  (was: In Progress)

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch, HIVE-21841.05.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-18 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Status: In Progress  (was: Patch Available)

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch, HIVE-21841.05.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-18 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Attachment: HIVE-21841.06.patch
Status: Patch Available  (was: In Progress)

The failed test cases from the last run are passing locally on my laptop. 
Re-attaching the 05 patch as 06.

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch, HIVE-21841.05.patch, HIVE-21841.06.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-18 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Status: In Progress  (was: Patch Available)

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch, HIVE-21841.05.patch, HIVE-21841.06.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-18 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Attachment: HIVE-21841.07.patch
Status: Patch Available  (was: In Progress)

Same as 06 to trigger ptest.

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch, HIVE-21841.05.patch, HIVE-21841.06.patch, 
> HIVE-21841.07.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-18 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Status: In Progress  (was: Patch Available)

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch, HIVE-21841.05.patch, HIVE-21841.06.patch, 
> HIVE-21841.07.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-18 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Attachment: HIVE-21841.08.patch
Status: Patch Available  (was: In Progress)

Fix for ptest failure in the last run.

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch, HIVE-21841.05.patch, HIVE-21841.06.patch, 
> HIVE-21841.07.patch, HIVE-21841.08.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21892) Trusted domain authentication should look at X-Forwarded-For header as well

2019-06-19 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867463#comment-16867463
 ] 

Ashutosh Bapat commented on HIVE-21892:
---

[~jdere], [~prasanth_j], I have left some cosmetic comments on the PR. I don't 
have any serious concerns with this patch, so once those comments are 
addressed, you may commit it.

> Trusted domain authentication should look at X-Forwarded-For header as well
> ---
>
> Key: HIVE-21892
> URL: https://issues.apache.org/jira/browse/HIVE-21892
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21892.1.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-21783 added trusted domain authentication. However, it looks only at 
> request.getRemoteAddr(), which works in most cases where there are no 
> intermediate forward/reverse proxies. In trusted domain scenarios, if there 
> are intermediate proxies, each proxy typically appends its own IP address to 
> the "X-Forwarded-For" header. The X-Forwarded-For header will look like 
> clientIp -> proxyIp1 -> proxyIp2. The left-most IP address in 
> X-Forwarded-For represents the real client IP address. For such scenarios, 
> add a config to optionally look at the X-Forwarded-For header, when 
> available, to determine the real client IP.
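
As a rough illustration (a hypothetical helper, not the patch's code), picking the real client IP from that header could look like:

{code}
// Hypothetical sketch: the left-most entry of X-Forwarded-For is the original
// client; each intermediate proxy appends its own address after it.
public final class ForwardedForSketch {
  public static String clientIp(String xForwardedFor, String remoteAddr) {
    if (xForwardedFor == null || xForwardedFor.trim().isEmpty()) {
      return remoteAddr; // no proxies in between; trust the socket address
    }
    return xForwardedFor.split(",")[0].trim();
  }
}
{code}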



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21801) Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary transport

2019-06-19 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867582#comment-16867582
 ] 

Ashutosh Bapat commented on HIVE-21801:
---

The fix for this is included in the linked JIRA. Resolving this one as the 
other is resolved.

> Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary 
> transport
> -
>
> Key: HIVE-21801
> URL: https://issues.apache.org/jira/browse/HIVE-21801
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
> Attachments: HIVE-21801.01.patch
>
>
> Even though tests using miniHS2 set the config hive.server2.transport.mode 
> to http, miniHS2 is created with binary transport.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21801) Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary transport

2019-06-19 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21801:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Tests using miniHS2 with HTTP as transport are creating miniHS2 with binary 
> transport
> -
>
> Key: HIVE-21801
> URL: https://issues.apache.org/jira/browse/HIVE-21801
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
> Attachments: HIVE-21801.01.patch
>
>
> Even though tests using miniHS2 set the config hive.server2.transport.mode 
> to http, miniHS2 is created with binary transport.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-19 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Status: In Progress  (was: Patch Available)

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch, HIVE-21841.05.patch, HIVE-21841.06.patch, 
> HIVE-21841.07.patch, HIVE-21841.08.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21841) Leader election in HMS to run housekeeping tasks.

2019-06-19 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21841:
--
Attachment: HIVE-21841.09.patch
Status: Patch Available  (was: In Progress)

Same as .08 but submitted again to trigger ptests. No logs were generated for 
the previous run.

> Leader election in HMS to run housekeeping tasks.
> -
>
> Key: HIVE-21841
> URL: https://issues.apache.org/jira/browse/HIVE-21841
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21841.01.patch, HIVE-21841.02.patch, 
> HIVE-21841.04.patch, HIVE-21841.05.patch, HIVE-21841.06.patch, 
> HIVE-21841.07.patch, HIVE-21841.08.patch, HIVE-21841.09.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> HMS performs housekeeping tasks. When there are multiple HMSes we need to 
> have a leader HMS elected which will carry out those housekeeping tasks. 
> These tasks include execution of compaction tasks, auto-discovering 
> partitions for external tables, generation of compaction tasks, repl thread 
> etc.
> Note that, though the code for compaction tasks, auto-discovery of partitions 
> etc. is in Hive, the actual tasks are initiated by an HMS configured to do 
> so. So, leader election is required only for HMS and not for HS2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-06-24 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Attachment: HIVE-21880.01.patch
Status: Patch Available  (was: Open)

The code in getNextNotification() just checks whether the next event has the 
expected event id. This check may fail when there are multiple events with the 
same event id or when event ids are missing. When the test fails, it fails 
because there are multiple events with the same event id.

We use the Derby database as the backing DB for the metastore. Derby doesn't 
lock the row being selected with a FOR UPDATE clause. addNotificationLog() and 
addNotificationEvent() both rely on this behaviour to generate monotonically 
increasing, sequential event ids. Since the row is not locked, we could fetch 
the same event id multiple times and then increment it to the same value 
multiple times. That can cause the event ids to progress in an unreliable 
manner. So for Derby we lock the NOTIFICATION_SEQUENCE table instead of using 
FOR UPDATE.

Note: TxnHandler uses a different mechanism to simulate the effect of FOR 
UPDATE on Derby; it uses a JVM-wide mutex. But TxnHandler is not always 
available, especially when there are no ACID tables involved, so we would need 
to move that mutex out of TxnHandler to a place common to 
DbNotificationListener and TxnHandler, e.g. SQLGenerator, and also take care 
of the mutex's reentrant behaviour. Furthermore, such a mutex wouldn't work 
when metastores are running in separate JVMs.

Since the test in the subject line is flaky, I have added another test which 
reliably reproduces this behaviour.
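
For illustration, the locking difference described above looks roughly like this over plain JDBC (a sketch only; the table and column names follow the metastore schema, everything else is assumed, and autocommit is assumed off so the lock is held until commit):

{code}
// Hypothetical JDBC sketch of the fix described above.
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class EventIdSketch {
  static long nextEventId(Connection conn, boolean isDerby) throws SQLException {
    long next;
    try (Statement stmt = conn.createStatement()) {
      if (isDerby) {
        // Derby does not block concurrent readers of a row selected FOR
        // UPDATE, so serialize writers by locking the whole table instead.
        stmt.execute("LOCK TABLE \"NOTIFICATION_SEQUENCE\" IN EXCLUSIVE MODE");
      }
      // On databases that honour row locks, FOR UPDATE alone is enough.
      try (ResultSet rs = stmt.executeQuery(
          "SELECT \"NEXT_EVENT_ID\" FROM \"NOTIFICATION_SEQUENCE\" FOR UPDATE")) {
        rs.next();
        next = rs.getLong(1);
      }
      stmt.executeUpdate(
          "UPDATE \"NOTIFICATION_SEQUENCE\" SET \"NEXT_EVENT_ID\" = " + (next + 1));
    }
    return next;
  }
}
{code}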

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
>  which is disabled as it is flaky and randomly failing with the below error.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> java.lang.IllegalStateException: Notification events are missing in the meta 
> store.
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getNextNotification(HiveMetaStoreClient.java:3246)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy58.getNextNotification(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getNextNotificationEvents(EventUtils.java:107)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.fetchNextBatch(EventUtils.java:159)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.hasNext(EventUtils.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:231)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:121)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:282)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(Warehou

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-06-27 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Status: In Progress  (was: Patch Available)

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
>  which is disabled as it is flaky and randomly failing with the below error.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> java.lang.IllegalStateException: Notification events are missing in the meta 
> store.
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getNextNotification(HiveMetaStoreClient.java:3246)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy58.getNextNotification(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getNextNotificationEvents(EventUtils.java:107)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.fetchNextBatch(EventUtils.java:159)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.hasNext(EventUtils.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:231)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:121)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:282)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:265)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:289)
>   at 
> org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites(TestReplicationScenariosAcidTablesBootstrap.java:328)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassR

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-06-27 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Attachment: HIVE-21880.02.patch
Status: Patch Available  (was: In Progress)

The earlier failures indicate that we cannot execute LOCK TABLE through the 
JDOQuery.execute() interface. Instead, use MetaStoreDirectSql.executeNoResult() 
whenever possible.
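
The distinction, illustrated with plain JDBC rather than the actual MetaStoreDirectSql code: LOCK TABLE produces no result set, so it needs an execute-without-result path instead of a query API that expects rows back.

{code}
// Hypothetical sketch of an execute-no-result path over JDBC.
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public final class DirectSqlSketch {
  // Runs a statement that returns no rows, e.g. LOCK TABLE; a query API
  // that expects a ResultSet back cannot express this.
  static void executeNoResult(Connection conn, String sql) throws SQLException {
    try (Statement stmt = conn.createStatement()) {
      stmt.execute(sql);
    }
  }
}
{code}

With this helper, the Derby lock above would be issued as executeNoResult(conn, "LOCK TABLE \"NOTIFICATION_SEQUENCE\" IN EXCLUSIVE MODE").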

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch, HIVE-21880.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
>  which is disabled as it is flaky and randomly failing with the below error.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> java.lang.IllegalStateException: Notification events are missing in the meta 
> store.
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getNextNotification(HiveMetaStoreClient.java:3246)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy58.getNextNotification(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getNextNotificationEvents(EventUtils.java:107)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.fetchNextBatch(EventUtils.java:159)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.hasNext(EventUtils.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:231)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:121)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:282)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:265)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:289)
>   at 
> org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites(TestReplicationScenariosAcidTablesBootstrap.java:328)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-06-28 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Status: In Progress  (was: Patch Available)

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch, HIVE-21880.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
>  which is disabled as it is flaky and randomly failing with the below error.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> java.lang.IllegalStateException: Notification events are missing in the meta 
> store.
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getNextNotification(HiveMetaStoreClient.java:3246)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy58.getNextNotification(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getNextNotificationEvents(EventUtils.java:107)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.fetchNextBatch(EventUtils.java:159)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.hasNext(EventUtils.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:231)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:121)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:282)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:265)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:289)
>   at 
> org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites(TestReplicationScenariosAcidTablesBootstrap.java:328)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runn

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-06-28 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Attachment: HIVE-21880.03.patch
Status: Patch Available  (was: In Progress)

The failed tests are passing for me locally. Re-submitting the .02 patch as 
.03 to trigger ptests.

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch, HIVE-21880.02.patch, 
> HIVE-21880.03.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
>  which is disabled as it is flaky and randomly failing with the below error.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> java.lang.IllegalStateException: Notification events are missing in the meta 
> store.
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getNextNotification(HiveMetaStoreClient.java:3246)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy58.getNextNotification(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getNextNotificationEvents(EventUtils.java:107)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.fetchNextBatch(EventUtils.java:159)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.hasNext(EventUtils.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:231)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:121)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:282)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:265)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:289)
>   at 
> org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites(TestReplicationScenariosAcidTablesBootstrap.java:328)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>  

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-07-02 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Status: In Progress  (was: Patch Available)

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch, HIVE-21880.02.patch, 
> HIVE-21880.03.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
>  which is disabled as it is flaky and randomly failing with the below error.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> java.lang.IllegalStateException: Notification events are missing in the meta 
> store.
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getNextNotification(HiveMetaStoreClient.java:3246)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy58.getNextNotification(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getNextNotificationEvents(EventUtils.java:107)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.fetchNextBatch(EventUtils.java:159)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.hasNext(EventUtils.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:231)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:121)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:282)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:265)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:289)
>   at 
> org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites(TestReplicationScenariosAcidTablesBootstrap.java:328)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-07-02 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Attachment: HIVE-21880.04.patch
Status: Patch Available  (was: In Progress)

Patch addressing [~maheshk114]'s comments.

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch, HIVE-21880.02.patch, 
> HIVE-21880.03.patch, HIVE-21880.04.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites,
>  which is disabled as it is flaky, randomly failing with the error below.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> [... stack trace identical to the first occurrence above ...]
> {code}

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-07-04 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Status: In Progress  (was: Patch Available)

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch, HIVE-21880.02.patch, 
> HIVE-21880.03.patch, HIVE-21880.04.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites,
>  which is disabled as it is flaky, randomly failing with the error below.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> [... stack trace identical to the first occurrence above ...]
> {code}

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-07-04 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Attachment: HIVE-21880.05.patch
Status: Patch Available  (was: In Progress)

Patch removing the new error code as per [~maheshk114]'s suggestion.

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch, HIVE-21880.02.patch, 
> HIVE-21880.03.patch, HIVE-21880.04.patch, HIVE-21880.05.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites,
>  which is disabled as it is flaky, randomly failing with the error below.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> [... stack trace identical to the first occurrence above ...]
> {code}

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-07-04 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Status: In Progress  (was: Patch Available)

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch, HIVE-21880.02.patch, 
> HIVE-21880.03.patch, HIVE-21880.04.patch, HIVE-21880.05.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites,
>  which is disabled as it is flaky, randomly failing with the error below.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> [... stack trace identical to the first occurrence above ...]
> {code}

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-07-04 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Attachment: HIVE-21880.06.patch
Status: Patch Available  (was: In Progress)

The build isn't failing locally for me, so I'm re-attaching patch 05 renamed as 06.

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch, HIVE-21880.02.patch, 
> HIVE-21880.03.patch, HIVE-21880.04.patch, HIVE-21880.05.patch, 
> HIVE-21880.06.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites,
>  which is disabled as it is flaky, randomly failing with the error below.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> [... stack trace identical to the first occurrence above ...]
> {code}

[jira] [Commented] (HIVE-21893) Handle concurrent write + drop when ACID tables are getting bootstrapped.

2019-07-05 Thread Ashutosh Bapat (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879131#comment-16879131
 ] 

Ashutosh Bapat commented on HIVE-21893:
---

[~sankarh], these two issues can happen even in the case of a normal bootstrap 
for a new policy, not just the bootstrap performed during the incremental 
phase. Anyway, here's my analysis of the problematic cases.

The key point here is the following comment in 
org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask#getValidTxnListForReplDump():

{code:java}
// Key design point for REPL DUMP is to not have any txns older than current
// txn in which dump runs. This is needed to ensure that Repl dump doesn't
// copy any data files written by any open txns mainly for streaming ingest
// case where one delta file shall have data from committed/aborted/open
// txns. It may also have data inconsistency if the on-going txns doesn't
// have corresponding open/write events captured which means, catch-up
// incremental phase won't be able to replicate those txns. So, the logic is
// to wait for the given amount of time to see if all open txns < current
// txn is getting aborted/committed. If not, then we forcefully abort those
// txns just like AcidHouseKeeperService.
{code}
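
To make that concrete, here is a minimal, self-contained sketch of the 
wait-then-abort logic the comment describes. TxnStore, getOpenTxnsBelow() and 
abortTxn() are invented stand-ins for the metastore transaction handler, not 
actual HMS APIs.

{code:java}
import java.util.List;

public class ForceAbortSketch {
    // Invented stand-in for the metastore txn handler.
    interface TxnStore {
        List<Long> getOpenTxnsBelow(long txnId);
        void abortTxn(long txnId);
    }

    static void waitThenAbort(TxnStore store, long currentTxnId,
                              long timeoutMs, long pollMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        List<Long> open = store.getOpenTxnsBelow(currentTxnId);
        // Give in-flight txns older than the dump's txn a chance to commit
        // or abort on their own within the configured window.
        while (!open.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(pollMs);
            open = store.getOpenTxnsBelow(currentTxnId);
        }
        // Anything still open after the timeout is forcefully aborted,
        // mirroring AcidHouseKeeperService, so the dump snapshot sees no
        // open txns older than itself.
        for (long txnId : open) {
            store.abortTxn(txnId);
        }
    }
}
{code}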
 

 Case 1
{quote}If Step-11 happens between Step-1 and Step-2. Also, Step-13 completes 
before we forcefully abort Tx2 from REPL DUMP thread T1. Also, assume Step-14 
is done after bootstrap is completed. In this case, bootstrap would replicate 
the data/writeId written by Tx2. But, the next incremental cycle would also 
replicate the open_txn, allocate_writeid and commit_txn events which would 
duplicate the data.
{quote}
If step-11 happens between step-1 and step-2, that by itself can cause 
multiple problems, since the open transaction event is replayed twice (once 
during bootstrap and once during the next incremental), causing writeIds on 
the target to go out of sync with the source. A better solution would be to 
combine setLastReplIdForDump() and openTransaction() in Driver.compile() for 
the REPL DUMP case: let openTransaction() return the eventId of the REPL 
DUMP's own open transaction event, and set that eventId as the 
lastReplIdForDump(). The next incremental dump will then start from the 
events following this open transaction event.
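
A minimal sketch of that proposed coupling, with hypothetical names 
(TxnManager, DumpContext, openTransactionReturningEventId) standing in for the 
real Driver.compile()/ReplDumpTask machinery:

{code:java}
public class ReplDumpCompileSketch {
    // Invented stand-in: the proposed openTransaction() variant returns the
    // notification eventId of the OPEN_TXN event it just generated.
    interface TxnManager {
        long openTransactionReturningEventId();
    }

    static class DumpContext {
        long lastReplIdForDump;
    }

    static void compileForReplDump(TxnManager txnMgr, DumpContext ctx) {
        // One atomic step instead of two: no concurrent txn can slip its
        // OPEN_TXN event between "get last repl id" and "open txn".
        long openTxnEventId = txnMgr.openTransactionReturningEventId();
        ctx.lastReplIdForDump = openTxnEventId;
        // The next incremental dump starts strictly after this eventId, so
        // the dump's own OPEN_TXN event is never replayed twice.
    }
}
{code}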

With that, we prohibit step-11 from happening between step-1 and step-2, so 
step-11 can happen either after step-2 or before step-1.
 # If it happens after step-2, it will not be recorded in the snapshot of the 
DUMP, and thus changes within that transaction will not be replicated during 
bootstrap. The next incremental will replicate the events.

 # If step-11 happens before step-1 and commits before we start the dump, its 
changes will be replicated during bootstrap, since that transaction is 
considered visible to the REPL DUMP transaction. If the alloc_writeId event is 
idempotent for a given transaction on the source, then once the open 
transaction event has been replicated as part of bootstrap, the same writeId 
will be allocated however many times the alloc_writeId event is replayed, 
keeping the writeIds on source and target in sync. Any files written will be 
marked with the same writeId, so copying them multiple times will not 
duplicate data. So there is no correctness issue in this case either; the 
sketch below illustrates the idempotency.
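
A tiny, self-contained sketch of that idempotency argument; WriteIdAllocator 
and allocate() are invented for illustration and are not the metastore's 
actual API:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class WriteIdAllocator {
    private final Map<Long, Long> allocated = new HashMap<>(); // txnId -> writeId
    private long nextWriteId = 1;

    // Replaying the same alloc_writeId event any number of times for the
    // same txnId must return the same writeId; only the first call advances
    // the counter.
    public synchronized long allocate(long txnId) {
        return allocated.computeIfAbsent(txnId, t -> nextWriteId++);
    }

    public static void main(String[] args) {
        WriteIdAllocator a = new WriteIdAllocator();
        long first = a.allocate(42);   // allocates writeId 1
        long replay = a.allocate(42);  // replayed event: same writeId 1
        System.out.println(first == replay); // true -> no duplicated data
    }
}
{code}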

Case 2
{quote}If Step-11 to Step-14 in Thread T2 happens after Step-1 in REPL DUMP 
thread T1. In this case, table is not bootstrapped but the corresponding 
open_txn, allocate_writeid, commit_txn and drop events would be replicated in 
next cycle. During next cycle, REPL LOAD would fail on commitTxn event as table 
is dropped or event is missing.
{quote}
If step-11 to step-14 happen before step-1, they will be covered by the 
bootstrap itself and will not appear in the incremental. I think you meant 
that step-14 happens before step-4, so the table is not bootstrapped, but any 
events after the open transaction are part of the next incremental.

This case is covered by test 
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTables#testAcidTablesBootstrapWithConcurrentDropTable().

In this case, the ALTER TABLE events created by the INSERT operation are 
converted to CreateTable on the target, so at commit time it sees the table, 
which is then dropped by the subsequent drop event. So there is no correctness 
issue here either.

> Handle concurrent write + drop when ACID tables are getting bootstrapped.
> -
>
> Key: HIVE-21893
> URL: https://issues.apache.org/jira/browse/HIVE-21893
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: DR, Replication
>
> ACID tables will be bootstrapped d

[jira] [Assigned] (HIVE-21893) Handle concurrent write + drop when ACID tables are getting bootstrapped.

2019-07-05 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-21893:
-

Assignee: Sankar Hariappan  (was: Ashutosh Bapat)

> Handle concurrent write + drop when ACID tables are getting bootstrapped.
> -
>
> Key: HIVE-21893
> URL: https://issues.apache.org/jira/browse/HIVE-21893
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication
>
> ACID tables will be bootstrapped during the incremental phase in a couple of cases. 
> 1. hive.repl.bootstrap.acid.tables is set to true in WITH clause of REPL DUMP.
> 2. If replication policy is changed using REPLACE clause in REPL DUMP where 
> the ACID table is matching new policy but not old policy.
> REPL DUMP performs the below sequence of operations in a thread, let's say T1:
> 1. Get Last Repl ID (lastId)
> 2. Open Transaction (Tx1)
> 3. Dump events until lastId.
> 4. Get the list of tables in the given DB.
> 5. If table matches current policy, then bootstrap dump it.
> Concurrently, let's say another thread (T2) is running as follows:
> 11. Open Transaction (Tx2).
> 12. Insert into ACID table Tbl1.
> 13. Commit Transaction (Tx2)
> 14. Drop table (Tbl1) --> Not necessarily same thread, may be from different 
> thread as well.
> *Problematic Use-cases:*
> 1. If Step-11 happens between Step-1 and Step-2. Also, Step-13 completes 
> before we forcefully abort Tx2 from REPL DUMP thread T1. Also, assume Step-14 
> is done after bootstrap is completed. In this case, bootstrap would replicate 
> the data/writeId written by Tx2. But, the next incremental cycle would also 
> replicate the open_txn, allocate_writeid and commit_txn events which would 
> duplicate the data.
> 2. If Step-11 to Step-14 in thread T2 happen after Step-1 in REPL DUMP 
> thread T1. In this case, the table is not bootstrapped, but the corresponding 
> open_txn, allocate_writeid, commit_txn and drop events would be replicated in the 
> next cycle. During the next cycle, REPL LOAD would fail on the commitTxn event as the 
> table is dropped or the event is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21960) HMS tasks on replica

2019-07-05 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat reassigned HIVE-21960:
-


> HMS tasks on replica
> 
>
> Key: HIVE-21960
> URL: https://issues.apache.org/jira/browse/HIVE-21960
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, repl
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>
> An HMS performs a number of housekeeping tasks. Assess whether:
>  # they are required to be performed on the replicated data
>  # performing them on the replicated data causes any issues, and how to fix those.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

