[jira] [Updated] (HIVE-21079) Replicate column statistics for partitions of partitioned table.

2019-01-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21079:

Summary: Replicate column statistics for partitions of partitioned table.  
(was: Replicate column statistics for partitions of partitioned Hive table.)

> Replicate column statistics for partitions of partitioned table.
> 
>
> Key: HIVE-21079
> URL: https://issues.apache.org/jira/browse/HIVE-21079
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21079.01.patch, HIVE-21079.02.patch, 
> HIVE-21079.03.patch, HIVE-21079.04.patch, HIVE-21079.05.patch, 
> HIVE-21079.06.patch
>
>
> This task is for replicating statistics for partitions of a partitioned Hive 
> table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Status: Open  (was: Patch Available)

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 3.1.1, 3.1.0, 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch
>
>
> Existing deployments using Hive replication do not get external tables 
> replicated. To enable external table replication, such deployments will 
> have to provide a specific switch to first bootstrap external tables as 
> part of Hive incremental replication, after which incremental 
> replication will take care of further changes to external tables.
> The switch will be provided by an additional Hive configuration (e.g. 
> hive.repl.bootstrap.external.tables) and is to be used in the 
> {code} WITH {code} clause of the 
> {code} REPL DUMP {code} command. 
> Additionally, the existing Hive config _hive.repl.include.external.tables_ 
> will always have to be set to "true" in the above clause. 
> Proposed usage for enabling external table replication on an existing 
> replication policy:
> 1. Consider an ongoing repl policy in the incremental phase.
> Enable hive.repl.include.external.tables=true and 
> hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.
> - Dumps all events but skips events related to external tables.
> - Instead, combines a bootstrap dump for all external tables under the 
> "_bootstrap" directory.
> - Also includes the data-locations file "_external_tables_info".
> - The LIMIT or TO clause should not be used, to ensure the latest events are 
> dumped before bootstrap-dumping external tables.
> 2. REPL LOAD on this dump applies all the events first, copies external 
> table data, and then bootstraps external tables (metadata).
> - It is possible that the external tables (metadata) are not point-in-time 
> consistent with the rest of the tables.
> - But they would be eventually consistent once the next incremental load is 
> applied.
> - This REPL LOAD is fault tolerant and can be retried if it fails.
> 3. All future REPL DUMPs on this repl policy should set 
> hive.repl.bootstrap.external.tables=false.
> - If not set to false, the target might end up with an inconsistent set of 
> external tables, as bootstrap wouldn't clean up any dropped external tables.
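A sketch of the dump side of the proposed usage above. The policy name and configuration values are illustrative, and hive.repl.bootstrap.external.tables is the option proposed in this issue, not a released feature:

```sql
-- One-time incremental dump that also bootstraps external tables
-- ('repl_policy' is a placeholder database/policy name):
REPL DUMP repl_policy
WITH ('hive.repl.include.external.tables'='true',
      'hive.repl.bootstrap.external.tables'='true');

-- All later dumps on this policy must turn the bootstrap switch off again,
-- otherwise dropped external tables would never be cleaned up on the target:
REPL DUMP repl_policy
WITH ('hive.repl.include.external.tables'='true',
      'hive.repl.bootstrap.external.tables'='false');
```

Note that, per the proposal, no TO or LIMIT clause is given, so the dump covers all events up to the latest before the external-table bootstrap.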





[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Status: Patch Available  (was: Open)

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 3.1.1, 3.1.0, 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch
>
>


[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Attachment: HIVE-21029.01.patch

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch
>
>


[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Attachment: (was: HIVE-21029.01.patch)

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch
>
>


[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Status: Open  (was: Patch Available)

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 3.1.1, 3.1.0, 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch
>
>


[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Status: Patch Available  (was: Open)

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.1, 3.1.0, 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch
>
>


[jira] [Commented] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-29 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754906#comment-16754906
 ] 

Sankar Hariappan commented on HIVE-21029:
-

[~maheshk114]
Can you please review the patch?

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch
>
>


[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Component/s: repl

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch
>
>


[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Attachment: HIVE-21029.01.patch

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21029.01.patch
>
>


[jira] [Comment Edited] (HIVE-21079) Replicate column statistics for partitions of partitioned Hive table.

2019-01-29 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754822#comment-16754822
 ] 

Sankar Hariappan edited comment on HIVE-21079 at 1/29/19 10:16 AM:
---

+1 for 05.patch, pending tests


was (Author: sankarh):
+1, pending tests

> Replicate column statistics for partitions of partitioned Hive table.
> -
>
> Key: HIVE-21079
> URL: https://issues.apache.org/jira/browse/HIVE-21079
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21079.01.patch, HIVE-21079.02.patch, 
> HIVE-21079.03.patch, HIVE-21079.04.patch, HIVE-21079.05.patch
>
>
> This task is for replicating statistics for partitions of a partitioned Hive 
> table.





[jira] [Commented] (HIVE-21079) Replicate column statistics for partitions of partitioned Hive table.

2019-01-29 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754822#comment-16754822
 ] 

Sankar Hariappan commented on HIVE-21079:
-

+1, pending tests

> Replicate column statistics for partitions of partitioned Hive table.
> -
>
> Key: HIVE-21079
> URL: https://issues.apache.org/jira/browse/HIVE-21079
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21079.01.patch, HIVE-21079.02.patch, 
> HIVE-21079.03.patch, HIVE-21079.04.patch, HIVE-21079.05.patch
>
>


[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-28 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Description: 
Existing deployments using Hive replication do not get external tables 
replicated. To enable external table replication, such deployments will have 
to provide a specific switch to first bootstrap external tables as part of 
Hive incremental replication, after which incremental replication will take 
care of further changes to external tables.

The switch will be provided by an additional Hive configuration (e.g. 
hive.repl.bootstrap.external.tables) and is to be used in the 
{code} WITH {code} clause of the 
{code} REPL DUMP {code} command. 

Additionally, the existing Hive config _hive.repl.include.external.tables_ will 
always have to be set to "true" in the above clause. 

Proposed usage for enabling external table replication on an existing 
replication policy:
1. Consider an ongoing repl policy in the incremental phase.
Enable hive.repl.include.external.tables=true and 
hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.
- Dumps all events but skips events related to external tables.
- Instead, combines a bootstrap dump for all external tables under the 
"_bootstrap" directory.
- Also includes the data-locations file "_external_tables_info".
- The LIMIT or TO clause should not be used, to ensure the latest events are 
dumped before bootstrap-dumping external tables.

2. REPL LOAD on this dump applies all the events first, copies external table 
data, and then bootstraps external tables (metadata).
- It is possible that the external tables (metadata) are not point-in-time 
consistent with the rest of the tables.
- But they would be eventually consistent once the next incremental load is 
applied.
- This REPL LOAD is fault tolerant and can be retried if it fails.

3. All future REPL DUMPs on this repl policy should set 
hive.repl.bootstrap.external.tables=false.
- If not set to false, the target might end up with an inconsistent set of 
external tables, as bootstrap wouldn't clean up any dropped external tables.

  was:
Existing deployments using hive replication do not get external tables 
replicated. For such deployments to enable external table replication they will 
have to provide a specific switch to first bootstrap external tables as part of 
hive incremental replication, following which the incremental replication will 
take care of further changes in external tables.

The switch will be provided by an additional hive configuration (for ex: 
hive.repl.bootstrap.external.tables) and is to be used in 
{code} WITH {code}  clause of 
{code} REPL DUMP {code} command. 

Additionally the existing hive config _hive.repl.include.external.tables_  will 
always have to be set to "true" in the above clause. 

Proposed usage for enabling external tables replication on existing DLM 
replication policy.
1. Consider an ongoing repl policy  in incremental phase.
Enable hive.repl.include.external.tables=true and 
hive.repl.bootstrap.external.tables=true in next incremental REPL DUMP.
- Dumps all events but skips events related to external tables.
- Instead, combine bootstrap dump for all external tables under “_bootstrap” 
directory.
- Also, includes the data locations file "_external_tables_info”.
- LIMIT or TO clause shouldn’t be there to ensure the latest events are dumped 
before bootstrap dumping external tables.

2. REPL LOAD on this dump applies all the events first, copies external tables 
data and then bootstrap external tables (metadata).
- It is possible that the external tables (metadata) are not point-in time 
consistent with rest of the tables.
- But, it would be eventually consistent when the next incremental load is 
applied.
- This REPL LOAD is fault tolerant and can be retried if failed.

3. All future REPL DUMPs on this repl policy should set 
hive.repl.bootstrap.external.tables=false.
- If not set to false, then target might end up having inconsistent set of 
external tables as bootstrap wouldn’t clean-up any dropped external tables.


> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
> Fix For: 4.0.0
>
>
> Existing deployments using hive replication do not get external tables 
> replicated. For such deployments to enable external table replication they 
> will have to provide a specific switch to first bootstrap external tables as 
> part of hive incremental replication, following which the incremental 
> replication will take care of further changes in external tables.

[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2019-01-28 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Description: 
Existing deployments using hive replication do not get external tables 
replicated. For such deployments to enable external table replication they will 
have to provide a specific switch to first bootstrap external tables as part of 
hive incremental replication, following which the incremental replication will 
take care of further changes in external tables.

The switch will be provided by an additional hive configuration (for ex: 
hive.repl.bootstrap.external.tables) and is to be used in 
{code} WITH {code}  clause of 
{code} REPL DUMP {code} command. 

Additionally the existing hive config _hive.repl.include.external.tables_  will 
always have to be set to "true" in the above clause. 

Proposed usage for enabling external tables replication on existing DLM 
replication policy.
1. Consider an ongoing repl policy  in incremental phase.
Enable hive.repl.include.external.tables=true and 
hive.repl.bootstrap.external.tables=true in next incremental REPL DUMP.
- Dumps all events but skips events related to external tables.
- Instead, combine bootstrap dump for all external tables under “_bootstrap” 
directory.
- Also, includes the data locations file "_external_tables_info”.
- LIMIT or TO clause shouldn’t be there to ensure the latest events are dumped 
before bootstrap dumping external tables.

2. REPL LOAD on this dump applies all the events first, copies external tables 
data and then bootstrap external tables (metadata).
- It is possible that the external tables (metadata) are not point-in time 
consistent with rest of the tables.
- But, it would be eventually consistent when the next incremental load is 
applied.
- This REPL LOAD is fault tolerant and can be retried if failed.

3. All future REPL DUMPs on this repl policy should set 
hive.repl.bootstrap.external.tables=false.
- If not set to false, then target might end up having inconsistent set of 
external tables as bootstrap wouldn’t clean-up any dropped external tables.

  was:
Existing deployments using hive replication do not get external tables 
replicated. For such deployments to enable external table replication they will 
have to provide a specific switch to first bootstrap external tables as part of 
hive incremental replication, following which the incremental replication will 
take care of further changes in external tables.

The switch will be provided by an additional hive configuration (for ex: 
hive.repl.bootstrap.external.tables) and is to be used in 
{code} WITH {code}  clause of 
{code} REPL DUMP {code} command. 

Additionally the existing hive config _hive.repl.include.external.tables_  will 
always have to be set to "true" in the above clause. 


> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
> Fix For: 4.0.0
>
>
> Existing deployments using hive replication do not get external tables 
> replicated. For such deployments to enable external table replication they 
> will have to provide a specific switch to first bootstrap external tables as 
> part of hive incremental replication, following which the incremental 
> replication will take care of further changes in external tables.
> The switch will be provided by an additional hive configuration (for ex: 
> hive.repl.bootstrap.external.tables) and is to be used in 
> {code} WITH {code}  clause of 
> {code} REPL DUMP {code} command. 
> Additionally the existing hive config _hive.repl.include.external.tables_  
> will always have to be set to "true" in the above clause. 
> Proposed usage for enabling external tables replication on existing DLM 
> replication policy.
> 1. Consider an ongoing repl policy  in incremental phase.
> Enable hive.repl.include.external.tables=true and 
> hive.repl.bootstrap.external.tables=true in next incremental REPL DUMP.
> - Dumps all events but skips events related to external tables.
> - Instead, combine bootstrap dump for all external tables under “_bootstrap” 
> directory.
> - Also, includes the data locations file "_external_tables_info”.
> - LIMIT or TO clause shouldn’t be there to ensure the latest events are 
> dumped before bootstrap dumping external tables.
> 2. REPL LOAD on this dump applies all the events first, copies external 
> tables data and then bootstrap external tables (metadata).
> - It is possible that the external tables (metadata) are not point-in time 
> consistent with rest of the tables.

[jira] [Commented] (HIVE-21079) Replicate column statistics for partitions of partitioned Hive table.

2019-01-25 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752160#comment-16752160
 ] 

Sankar Hariappan commented on HIVE-21079:
-

[~ashutosh.bapat]
Posted few comments in the pull request. Please take a look.

> Replicate column statistics for partitions of partitioned Hive table.
> -
>
> Key: HIVE-21079
> URL: https://issues.apache.org/jira/browse/HIVE-21079
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21079.01.patch, HIVE-21079.02.patch
>
>
> This task is for replicating statistics for partitions of a partitioned Hive 
> table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21078) Replicate column and table level statistics for unpartitioned Hive tables

2019-01-23 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21078:

Affects Version/s: 4.0.0

> Replicate column and table level statistics for unpartitioned Hive tables
> -
>
> Key: HIVE-21078
> URL: https://issues.apache.org/jira/browse/HIVE-21078
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21078.01.patch, HIVE-21078.02.patch, 
> HIVE-21078.03.patch, HIVE-21078.04.patch, HIVE-21078.05.patch, 
> HIVE-21078.06.patch, HIVE-21078.07.patch, HIVE-21078.08.patch, 
> HIVE-21078.09.patch, HIVE-21078.10.patch, HIVE-21078.11.patch, 
> HIVE-21078.sameas.05.patch
>
>
> This task is for replicating column and table level statistics for 
> unpartitioned tables.  The same for partitioned tables will be worked upon in 
> a separate sub-task.





[jira] [Updated] (HIVE-21078) Replicate column and table level statistics for unpartitioned Hive tables

2019-01-23 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21078:

Fix Version/s: 4.0.0

> Replicate column and table level statistics for unpartitioned Hive tables
> -
>
> Key: HIVE-21078
> URL: https://issues.apache.org/jira/browse/HIVE-21078
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21078.01.patch, HIVE-21078.02.patch, 
> HIVE-21078.03.patch, HIVE-21078.04.patch, HIVE-21078.05.patch, 
> HIVE-21078.06.patch, HIVE-21078.07.patch, HIVE-21078.08.patch, 
> HIVE-21078.09.patch, HIVE-21078.10.patch, HIVE-21078.11.patch, 
> HIVE-21078.sameas.05.patch
>
>
> This task is for replicating column and table level statistics for 
> unpartitioned tables.  The same for partitioned tables will be worked upon in 
> a separate sub-task.





[jira] [Updated] (HIVE-21078) Replicate column and table level statistics for unpartitioned Hive tables

2019-01-23 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21078:

Component/s: repl

> Replicate column and table level statistics for unpartitioned Hive tables
> -
>
> Key: HIVE-21078
> URL: https://issues.apache.org/jira/browse/HIVE-21078
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21078.01.patch, HIVE-21078.02.patch, 
> HIVE-21078.03.patch, HIVE-21078.04.patch, HIVE-21078.05.patch, 
> HIVE-21078.06.patch, HIVE-21078.07.patch, HIVE-21078.08.patch, 
> HIVE-21078.09.patch, HIVE-21078.10.patch, HIVE-21078.11.patch, 
> HIVE-21078.sameas.05.patch
>
>
> This task is for replicating column and table level statistics for 
> unpartitioned tables.  The same for partitioned tables will be worked upon in 
> a separate sub-task.





[jira] [Updated] (HIVE-21078) Replicate column and table level statistics for unpartitioned Hive tables

2019-01-23 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21078:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Replicate column and table level statistics for unpartitioned Hive tables
> -
>
> Key: HIVE-21078
> URL: https://issues.apache.org/jira/browse/HIVE-21078
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21078.01.patch, HIVE-21078.02.patch, 
> HIVE-21078.03.patch, HIVE-21078.04.patch, HIVE-21078.05.patch, 
> HIVE-21078.06.patch, HIVE-21078.07.patch, HIVE-21078.08.patch, 
> HIVE-21078.09.patch, HIVE-21078.10.patch, HIVE-21078.11.patch, 
> HIVE-21078.sameas.05.patch
>
>
> This task is for replicating column and table level statistics for 
> unpartitioned tables.  The same for partitioned tables will be worked upon in 
> a separate sub-task.





[jira] [Commented] (HIVE-21078) Replicate column and table level statistics for unpartitioned Hive tables

2019-01-23 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749807#comment-16749807
 ] 

Sankar Hariappan commented on HIVE-21078:
-

11.patch committed to master.
Thanks [~ashutosh.bapat] for the contribution!

> Replicate column and table level statistics for unpartitioned Hive tables
> -
>
> Key: HIVE-21078
> URL: https://issues.apache.org/jira/browse/HIVE-21078
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21078.01.patch, HIVE-21078.02.patch, 
> HIVE-21078.03.patch, HIVE-21078.04.patch, HIVE-21078.05.patch, 
> HIVE-21078.06.patch, HIVE-21078.07.patch, HIVE-21078.08.patch, 
> HIVE-21078.09.patch, HIVE-21078.10.patch, HIVE-21078.11.patch, 
> HIVE-21078.sameas.05.patch
>
>
> This task is for replicating column and table level statistics for 
> unpartitioned tables.  The same for partitioned tables will be worked upon in 
> a separate sub-task.





[jira] [Commented] (HIVE-21078) Replicate column and table level statistics for unpartitioned Hive tables

2019-01-22 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748618#comment-16748618
 ] 

Sankar Hariappan commented on HIVE-21078:
-

+1
Posted few trivial comments. Please take a look.

> Replicate column and table level statistics for unpartitioned Hive tables
> -
>
> Key: HIVE-21078
> URL: https://issues.apache.org/jira/browse/HIVE-21078
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21078.01.patch, HIVE-21078.02.patch, 
> HIVE-21078.03.patch, HIVE-21078.04.patch, HIVE-21078.05.patch, 
> HIVE-21078.06.patch, HIVE-21078.07.patch, HIVE-21078.08.patch, 
> HIVE-21078.09.patch, HIVE-21078.sameas.05.patch
>
>
> This task is for replicating column and table level statistics for 
> unpartitioned tables.  The same for partitioned tables will be worked upon in 
> a separate sub-task.





[jira] [Commented] (HIVE-21103) PartitionManagementTask should not modify DN configs to avoid closing persistence manager

2019-01-08 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737863#comment-16737863
 ] 

Sankar Hariappan commented on HIVE-21103:
-

+1

> PartitionManagementTask should not modify DN configs to avoid closing 
> persistence manager
> -
>
> Key: HIVE-21103
> URL: https://issues.apache.org/jira/browse/HIVE-21103
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-21103.1.patch
>
>
> HIVE-20707 added automatic partition management which uses thread pools to 
> run parallel msck repair. It also modifies datanucleus connection pool size 
> to avoid explosion of connections to backend database. But object store 
> closes the persistence manager when it detects a change in datanuclues or jdo 
> configs. So when PartitionManagementTask is running and when HS2 tries to 
> connect to metastore HS2 will get persistence manager close exception. 





[jira] [Commented] (HIVE-21078) Replicate column and table level statistics for unpartitioned Hive tables

2019-01-03 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733483#comment-16733483
 ] 

Sankar Hariappan commented on HIVE-21078:
-

[~ashutosh.bapat]
I posted my comments in the PR link. Please take a look.
Thanks!

> Replicate column and table level statistics for unpartitioned Hive tables
> -
>
> Key: HIVE-21078
> URL: https://issues.apache.org/jira/browse/HIVE-21078
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21078.01.patch
>
>
> This task is for replicating column and table level statistics for 
> unpartitioned tables.  The same for partitioned tables will be worked upon in 
> a separate sub-task.





[jira] [Commented] (HIVE-20911) External Table Replication for Hive

2019-01-02 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731845#comment-16731845
 ] 

Sankar Hariappan commented on HIVE-20911:
-

+1, pending tests for 08.patch

> External Table Replication for Hive
> ---
>
> Key: HIVE-20911
> URL: https://issues.apache.org/jira/browse/HIVE-20911
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-20911.01.patch, HIVE-20911.02.patch, 
> HIVE-20911.03.patch, HIVE-20911.04.patch, HIVE-20911.05.patch, 
> HIVE-20911.06.patch, HIVE-20911.07.patch, HIVE-20911.07.patch, 
> HIVE-20911.08.patch
>
>
> External tables are not replicated currently as part of hive replication. As 
> part of this jira we want to enable that.
> Approach:
> * Target cluster will have a top level base directory config that will be 
> used to copy all data relevant to external tables. This will be provided via 
> the *with* clause in the *repl load* command. This base path will be prefixed 
> to the path of the same external table on source cluster. This can be 
> provided using the following configuration:
> {code}
> hive.repl.replica.external.table.base.dir=/
> {code}
> * Since changes to external table directories can happen without hive 
> knowing it, we cannot capture the relevant events whenever new data is added 
> or removed; we will have to copy the data from the source path to the target 
> path for external tables every time we run incremental replication.
> ** this will require incremental *repl dump*  to now create an additional 
> file *\_external\_tables\_info* with data in the following form 
> {code}
> tableName,base64Encoded(tableDataLocation)
> {code}
> In case different partitions in the table point to different locations, 
> there will be multiple entries in the file for the same table name, each 
> pointing to a different partition location. Partitions created in a table 
> without the _set location_ command will be within the same table data 
> location, and hence there will not be separate entries in the file above.
> ** *repl load* will read the  *\_external\_tables\_info* to identify what 
> locations are to be copied from source to target and create corresponding 
> tasks for them.
> * New External tables will be created with metadata only with no data copied 
> as part of regular tasks while incremental load/bootstrap load.
> * Bootstrap dump will also create *\_external\_tables\_info*, which will be 
> used to copy data from source to target as part of bootstrap load.
> * Bootstrap load will create a DAG, that can use parallelism in the execution 
> phase, the hdfs copy related tasks are created, once the bootstrap phase is 
> complete.
> * Since incremental load results in a DAG with only sequential execution 
> (events applied in sequence), to effectively use the parallelism capability 
> in execution mode we create tasks for hdfs copy along with the incremental 
> DAG. This requires a few basic calculations to approximately meet the 
> configured value in "hive.repl.approx.max.load.tasks".
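The *\_external\_tables\_info* line format described above 
(tableName,base64Encoded(tableDataLocation)) can be illustrated with a short 
sketch. This is Python used only for illustration; the helper names here are 
hypothetical, not part of the Hive codebase.

```python
import base64

def encode_entry(table_name, data_location):
    # One line per table (or per distinctly-located partition):
    # tableName,base64Encoded(tableDataLocation)
    loc = base64.b64encode(data_location.encode("utf-8")).decode("ascii")
    return "%s,%s" % (table_name, loc)

def decode_entry(line):
    # Split on the first comma; the encoded location cannot contain a comma,
    # since base64 output is limited to [A-Za-z0-9+/=].
    name, loc = line.split(",", 1)
    return name, base64.b64decode(loc).decode("utf-8")

entry = encode_entry("sales", "hdfs://nn1:8020/warehouse/ext/sales")
print(entry)
print(decode_entry(entry))
```

Base64-encoding the location keeps the one-line-per-entry format safe even when 
paths contain commas or unusual characters.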





[jira] [Commented] (HIVE-20911) External Table Replication for Hive

2018-12-21 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726672#comment-16726672
 ] 

Sankar Hariappan commented on HIVE-20911:
-

[~anishek]
I posted few comments in the PR link. Please take a look. 
Thanks!

> External Table Replication for Hive
> ---
>
> Key: HIVE-20911
> URL: https://issues.apache.org/jira/browse/HIVE-20911
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-20911.01.patch, HIVE-20911.02.patch, 
> HIVE-20911.03.patch, HIVE-20911.04.patch, HIVE-20911.05.patch, 
> HIVE-20911.06.patch, HIVE-20911.07.patch, HIVE-20911.07.patch
>
>
> External tables are not replicated currently as part of hive replication. As 
> part of this jira we want to enable that.
> Approach:
> * Target cluster will have a top level base directory config that will be 
> used to copy all data relevant to external tables. This will be provided via 
> the *with* clause in the *repl load* command. This base path will be prefixed 
> to the path of the same external table on source cluster. This can be 
> provided using the following configuration:
> {code}
> hive.repl.replica.external.table.base.dir=/
> {code}
> * Since changes to external table directories can happen without hive 
> knowing it, we cannot capture the relevant events whenever new data is added 
> or removed; we will have to copy the data from the source path to the target 
> path for external tables every time we run incremental replication.
> ** this will require incremental *repl dump*  to now create an additional 
> file *\_external\_tables\_info* with data in the following form 
> {code}
> tableName,base64Encoded(tableDataLocation)
> {code}
> In case different partitions in the table point to different locations, 
> there will be multiple entries in the file for the same table name, each 
> pointing to a different partition location. Partitions created in a table 
> without the _set location_ command will be within the same table data 
> location, and hence there will not be separate entries in the file above.
> ** *repl load* will read the  *\_external\_tables\_info* to identify what 
> locations are to be copied from source to target and create corresponding 
> tasks for them.
> * New External tables will be created with metadata only with no data copied 
> as part of regular tasks while incremental load/bootstrap load.
> * Bootstrap dump will also create *\_external\_tables\_info*, which will be 
> used to copy data from source to target as part of bootstrap load.
> * Bootstrap load will create a DAG, that can use parallelism in the execution 
> phase, the hdfs copy related tasks are created, once the bootstrap phase is 
> complete.
> * Since incremental load results in a DAG with only sequential execution 
> (events applied in sequence), to effectively use the parallelism capability 
> in execution mode we create tasks for hdfs copy along with the incremental 
> DAG. This requires a few basic calculations to approximately meet the 
> configured value in "hive.repl.approx.max.load.tasks".





[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-21 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Resolution: Fixed
Fix Version/s: 4.0.0
Status: Resolved  (was: Patch Available)

Thanks [~anishek] for the review!
Committed to master!

> JDBC - The GetOperationStatus + log can block query progress via sleep()
> 
>
> Key: HIVE-20989
> URL: https://issues.apache.org/jira/browse/HIVE-20989
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Sankar Hariappan
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20989.01.patch
>
>
> There is an exponential sleep operation inside the CLIService which can end 
> up adding tens of seconds to a query which has already completed.
> {code}
> "HiveServer2-Handler-Pool: Thread-9373" #9373 prio=5 os_prio=0 
> tid=0x7f4d5e72d800 nid=0xb634a waiting on condition [0x7f28d06a5000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hive.service.cli.CLIService.progressUpdateLog(CLIService.java:506)
> at 
> org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:480)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:695)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1757)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1742)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The sleep loop is on the server side.
> {code}
> private static final long PROGRESS_MAX_WAIT_NS = 30 * 1000000000l;
> private JobProgressUpdate progressUpdateLog(boolean isProgressLogRequested, 
> Operation operation, HiveConf conf) {
> ...
> long startTime = System.nanoTime();
> int timeOutMs = 8;
> try {
>   while (sessionState.getProgressMonitor() == null && 
> !operation.isDone()) {
> long remainingMs = (PROGRESS_MAX_WAIT_NS - (System.nanoTime() - 
> startTime)) / 1000000l;
> if (remainingMs <= 0) {
>   LOG.debug("timed out and hence returning progress log as NULL");
>   return new JobProgressUpdate(ProgressMonitor.NULL);
> }
> Thread.sleep(Math.min(remainingMs, timeOutMs));
> timeOutMs <<= 1;
>   }
> {code}
> After about 16 seconds of execution of the query, the timeOutMs is 16384 ms, 
> which means the next sleep cycle is for min(30 - 17, 16) = 13.
> If the query finishes on the 17th second, the JDBC server will only respond 
> after the 30th second when it will check for operation.isDone() and return.
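The backoff arithmetic described above can be checked with a small simulation. 
This is illustrative Python, not the Hive code: it models only the 
doubling-timeout sleep path, ignoring the early exits when a progress monitor 
appears or the operation completes.

```python
def simulated_sleeps(max_wait_s=30, initial_timeout_ms=8):
    """Model the progressUpdateLog polling loop: each iteration sleeps
    min(remaining, timeout) and doubles the timeout, until the 30 s deadline."""
    sleeps = []
    elapsed_ms = 0
    timeout_ms = initial_timeout_ms
    while elapsed_ms < max_wait_s * 1000:
        remaining_ms = max_wait_s * 1000 - elapsed_ms
        nap = min(remaining_ms, timeout_ms)
        sleeps.append(nap)
        elapsed_ms += nap
        timeout_ms <<= 1  # exponential backoff: 8, 16, 32, ... ms
    return sleeps

sleeps = simulated_sleeps()
# After ~16.4 s of doubling sleeps the timeout has grown past 16384 ms,
# so the final nap covers the entire remaining window (~13.6 s).
print(sleeps[-2:], sum(sleeps) / 1000.0)
```

This reproduces the problem in the report: a query finishing around the 17th 
second still waits out a single multi-second sleep before isDone() is rechecked.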





[jira] [Commented] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-20 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725994#comment-16725994
 ] 

Sankar Hariappan commented on HIVE-20989:
-

[~gopalv] 
I fixed it with CountDownLatch. Can you please review?

> JDBC - The GetOperationStatus + log can block query progress via sleep()
> 
>
> Key: HIVE-20989
> URL: https://issues.apache.org/jira/browse/HIVE-20989
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Sankar Hariappan
>Priority: Major
> Attachments: HIVE-20989.01.patch
>
>
> There is an exponential sleep operation inside the CLIService which can end 
> up adding tens of seconds to a query which has already completed.
> {code}
> "HiveServer2-Handler-Pool: Thread-9373" #9373 prio=5 os_prio=0 
> tid=0x7f4d5e72d800 nid=0xb634a waiting on condition [0x7f28d06a5000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hive.service.cli.CLIService.progressUpdateLog(CLIService.java:506)
> at 
> org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:480)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:695)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1757)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1742)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The sleep loop is on the server side.
> {code}
> private static final long PROGRESS_MAX_WAIT_NS = 30 * 1000000000l;
> private JobProgressUpdate progressUpdateLog(boolean isProgressLogRequested, 
> Operation operation, HiveConf conf) {
> ...
> long startTime = System.nanoTime();
> int timeOutMs = 8;
> try {
>   while (sessionState.getProgressMonitor() == null && 
> !operation.isDone()) {
> long remainingMs = (PROGRESS_MAX_WAIT_NS - (System.nanoTime() - 
> startTime)) / 1000000l;
> if (remainingMs <= 0) {
>   LOG.debug("timed out and hence returning progress log as NULL");
>   return new JobProgressUpdate(ProgressMonitor.NULL);
> }
> Thread.sleep(Math.min(remainingMs, timeOutMs));
> timeOutMs <<= 1;
>   }
> {code}
> After about 16 seconds of execution of the query, the timeOutMs is 16384 ms, 
> which means the next sleep cycle is for min(30 - 17, 16) = 13.
> If the query finishes on the 17th second, the JDBC server will only respond 
> after the 30th second when it will check for operation.isDone() and return.





[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-20 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Status: Patch Available  (was: Open)

> JDBC - The GetOperationStatus + log can block query progress via sleep()
> 
>
> Key: HIVE-20989
> URL: https://issues.apache.org/jira/browse/HIVE-20989
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Sankar Hariappan
>Priority: Major
> Attachments: HIVE-20989.01.patch
>
>
> There is an exponential sleep operation inside the CLIService which can end 
> up adding tens of seconds to a query which has already completed.
> {code}
> "HiveServer2-Handler-Pool: Thread-9373" #9373 prio=5 os_prio=0 
> tid=0x7f4d5e72d800 nid=0xb634a waiting on condition [0x7f28d06a5000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hive.service.cli.CLIService.progressUpdateLog(CLIService.java:506)
> at 
> org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:480)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:695)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1757)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1742)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The sleep loop is on the server side.
> {code}
> private static final long PROGRESS_MAX_WAIT_NS = 30 * 1000000000l;
> private JobProgressUpdate progressUpdateLog(boolean isProgressLogRequested, 
> Operation operation, HiveConf conf) {
> ...
> long startTime = System.nanoTime();
> int timeOutMs = 8;
> try {
>   while (sessionState.getProgressMonitor() == null && 
> !operation.isDone()) {
> long remainingMs = (PROGRESS_MAX_WAIT_NS - (System.nanoTime() - 
> startTime)) / 1000000l;
> if (remainingMs <= 0) {
>   LOG.debug("timed out and hence returning progress log as NULL");
>   return new JobProgressUpdate(ProgressMonitor.NULL);
> }
> Thread.sleep(Math.min(remainingMs, timeOutMs));
> timeOutMs <<= 1;
>   }
> {code}
> After about 16 seconds of query execution, timeOutMs has doubled to 16384 ms, 
> so the next sleep cycle lasts min(30 - 17, 16) = 13 seconds.
> If the query finishes on the 17th second, the JDBC server will only respond 
> after the 30th second when it will check for operation.isDone() and return.
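The wake-up schedule of the quoted loop can be reproduced with a virtual clock instead of real sleeps, which makes the stall visible without waiting 30 seconds. This is an illustrative sketch, not Hive's code or the HIVE-20989 patch: the class name, the 500 ms cap, and the method names are made up here to show how capping the doubling sleep bounds the lag between operation.isDone() becoming true and the server noticing it.

```java
// Simulates the exponential-backoff poll loop quoted above with a virtual
// clock (no real Thread.sleep). Names and the 500 ms cap are illustrative.
public class BackoffSim {
    static final long MAX_WAIT_MS = 30_000; // PROGRESS_MAX_WAIT_NS, in ms

    /** Virtual time (ms) at which the loop first observes completion. */
    static long observedAt(long queryDoneMs, long capMs) {
        long now = 0;
        long timeOutMs = 8; // initial sleep; doubles each cycle, as in the loop above
        while (now < MAX_WAIT_MS) {
            if (now >= queryDoneMs) {
                return now; // the isDone() check happens at each wake-up
            }
            long remainingMs = MAX_WAIT_MS - now;
            long sleep = Math.min(remainingMs, Math.min(timeOutMs, capMs));
            now += sleep;
            timeOutMs <<= 1;
        }
        return now; // timed out at the 30 s ceiling
    }

    public static void main(String[] args) {
        // Uncapped doubling: wake-ups land at 8, 24, 56, ... 16376 ms, then the
        // final sleep runs to the 30 s ceiling, so a query done at 17 s is only
        // observed at 30 s.
        System.out.println(observedAt(17_000, Long.MAX_VALUE)); // 30000
        // Capping the sleep at 500 ms bounds the worst-case lag to one cap interval.
        System.out.println(observedAt(17_000, 500)); // 17004
    }
}
```

Running it shows the gap the report describes: without a cap the completed query sits unobserved for ~13 seconds, while a modest cap keeps the response within one poll interval of completion.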





[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-20 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Attachment: HIVE-20989.01.patch



[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-20 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Attachment: (was: HIVE-20989.01.patch)



[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-20 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-19 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-19 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Attachment: (was: HIVE-20989.01.patch)



[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-19 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-19 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Attachment: HIVE-20989.01.patch



[jira] [Updated] (HIVE-20989) JDBC - The GetOperationStatus + log can block query progress via sleep()

2018-12-19 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Summary: JDBC - The GetOperationStatus + log can block query progress via 
sleep()  (was: JDBC: The GetOperationStatus + log can block query progress via 
sleep())



[jira] [Updated] (HIVE-20989) JDBC: The GetOperationStatus + log can block query progress via sleep()

2018-12-19 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-20989) JDBC: The GetOperationStatus + log can block query progress via sleep()

2018-12-19 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Attachment: HIVE-20989.01.patch

> JDBC: The GetOperationStatus + log can block query progress via sleep()
> ---
>
> Key: HIVE-20989
> URL: https://issues.apache.org/jira/browse/HIVE-20989
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Sankar Hariappan
>Priority: Major
> Attachments: HIVE-20989.01.patch
>
>
> There is an exponential sleep operation inside the CLIService which can end 
> up adding tens of seconds to a query which has already completed.
> {code}
> "HiveServer2-Handler-Pool: Thread-9373" #9373 prio=5 os_prio=0 
> tid=0x7f4d5e72d800 nid=0xb634a waiting on condition [0x7f28d06a5000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hive.service.cli.CLIService.progressUpdateLog(CLIService.java:506)
> at 
> org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:480)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:695)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1757)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1742)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The sleep loop is on the server side.
> {code}
> private static final long PROGRESS_MAX_WAIT_NS = 30 * 1000_000_000l;
> private JobProgressUpdate progressUpdateLog(boolean isProgressLogRequested, 
> Operation operation, HiveConf conf) {
> ...
> long startTime = System.nanoTime();
> int timeOutMs = 8;
> try {
>   while (sessionState.getProgressMonitor() == null && 
> !operation.isDone()) {
> long remainingMs = (PROGRESS_MAX_WAIT_NS - (System.nanoTime() - 
> startTime)) / 1000_000l;
> if (remainingMs <= 0) {
>   LOG.debug("timed out and hence returning progress log as NULL");
>   return new JobProgressUpdate(ProgressMonitor.NULL);
> }
> Thread.sleep(Math.min(remainingMs, timeOutMs));
> timeOutMs <<= 1;
>   }
> {code}
> After about 16 seconds of query execution, timeOutMs has grown to 16384 ms, 
> which means the next sleep cycle lasts min(30 - 17, 16.4) = 13 seconds.
> If the query finishes on the 17th second, the JDBC server will only respond 
> after the 30th second, when it next checks operation.isDone() and returns.





[jira] [Assigned] (HIVE-20989) JDBC: The GetOperationStatus + log can block query progress via sleep()

2018-12-19 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-20989:
---

Assignee: Sankar Hariappan

> JDBC: The GetOperationStatus + log can block query progress via sleep()
> ---
>
> Key: HIVE-20989
> URL: https://issues.apache.org/jira/browse/HIVE-20989
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Sankar Hariappan
>Priority: Major
>
> There is an exponential sleep operation inside the CLIService which can end 
> up adding tens of seconds to a query which has already completed.
> {code}
> "HiveServer2-Handler-Pool: Thread-9373" #9373 prio=5 os_prio=0 
> tid=0x7f4d5e72d800 nid=0xb634a waiting on condition [0x7f28d06a5000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hive.service.cli.CLIService.progressUpdateLog(CLIService.java:506)
> at 
> org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:480)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:695)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1757)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1742)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The sleep loop is on the server side.
> {code}
> private static final long PROGRESS_MAX_WAIT_NS = 30 * 1000_000_000l;
> private JobProgressUpdate progressUpdateLog(boolean isProgressLogRequested, 
> Operation operation, HiveConf conf) {
> ...
> long startTime = System.nanoTime();
> int timeOutMs = 8;
> try {
>   while (sessionState.getProgressMonitor() == null && 
> !operation.isDone()) {
> long remainingMs = (PROGRESS_MAX_WAIT_NS - (System.nanoTime() - 
> startTime)) / 1000_000l;
> if (remainingMs <= 0) {
>   LOG.debug("timed out and hence returning progress log as NULL");
>   return new JobProgressUpdate(ProgressMonitor.NULL);
> }
> Thread.sleep(Math.min(remainingMs, timeOutMs));
> timeOutMs <<= 1;
>   }
> {code}
> After about 16 seconds of query execution, timeOutMs has grown to 16384 ms, 
> which means the next sleep cycle lasts min(30 - 17, 16.4) = 13 seconds.
> If the query finishes on the 17th second, the JDBC server will only respond 
> after the 30th second, when it next checks operation.isDone() and returns.





[jira] [Updated] (HIVE-20989) JDBC: The GetOperationStatus + log can block query progress via sleep()

2018-12-19 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20989:

Affects Version/s: 4.0.0

> JDBC: The GetOperationStatus + log can block query progress via sleep()
> ---
>
> Key: HIVE-20989
> URL: https://issues.apache.org/jira/browse/HIVE-20989
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Sankar Hariappan
>Priority: Major
>
> There is an exponential sleep operation inside the CLIService which can end 
> up adding tens of seconds to a query which has already completed.
> {code}
> "HiveServer2-Handler-Pool: Thread-9373" #9373 prio=5 os_prio=0 
> tid=0x7f4d5e72d800 nid=0xb634a waiting on condition [0x7f28d06a5000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hive.service.cli.CLIService.progressUpdateLog(CLIService.java:506)
> at 
> org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:480)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:695)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1757)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1742)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The sleep loop is on the server side.
> {code}
> private static final long PROGRESS_MAX_WAIT_NS = 30 * 1000_000_000l;
> private JobProgressUpdate progressUpdateLog(boolean isProgressLogRequested, 
> Operation operation, HiveConf conf) {
> ...
> long startTime = System.nanoTime();
> int timeOutMs = 8;
> try {
>   while (sessionState.getProgressMonitor() == null && 
> !operation.isDone()) {
> long remainingMs = (PROGRESS_MAX_WAIT_NS - (System.nanoTime() - 
> startTime)) / 1000_000l;
> if (remainingMs <= 0) {
>   LOG.debug("timed out and hence returning progress log as NULL");
>   return new JobProgressUpdate(ProgressMonitor.NULL);
> }
> Thread.sleep(Math.min(remainingMs, timeOutMs));
> timeOutMs <<= 1;
>   }
> {code}
> After about 16 seconds of query execution, timeOutMs has grown to 16384 ms, 
> which means the next sleep cycle lasts min(30 - 17, 16.4) = 13 seconds.
> If the query finishes on the 17th second, the JDBC server will only respond 
> after the 30th second, when it next checks operation.isDone() and returns.





[jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.

2018-12-18 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21029:

Summary: External table replication for existing deployments running 
incremental replication.  (was: External table replication: for existing 
deployments running incremental replication)

> External table replication for existing deployments running incremental 
> replication.
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
> Fix For: 4.0.0
>
>
> Existing deployments using Hive replication do not get external tables 
> replicated. To enable external table replication, such deployments will have 
> to provide a specific switch that first bootstraps external tables as part 
> of Hive incremental replication, after which incremental replication will 
> take care of further changes in external tables.
> The switch will be provided by an additional Hive configuration (e.g. 
> hive.repl.bootstrap.external.tables) to be used in the 
> {code} WITH {code} clause of the 
> {code} REPL DUMP {code} command.
> Additionally, the existing Hive config _hive.repl.include.external.tables_ 
> will always have to be set to "true" in the same clause.
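Putting the two configuration names from the description together, a bootstrap-enabled incremental dump would look roughly like this. Illustrative only: `srcdb` is a placeholder database name, and the exact property semantics depend on the final patch.

```sql
-- One-time bootstrap of external tables piggybacked on an incremental dump.
REPL DUMP srcdb WITH (
  'hive.repl.bootstrap.external.tables'='true',
  'hive.repl.include.external.tables'='true'
);
```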





[jira] [Updated] (HIVE-21043) Enable move optimization for cloud replication with strict managed tables.

2018-12-18 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21043:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~maheshk114] for the review!

> Enable move optimization for cloud replication with strict managed tables.
> --
>
> Key: HIVE-21043
> URL: https://issues.apache.org/jira/browse/HIVE-21043
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21043.01.patch
>
>
> If hive.repl.enable.move.optimization is set to true, then Hive REPL LOAD 
> avoids the move operation and copies the data files directly to the target 
> location. This is helpful for cloud replication, where a move is non-atomic 
> and is itself implemented as a copy.
> Currently, this optimization is disabled if hive.strict.managed.tables is 
> enabled at the target and migration to a transactional table is needed 
> during REPL LOAD.
> This case needs to be supported as well.





[jira] [Updated] (HIVE-21043) Enable move optimization for cloud replication with strict managed tables.

2018-12-18 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21043:

Summary: Enable move optimization for cloud replication with strict managed 
tables.  (was: Enable move optimization for replicating to cloud based cluster 
with strict managed tables.)

> Enable move optimization for cloud replication with strict managed tables.
> --
>
> Key: HIVE-21043
> URL: https://issues.apache.org/jira/browse/HIVE-21043
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available
> Attachments: HIVE-21043.01.patch
>
>
> If hive.repl.enable.move.optimization is set to true, then Hive REPL LOAD 
> avoids the move operation and copies the data files directly to the target 
> location. This is helpful for cloud replication, where a move is non-atomic 
> and is itself implemented as a copy.
> Currently, this optimization is disabled if hive.strict.managed.tables is 
> enabled at the target and migration to a transactional table is needed 
> during REPL LOAD.
> This case needs to be supported as well.





[jira] [Assigned] (HIVE-21029) External table replication: for existing deployments running incremental replication

2018-12-18 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-21029:
---

Assignee: Sankar Hariappan  (was: anishek)

> External table replication: for existing deployments running incremental 
> replication
> 
>
> Key: HIVE-21029
> URL: https://issues.apache.org/jira/browse/HIVE-21029
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0, 3.1.0, 3.1.1
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Critical
> Fix For: 4.0.0
>
>
> Existing deployments using Hive replication do not get external tables 
> replicated. To enable external table replication, such deployments will have 
> to provide a specific switch that first bootstraps external tables as part 
> of Hive incremental replication, after which incremental replication will 
> take care of further changes in external tables.
> The switch will be provided by an additional Hive configuration (e.g. 
> hive.repl.bootstrap.external.tables) to be used in the 
> {code} WITH {code} clause of the 
> {code} REPL DUMP {code} command.
> Additionally, the existing Hive config _hive.repl.include.external.tables_ 
> will always have to be set to "true" in the same clause.





[jira] [Commented] (HIVE-21055) Replication load command executing copy in serial mode even if parallel execution is enabled using with clause

2018-12-18 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724112#comment-16724112
 ] 

Sankar Hariappan commented on HIVE-21055:
-

+1, pending tests

> Replication load command executing copy in serial mode even if parallel 
> execution is enabled using with clause
> --
>
> Key: HIVE-21055
> URL: https://issues.apache.org/jira/browse/HIVE-21055
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21055.01.patch
>
>
> For the repl load command, the user can specify the execution mode as part 
> of the "with" clause. But the config for executing tasks in parallel or 
> serial mode is not read from the command-specific config; it is read from 
> the Hive server config. So even if the user asks to run the tasks in 
> parallel during the repl load command, the tasks are executed serially.





[jira] [Updated] (HIVE-21043) Enable move optimization for replicating to cloud based cluster with strict managed tables.

2018-12-18 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21043:

Status: Patch Available  (was: Open)

> Enable move optimization for replicating to cloud based cluster with strict 
> managed tables.
> ---
>
> Key: HIVE-21043
> URL: https://issues.apache.org/jira/browse/HIVE-21043
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
> Attachments: HIVE-21043.01.patch
>
>
> If hive.repl.enable.move.optimization is set to true, then Hive REPL LOAD 
> avoids the move operation and copies the data files directly to the target 
> location. This is helpful for cloud replication, where a move is non-atomic 
> and is itself implemented as a copy.
> Currently, this optimization is disabled if hive.strict.managed.tables is 
> enabled at the target and migration to a transactional table is needed 
> during REPL LOAD.
> This case needs to be supported as well.





[jira] [Updated] (HIVE-21043) Enable move optimization for replicating to cloud based cluster with strict managed tables.

2018-12-18 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21043:

Attachment: HIVE-21043.01.patch

> Enable move optimization for replicating to cloud based cluster with strict 
> managed tables.
> ---
>
> Key: HIVE-21043
> URL: https://issues.apache.org/jira/browse/HIVE-21043
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available
> Attachments: HIVE-21043.01.patch
>
>
> If hive.repl.enable.move.optimization is set to true, then Hive REPL LOAD 
> avoids the move operation and copies the data files directly to the target 
> location. This is helpful for cloud replication, where a move is non-atomic 
> and is itself implemented as a copy.
> Currently, this optimization is disabled if hive.strict.managed.tables is 
> enabled at the target and migration to a transactional table is needed 
> during REPL LOAD.
> This case needs to be supported as well.





[jira] [Updated] (HIVE-20967) Handle alter events when replicate to cluster with hive.strict.managed.tables enabled.

2018-12-18 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20967:

Description: 
Some of the events from Hive2 may cause conflicts in Hive3 
(hive.strict.managed.tables=true) when applied. So, need to handle them 
properly.
1. Alter table to convert non-acid to acid.
This event should be no-op as the table in target might be already acid or MM 
or external table.
2. Alter table or partition that changes the location.
Once the table is moved to managed table warehouse directory, the location 
shouldn't be changed.

  was:
Some of the events from Hive2 may cause conflicts in Hive3 
(hive.strict.managed.tables=true) when applied. So, need to handle them 
properly.
1. Alter table to convert non-acid to acid.
This event should be no-op as the table in target might be already acid or MM 
or external table.
2. Alter table or partition that changes the location.
Once the table is moved to managed table warehouse directory, the location 
shouldn't be changed.
3. Alter database that changes the location.
Once the database is moved to managed table warehouse directory, the location 
shouldn't be changed.


> Handle alter events when replicate to cluster with hive.strict.managed.tables 
> enabled.
> --
>
> Key: HIVE-20967
> URL: https://issues.apache.org/jira/browse/HIVE-20967
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> Some of the events from Hive2 may cause conflicts in Hive3 
> (hive.strict.managed.tables=true) when applied, so they need to be handled 
> properly.
> 1. Alter table to convert a non-ACID table to ACID.
> This event should be a no-op, as the table on the target might already be an 
> ACID, MM, or external table.
> 2. Alter table or partition that changes the location.
> Once the table is moved to the managed table warehouse directory, its 
> location shouldn't be changed.





[jira] [Updated] (HIVE-20967) Handle alter events when replicate to cluster with hive.strict.managed.tables enabled.

2018-12-18 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20967:

Priority: Minor  (was: Major)

> Handle alter events when replicate to cluster with hive.strict.managed.tables 
> enabled.
> --
>
> Key: HIVE-20967
> URL: https://issues.apache.org/jira/browse/HIVE-20967
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: DR
>
> Some of the events from Hive2 may cause conflicts in Hive3 
> (hive.strict.managed.tables=true) when applied, so they need to be handled 
> properly.
> 1. Alter table to convert a non-ACID table to ACID.
> This event should be a no-op, as the table on the target might already be an 
> ACID, MM, or external table.
> 2. Alter table or partition that changes the location.
> Once the table is moved to the managed table warehouse directory, its 
> location shouldn't be changed.





[jira] [Commented] (HIVE-20967) Handle alter events when replicate to cluster with hive.strict.managed.tables enabled.

2018-12-18 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723832#comment-16723832
 ] 

Sankar Hariappan commented on HIVE-20967:
-

It seems this JIRA is complete, but tests need to be added to verify it.

> Handle alter events when replicate to cluster with hive.strict.managed.tables 
> enabled.
> --
>
> Key: HIVE-20967
> URL: https://issues.apache.org/jira/browse/HIVE-20967
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> Some of the events from Hive2 may cause conflicts in Hive3 
> (hive.strict.managed.tables=true) when applied, so they need to be handled 
> properly.
> 1. Alter table to convert a non-ACID table to ACID.
> This event should be a no-op, as the table on the target might already be an 
> ACID, MM, or external table.
> 2. Alter table or partition that changes the location.
> Once the table is moved to the managed table warehouse directory, its 
> location shouldn't be changed.
> 3. Alter database that changes the location.
> Once the database is moved to the managed table warehouse directory, its 
> location shouldn't be changed.





[jira] [Updated] (HIVE-20968) Support conversion of managed to external where location set was not owned by hive

2018-12-18 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20968:

Summary: Support conversion of managed to external where location set was 
not owned by hive  (was: Support conversion of managed to external where 
location set by user.)

> Support conversion of managed to external where location set was not owned by 
> hive
> --
>
> Key: HIVE-20968
> URL: https://issues.apache.org/jira/browse/HIVE-20968
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> As per the migration rule, if a table's location is outside the default 
> managed table directory and is not owned by the "hive" user, then the table 
> should be converted to an external table after upgrade.
> The same rule applies to Hive replication, where the data of the source 
> managed table resides outside the default warehouse directory and is not 
> owned by the "hive" user.
> During this conversion, the path should also be preserved on the target so 
> that failover works seamlessly.





[jira] [Commented] (HIVE-21023) Add test for replication to a target with hive.strict.managed.tables enabled

2018-12-14 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721095#comment-16721095
 ] 

Sankar Hariappan commented on HIVE-21023:
-

+1 for 02.patch, pending tests

> Add test for replication to a target with hive.strict.managed.tables enabled
> 
>
> Key: HIVE-21023
> URL: https://issues.apache.org/jira/browse/HIVE-21023
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Attachments: HIVE-21023.01.patch, HIVE-21023.02.patch
>
>
> The tests added are timing out in the ptest run. These test cases need to be 
> excluded from batching and run separately.





[jira] [Updated] (HIVE-21043) Enable move optimization for replicating to cloud based cluster with strict managed tables.

2018-12-14 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21043:

Description: 
If hive.repl.enable.move.optimization is set to true, then Hive REPL LOAD 
avoids move operation and copy the data files directly to target location. This 
is helpful for Cloud replication where the move is non-atomic and also 
implemented as copy.
Currently, if hive.strict.managed.tables is enabled at target and if migration 
to transactional table is needed during REPL LOAD, then this optimization is 
disabled.
Need to support it.


  was:
If hive.repl.enable.move.optimization is set to true, then Hive REPL LOAD 
avoids move operation and copy the data files directly to target location. This 
is helpful for Cloud replication where the move is non-atomic and also 
implemented as copy.
Currently, if hive.strict.managed.tables is enabled at target and if migration 
to transactional table is needed during REPL LOAD, then this optimisation is 
disabled.



> Enable move optimization for replicating to cloud based cluster with strict 
> managed tables.
> ---
>
> Key: HIVE-21043
> URL: https://issues.apache.org/jira/browse/HIVE-21043
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> If hive.repl.enable.move.optimization is set to true, then Hive REPL LOAD 
> avoids the move operation and copies the data files directly to the target 
> location. This is helpful for cloud replication, where a move is non-atomic 
> and is itself implemented as a copy.
> Currently, this optimization is disabled if hive.strict.managed.tables is 
> enabled at the target and migration to a transactional table is needed 
> during REPL LOAD.
> This case needs to be supported as well.





[jira] [Assigned] (HIVE-21043) Enable move optimization for replicating to cloud based cluster with strict managed tables.

2018-12-14 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-21043:
---


> Enable move optimization for replicating to cloud based cluster with strict 
> managed tables.
> ---
>
> Key: HIVE-21043
> URL: https://issues.apache.org/jira/browse/HIVE-21043
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> If hive.repl.enable.move.optimization is set to true, then Hive REPL LOAD 
> avoids the move operation and copies the data files directly to the target 
> location. This is helpful for cloud replication, where a move is non-atomic 
> and is itself implemented as a copy.
> Currently, this optimisation is disabled if hive.strict.managed.tables is 
> enabled at the target and migration to a transactional table is needed 
> during REPL LOAD.





[jira] [Updated] (HIVE-20967) Handle alter events when replicate to cluster with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20967:

Priority: Major  (was: Minor)

> Handle alter events when replicate to cluster with hive.strict.managed.tables 
> enabled.
> --
>
> Key: HIVE-20967
> URL: https://issues.apache.org/jira/browse/HIVE-20967
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> Some of the events from HDP 2.6.5 may cause conflicts in HDP 3.0 when 
> applied, so they need to be handled properly.
> 1. Alter table to convert a non-ACID table to ACID.
> This event should be a no-op, as the table on the target might already be an 
> ACID, MM, or external table.
> 2. Alter table or partition that changes the location.
> Once the table is moved to the managed table warehouse directory, its 
> location shouldn't be changed.
> 3. Alter database that changes the location.
> Once the database is moved to the managed table warehouse directory, its 
> location shouldn't be changed.





[jira] [Updated] (HIVE-20968) Support conversion of managed to external where location set by user.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20968:

Description: 
As per migration rule, if a location is outside the default managed table 
directory and the location is not owned by "hive" user, then it should be 
converted to external table after upgrade.
So, the same rule is applicable for Hive replication where the data of source 
managed table is residing outside the default warehouse directory and is not 
owned by "hive" user.
During this conversion, the path should be preserved in target as well so that 
failover works seamlessly.

  was:
Hive2 supports replication of managed tables. But in Hive3, some of these 
managed tables are converted to ACID or MM tables. Also, some of them are 
converted to external tables based on below rules. 
 # Avro format with external schema, Storage handlers, List bucketed tabled are 
converted to external tables.
 # Location not owned by "hive" user are converted to external table.
 # Hive owned ORC format are converted to full ACID transactional table.
 # Hive owned Non-ORC format are converted to MM transactional table.

REPL LOAD should apply these rules during bootstrap and convert the tables 
accordingly.


> Support conversion of managed to external where location set by user.
> -
>
> Key: HIVE-20968
> URL: https://issues.apache.org/jira/browse/HIVE-20968
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> As per the migration rules, if a table's location is outside the default 
> managed table directory and is not owned by the "hive" user, the table 
> should be converted to an external table after the upgrade.
> The same rule applies to Hive replication when the data of a source managed 
> table resides outside the default warehouse directory and is not owned by 
> the "hive" user.
> During this conversion, the path should be preserved on the target as well 
> so that failover works seamlessly.
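The managed-to-external rule above boils down to a two-part check. Below is a 
minimal sketch under assumed inputs; the default warehouse path and function 
name are illustrative, not Hive APIs.

```python
# Hypothetical sketch of the conversion rule: a managed table whose location
# lies outside the default managed warehouse AND is not owned by the "hive"
# user becomes an external table, with its path preserved on the target.

DEFAULT_MANAGED_WAREHOUSE = "/warehouse/tablespace/managed/hive"  # assumed path

def should_convert_to_external(location: str, location_owner: str) -> bool:
    outside_warehouse = not location.startswith(DEFAULT_MANAGED_WAREHOUSE + "/")
    return outside_warehouse and location_owner != "hive"
```

For instance, a table at /data/sales owned by "etl" would be converted to 
external with its path kept, while a table under the managed warehouse owned 
by "hive" would stay managed.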



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20968) Support conversion of managed to external where location set by user.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20968:

Summary: Support conversion of managed to external where location set by 
user.  (was: CLONE - Bootstrap of tables to target with 
hive.strict.managed.tables enabled.)

> Support conversion of managed to external where location set by user.
> -
>
> Key: HIVE-20968
> URL: https://issues.apache.org/jira/browse/HIVE-20968
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> Hive2 supports replication of managed tables. But in Hive3, some of these 
> managed tables are converted to ACID or MM tables. Also, some of them are 
> converted to external tables based on the rules below. 
>  # Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
>  # Tables whose location is not owned by the "hive" user are converted to 
> external tables.
>  # Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
>  # Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and convert the tables 
> accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20967) Handle alter events when replicate to cluster with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20967:

Description: 
Some of the events from Hive2 may cause conflicts in Hive3 
(hive.strict.managed.tables=true) when applied, so they need to be handled 
properly.
1. Alter table to convert a non-ACID table to ACID.
This event should be a no-op, as the table on the target might already be an 
ACID, MM, or external table.
2. Alter table or partition that changes the location.
Once the table is moved to the managed table warehouse directory, its location 
shouldn't be changed.
3. Alter database that changes the location.
Once the database is moved to the managed table warehouse directory, its 
location shouldn't be changed.

  was:
Some of the events from HDP 2.6.5 may cause conflicts in HDP 3.0 when applied, 
so they need to be handled properly.
1. Alter table to convert a non-ACID table to ACID.
This event should be a no-op, as the table on the target might already be an 
ACID, MM, or external table.
2. Alter table or partition that changes the location.
Once the table is moved to the managed table warehouse directory, its location 
shouldn't be changed.
3. Alter database that changes the location.
Once the database is moved to the managed table warehouse directory, its 
location shouldn't be changed.


> Handle alter events when replicate to cluster with hive.strict.managed.tables 
> enabled.
> --
>
> Key: HIVE-20967
> URL: https://issues.apache.org/jira/browse/HIVE-20967
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> Some of the events from Hive2 may cause conflicts in Hive3 
> (hive.strict.managed.tables=true) when applied, so they need to be handled 
> properly.
> 1. Alter table to convert a non-ACID table to ACID.
> This event should be a no-op, as the table on the target might already be an 
> ACID, MM, or external table.
> 2. Alter table or partition that changes the location.
> Once the table is moved to the managed table warehouse directory, its 
> location shouldn't be changed.
> 3. Alter database that changes the location.
> Once the database is moved to the managed table warehouse directory, its 
> location shouldn't be changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20967) Handle alter events when replicate to cluster with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20967:

Description: 
Some of the events from Hive2 may cause conflicts in Hive3 
(hive.strict.managed.tables=true) when applied, so they need to be handled 
properly.
1. Alter table to convert a non-ACID table to ACID.
This event should be a no-op, as the table on the target might already be an 
ACID, MM, or external table.
2. Alter table or partition that changes the location.
Once the table is moved to the managed table warehouse directory, its location 
shouldn't be changed.
3. Alter database that changes the location.
Once the database is moved to the managed table warehouse directory, its 
location shouldn't be changed.

  was:
Some of the events from Hive2 may cause conflicts in Hive3 ( when applied. So, 
need to handle them properly.
1. Alter table to convert non-acid to acid.
This event should be no-op as the table in target might be already acid or MM 
or external table.
2. Alter table or partition that changes the location.
Once the table is moved to managed table warehouse directory, the location 
shouldn't be changed.
3. Alter database that changes the location.
Once the database is moved to managed table warehouse directory, the location 
shouldn't be changed.


> Handle alter events when replicate to cluster with hive.strict.managed.tables 
> enabled.
> --
>
> Key: HIVE-20967
> URL: https://issues.apache.org/jira/browse/HIVE-20967
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> Some of the events from Hive2 may cause conflicts in Hive3 
> (hive.strict.managed.tables=true) when applied, so they need to be handled 
> properly.
> 1. Alter table to convert a non-ACID table to ACID.
> This event should be a no-op, as the table on the target might already be an 
> ACID, MM, or external table.
> 2. Alter table or partition that changes the location.
> Once the table is moved to the managed table warehouse directory, its 
> location shouldn't be changed.
> 3. Alter database that changes the location.
> Once the database is moved to the managed table warehouse directory, its 
> location shouldn't be changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20967) Handle alter events when replicate to cluster with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20967:

Summary: Handle alter events when replicate to cluster with 
hive.strict.managed.tables enabled.  (was: CLONE - REPL DUMP to dump the 
default warehouse directory of source.)

> Handle alter events when replicate to cluster with hive.strict.managed.tables 
> enabled.
> --
>
> Key: HIVE-20967
> URL: https://issues.apache.org/jira/browse/HIVE-20967
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: DR
>
> The default warehouse directory of the source is needed by the target to 
> detect whether a DB or table location was set by the user or assigned by 
> Hive. 
> Using this information, REPL LOAD decides whether to preserve the path or 
> move the data to the default managed table warehouse directory.
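The path decision described above can be sketched as follows. This is a 
hypothetical illustration under assumed inputs; the function name and path 
layout (`<db>.db/<table>` under the warehouse) are assumptions, not Hive's 
actual code.

```python
# Hypothetical sketch: REPL LOAD compares a replicated location against the
# source's default warehouse directory (shipped in the dump) to tell
# Hive-assigned paths from user-set ones.

def resolve_target_location(src_location: str, src_warehouse: str,
                            target_warehouse: str, db: str, table: str) -> str:
    if src_location.startswith(src_warehouse.rstrip("/") + "/"):
        # Hive-assigned path: relocate under the target's managed warehouse.
        return f"{target_warehouse.rstrip('/')}/{db}.db/{table}"
    # User-set path: preserve it on the target.
    return src_location
```

A path under the source warehouse is remapped to the target warehouse; any 
other path is preserved verbatim.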



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20967) Handle alter events when replicate to cluster with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20967:

Description: 
Some of the events from HDP 2.6.5 may cause conflicts in HDP 3.0 when applied, 
so they need to be handled properly.
1. Alter table to convert a non-ACID table to ACID.
This event should be a no-op, as the table on the target might already be an 
ACID, MM, or external table.
2. Alter table or partition that changes the location.
Once the table is moved to the managed table warehouse directory, its location 
shouldn't be changed.
3. Alter database that changes the location.
Once the database is moved to the managed table warehouse directory, its 
location shouldn't be changed.

  was:
The default warehouse directory of the source is needed by target to detect if 
DB or table location is set by user or assigned by Hive. 
Using this information, REPL LOAD will decide to preserve the path or move data 
to default managed table's warehouse directory.


> Handle alter events when replicate to cluster with hive.strict.managed.tables 
> enabled.
> --
>
> Key: HIVE-20967
> URL: https://issues.apache.org/jira/browse/HIVE-20967
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: DR
>
> Some of the events from HDP 2.6.5 may cause conflicts in HDP 3.0 when 
> applied, so they need to be handled properly.
> 1. Alter table to convert a non-ACID table to ACID.
> This event should be a no-op, as the table on the target might already be an 
> ACID, MM, or external table.
> 2. Alter table or partition that changes the location.
> Once the table is moved to the managed table warehouse directory, its 
> location shouldn't be changed.
> 3. Alter database that changes the location.
> Once the database is moved to the managed table warehouse directory, its 
> location shouldn't be changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Attachment: HIVE-20966.05.patch

> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
> Attachments: HIVE-20966.01.patch, HIVE-20966.02.patch, 
> HIVE-20966.03.patch, HIVE-20966.04.patch, HIVE-20966.04.patch, 
> HIVE-20966.05.patch
>
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3 with 
> hive.strict.managed.tables=true, some of these managed tables are converted 
> to ACID or MM tables. Also, some of them are converted to external tables 
> based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.
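The four conversion rules quoted above form a simple precedence chain, which 
can be sketched as below. The boolean flags and return labels are illustrative 
assumptions, not Hive's real metadata model.

```python
# Hypothetical sketch of how REPL LOAD might classify a replicated Hive2
# managed table on a hive.strict.managed.tables=true target.

def converted_table_type(avro_with_external_schema: bool,
                         uses_storage_handler: bool,
                         is_list_bucketed: bool,
                         location_owner: str,
                         file_format: str) -> str:
    if avro_with_external_schema or uses_storage_handler or is_list_bucketed:
        return "EXTERNAL"   # rule 1: format/feature forces external
    if location_owner != "hive":
        return "EXTERNAL"   # rule 2: location not owned by the "hive" user
    if file_format == "ORC":
        return "FULL_ACID"  # rule 3: Hive-owned ORC -> full transactional
    return "MM"             # rule 4: Hive-owned non-ORC -> MM transactional
```

So a Hive-owned ORC table becomes full ACID, a Hive-owned text table becomes 
MM, and any table hitting rules 1 or 2 becomes external regardless of format.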



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Assignee: mahesh kumar behera  (was: Sankar Hariappan)
  Status: Patch Available  (was: Open)

> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR
> Attachments: HIVE-20966.01.patch, HIVE-20966.02.patch, 
> HIVE-20966.03.patch, HIVE-20966.04.patch, HIVE-20966.04.patch, 
> HIVE-20966.05.patch
>
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3 with 
> hive.strict.managed.tables=true, some of these managed tables are converted 
> to ACID or MM tables. Also, some of them are converted to external tables 
> based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Status: Open  (was: Patch Available)

> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
> Attachments: HIVE-20966.01.patch, HIVE-20966.02.patch, 
> HIVE-20966.03.patch, HIVE-20966.04.patch, HIVE-20966.04.patch, 
> HIVE-20966.05.patch
>
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3 with 
> hive.strict.managed.tables=true, some of these managed tables are converted 
> to ACID or MM tables. Also, some of them are converted to external tables 
> based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Attachment: (was: HIVE-20966.05.patch)

> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
> Attachments: HIVE-20966.01.patch, HIVE-20966.02.patch, 
> HIVE-20966.03.patch, HIVE-20966.04.patch, HIVE-20966.04.patch, 
> HIVE-20966.05.patch
>
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3 with 
> hive.strict.managed.tables=true, some of these managed tables are converted 
> to ACID or MM tables. Also, some of them are converted to external tables 
> based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-09 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Attachment: HIVE-20966.05.patch

> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
> Attachments: HIVE-20966.01.patch, HIVE-20966.02.patch, 
> HIVE-20966.03.patch, HIVE-20966.04.patch, HIVE-20966.04.patch, 
> HIVE-20966.05.patch
>
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3 with 
> hive.strict.managed.tables=true, some of these managed tables are converted 
> to ACID or MM tables. Also, some of them are converted to external tables 
> based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-08 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-20966:
---

Assignee: Sankar Hariappan  (was: mahesh kumar behera)

> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
> Attachments: HIVE-20966.01.patch, HIVE-20966.02.patch, 
> HIVE-20966.03.patch, HIVE-20966.04.patch, HIVE-20966.04.patch, 
> HIVE-20966.05.patch, HIVE-20966.05.patch
>
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3 with 
> hive.strict.managed.tables=true, some of these managed tables are converted 
> to ACID or MM tables. Also, some of them are converted to external tables 
> based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-07 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712598#comment-16712598
 ] 

Sankar Hariappan commented on HIVE-20966:
-

[~vihangk1], [~pvary]
The newly added test suite TestReplicationScenariosMigration is equivalent to 
the existing TestReplicationScenarios. It takes a long time, around 30 to 40 
minutes, to finish. It is getting batched with other tests and hence is always 
timing out.
Can you please help us log in to the ptest server and isolate it from being 
batched with other tests? 

cc [~maheshk114]

> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR
> Attachments: HIVE-20966.01.patch, HIVE-20966.02.patch, 
> HIVE-20966.03.patch, HIVE-20966.04.patch
>
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3 with 
> hive.strict.managed.tables=true, some of these managed tables are converted 
> to ACID or MM tables. Also, some of them are converted to external tables 
> based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-06 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711778#comment-16711778
 ] 

Sankar Hariappan commented on HIVE-20966:
-

[~maheshk114]
+1 for 02.patch, pending tests

> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR
> Attachments: HIVE-20966.01.patch, HIVE-20966.02.patch
>
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3 with 
> hive.strict.managed.tables=true, some of these managed tables are converted 
> to ACID or MM tables. Also, some of them are converted to external tables 
> based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-03 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Summary: Support bootstrap and incremental replication to a target with 
hive.strict.managed.tables enabled.  (was: Support bootstrap and incremental 
replication to a target cluster with hive.strict.managed.tables enabled.)

> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3, some of these 
> managed tables are converted to ACID or MM tables. Also, some of them are 
> converted to external tables based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-20884) Bootstrap of tables to target with hive.strict.managed.tables enabled.

2018-12-03 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-20884.
-
Resolution: Duplicate

> Bootstrap of tables to target with hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20884
> URL: https://issues.apache.org/jira/browse/HIVE-20884
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> Hive2 supports replication of managed tables. But in Hive3, some of these 
> managed tables are converted to ACID or MM tables. Also, some of them are 
> converted to external tables based on the rules below. 
>  # Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
>  # Tables whose location is not owned by the "hive" user are converted to 
> external tables.
>  # Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
>  # Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and convert the tables 
> accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-03 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Description: 
*Requirements:*
Hive2 supports replication of managed tables. But in Hive3 with 
hive.strict.managed.tables=true, some of these managed tables are converted to 
ACID or MM tables. Also, some of them are converted to external tables based on 
the rules below. 
- Tables in Avro format with an external schema, tables using storage 
handlers, and list-bucketed tables are converted to external tables.
- Tables whose location is not owned by the "hive" user are converted to 
external tables.
- Hive-owned tables in ORC format are converted to full ACID transactional 
tables.
- Hive-owned tables in non-ORC formats are converted to MM transactional 
tables.

REPL LOAD should apply these rules during bootstrap and incremental phases and 
convert the tables accordingly.

  was:
*Requirements:*
Hive2 supports replication of managed tables. But in Hive3, some of these 
managed tables are converted to ACID or MM tables. Also, some of them are 
converted to external tables based on the rules below. 
- Tables in Avro format with an external schema, tables using storage 
handlers, and list-bucketed tables are converted to external tables.
- Tables whose location is not owned by the "hive" user are converted to 
external tables.
- Hive-owned tables in ORC format are converted to full ACID transactional 
tables.
- Hive-owned tables in non-ORC formats are converted to MM transactional 
tables.

REPL LOAD should apply these rules during bootstrap and incremental phases and 
convert the tables accordingly.


> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3 with 
> hive.strict.managed.tables=true, some of these managed tables are converted 
> to ACID or MM tables. Also, some of them are converted to external tables 
> based on the rules below. 
> - Tables in Avro format with an external schema, tables using storage 
> handlers, and list-bucketed tables are converted to external tables.
> - Tables whose location is not owned by the "hive" user are converted to 
> external tables.
> - Hive-owned tables in ORC format are converted to full ACID transactional 
> tables.
> - Hive-owned tables in non-ORC formats are converted to MM transactional 
> tables.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20966) Support bootstrap and incremental replication to a target with hive.strict.managed.tables enabled.

2018-12-03 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Description: 
*Requirements:*
Hive2 supports replication of managed tables. But in Hive3, some of these 
managed tables are converted to ACID or MM tables. Also, some of them are 
converted to external tables based on the rules below. 
- Tables in Avro format with an external schema, tables using storage 
handlers, and list-bucketed tables are converted to external tables.
- Tables whose location is not owned by the "hive" user are converted to 
external tables.
- Hive-owned tables in ORC format are converted to full ACID transactional 
tables.
- Hive-owned tables in non-ORC formats are converted to MM transactional 
tables.

REPL LOAD should apply these rules during bootstrap and incremental phases and 
convert the tables accordingly.

  was:
*Requirements:*
Hive2 supports replication of managed tables. But in Hive3, some of these 
managed tables are converted to ACID or MM tables. Also, some of them are 
converted to external tables based on the rules below. 
- Tables in Avro format with an external schema, tables using storage 
handlers, and list-bucketed tables are converted to external tables.
- Tables whose location is not owned by the "hive" user are converted to 
external tables.
- Hive-owned tables in ORC format are converted to full ACID transactional 
tables.
- Hive-owned tables in non-ORC formats are converted to MM transactional 
tables.
REPL LOAD should apply these rules during bootstrap and incremental phases and 
convert the tables accordingly.


> Support bootstrap and incremental replication to a target with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3, some of these 
> managed tables are converted to ACID or MM tables. Also, some of them are 
> converted to external tables based on below rules. 
> - Avro format with external schema, Storage handlers, List bucketed tables 
> are converted to external tables.
> - Location not owned by "hive" user are converted to external table.
> - Hive owned ORC format are converted to full ACID transactional table.
> - Hive owned Non-ORC format are converted to MM transactional table.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
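The four conversion rules in the description above amount to a small decision function applied per table during REPL LOAD. A minimal sketch of that logic (the enum and helper names here are illustrative only, not Hive's actual REPL LOAD code):

```java
// Target table type after replication into a hive.strict.managed.tables cluster.
enum TargetType { EXTERNAL, FULL_ACID, MM }

class TableConversionRules {
    // Decide the target type for a Hive2 managed table from its properties.
    static TargetType convert(boolean avroExternalSchema,
                              boolean storageHandler,
                              boolean listBucketed,
                              boolean locationOwnedByHive,
                              boolean orcFormat) {
        // Rule 1: Avro with external schema, storage handlers, and
        // list-bucketed tables become external tables.
        if (avroExternalSchema || storageHandler || listBucketed) {
            return TargetType.EXTERNAL;
        }
        // Rule 2: locations not owned by the "hive" user become external.
        if (!locationOwnedByHive) {
            return TargetType.EXTERNAL;
        }
        // Rules 3-4: hive-owned ORC becomes full ACID; other formats
        // become MM (insert-only) transactional tables.
        return orcFormat ? TargetType.FULL_ACID : TargetType.MM;
    }

    public static void main(String[] args) {
        System.out.println(convert(false, false, false, true, true));   // FULL_ACID
        System.out.println(convert(false, false, false, true, false));  // MM
        System.out.println(convert(false, false, false, false, true));  // EXTERNAL
    }
}
```

The same decision would be applied during both the bootstrap and incremental phases, so a table converts identically regardless of when it is replicated.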


[jira] [Updated] (HIVE-20966) Support bootstrap and incremental replication to a target cluster with hive.strict.managed.tables enabled.

2018-12-03 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Summary: Support bootstrap and incremental replication to a target cluster 
with hive.strict.managed.tables enabled.  (was: Support incremental replication 
to a target cluster with hive.strict.managed.tables enabled.)

> Support bootstrap and incremental replication to a target cluster with 
> hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3, some of these 
> managed tables are converted to ACID or MM tables. Also, some of them are 
> converted to external tables based on below rules. 
> - Avro format with external schema, Storage handlers, List bucketed tables 
> are converted to external tables.
> - Location not owned by "hive" user are converted to external table.
> - Hive owned ORC format are converted to full ACID transactional table.
> - Hive owned Non-ORC format are converted to MM transactional table.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20966) Support incremental replication to a target cluster with hive.strict.managed.tables enabled.

2018-12-03 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20966:

Description: 
*Requirements:*
Hive2 supports replication of managed tables. But in Hive3, some of these 
managed tables are converted to ACID or MM tables. Also, some of them are 
converted to external tables based on below rules. 
- Avro format with external schema, Storage handlers, List bucketed tables are 
converted to external tables.
- Location not owned by "hive" user are converted to external table.
- Hive owned ORC format are converted to full ACID transactional table.
- Hive owned Non-ORC format are converted to MM transactional table.
REPL LOAD should apply these rules during bootstrap and incremental phases and 
convert the tables accordingly.

  was:
*Requirements:*
 - Support Hive incremental replication with Hive2 as master and Hive3 as slave 
where hive.strict.managed.tables is enabled.
 - The non-ACID managed tables from Hive2 should be converted to appropriate 
ACID or MM tables or to an external table based on Hive3 table type rules.


> Support incremental replication to a target cluster with 
> hive.strict.managed.tables enabled.
> 
>
> Key: HIVE-20966
> URL: https://issues.apache.org/jira/browse/HIVE-20966
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR
>
> *Requirements:*
> Hive2 supports replication of managed tables. But in Hive3, some of these 
> managed tables are converted to ACID or MM tables. Also, some of them are 
> converted to external tables based on below rules. 
> - Avro format with external schema, Storage handlers, List bucketed tables 
> are converted to external tables.
> - Location not owned by "hive" user are converted to external table.
> - Hive owned ORC format are converted to full ACID transactional table.
> - Hive owned Non-ORC format are converted to MM transactional table.
> REPL LOAD should apply these rules during bootstrap and incremental phases 
> and convert the tables accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20897) TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error

2018-11-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-20897:
---

Assignee: mahesh kumar behera  (was: Sankar Hariappan)

> TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error
> 
>
> Key: HIVE-20897
> URL: https://issues.apache.org/jira/browse/HIVE-20897
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20897.01.patch, HIVE-20897.02.patch, 
> HIVE-20897.03.patch, HIVE-20897.04.patch, HIVE-20897.05.patch, 
> HIVE-20897.06.patch, HIVE-20897.07.patch, HIVE-20897.08.patch
>
>
> If async prepare is enabled, control is returned to the client before the 
> driver can record whether the query has a result set. But in the current 
> code, while generating the response for the query, it is not checked 
> whether the result-set field has actually been set. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
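The race described above can be sketched with a latch guarding the flag: the response path must wait until the async driver thread has populated the result-set field before reading it. The class and method names below are illustrative, not Hive's actual operation classes:

```java
import java.util.concurrent.CountDownLatch;

// Illustrative sketch of the HIVE-20897 race: with async prepare, the
// client may ask "does this query have a result set?" before the driver
// thread has decided, so the response must wait until the field is set.
class AsyncQueryOperation {
    private volatile Boolean hasResultSet;             // null until compilation decides
    private final CountDownLatch compiled = new CountDownLatch(1);

    // Runs on the async driver thread once compilation finishes.
    void markCompiled(boolean producesResults) {
        hasResultSet = producesResults;
        compiled.countDown();
    }

    // Buggy shape: may observe null if called before markCompiled().
    Boolean resultSetFlagUnchecked() {
        return hasResultSet;
    }

    // Fixed shape: block until the flag is actually set before responding.
    boolean resultSetFlagChecked() throws InterruptedException {
        compiled.await();
        return hasResultSet;
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncQueryOperation op = new AsyncQueryOperation();
        System.out.println(op.resultSetFlagUnchecked());   // null: not yet compiled
        new Thread(() -> op.markCompiled(true)).start();
        System.out.println(op.resultSetFlagChecked());     // waits, then true
    }
}
```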


[jira] [Updated] (HIVE-20897) TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error

2018-11-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20897:

Attachment: HIVE-20897.08.patch

> TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error
> 
>
> Key: HIVE-20897
> URL: https://issues.apache.org/jira/browse/HIVE-20897
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20897.01.patch, HIVE-20897.02.patch, 
> HIVE-20897.03.patch, HIVE-20897.04.patch, HIVE-20897.05.patch, 
> HIVE-20897.06.patch, HIVE-20897.07.patch, HIVE-20897.08.patch
>
>
> If async prepare is enabled, control is returned to the client before the 
> driver can record whether the query has a result set. But in the current 
> code, while generating the response for the query, it is not checked 
> whether the result-set field has actually been set. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20897) TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error

2018-11-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20897:

Status: Patch Available  (was: Open)

Re-attaching the same 08.patch, as the test failure in the previous ptest run was flaky.

> TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error
> 
>
> Key: HIVE-20897
> URL: https://issues.apache.org/jira/browse/HIVE-20897
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20897.01.patch, HIVE-20897.02.patch, 
> HIVE-20897.03.patch, HIVE-20897.04.patch, HIVE-20897.05.patch, 
> HIVE-20897.06.patch, HIVE-20897.07.patch, HIVE-20897.08.patch
>
>
> If async prepare is enabled, control is returned to the client before the 
> driver can record whether the query has a result set. But in the current 
> code, while generating the response for the query, it is not checked 
> whether the result-set field has actually been set. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20897) TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error

2018-11-29 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20897:

Assignee: Sankar Hariappan  (was: mahesh kumar behera)
  Status: Open  (was: Patch Available)

> TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error
> 
>
> Key: HIVE-20897
> URL: https://issues.apache.org/jira/browse/HIVE-20897
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20897.01.patch, HIVE-20897.02.patch, 
> HIVE-20897.03.patch, HIVE-20897.04.patch, HIVE-20897.05.patch, 
> HIVE-20897.06.patch, HIVE-20897.07.patch, HIVE-20897.08.patch
>
>
> If async prepare is enabled, control is returned to the client before the 
> driver can record whether the query has a result set. But in the current 
> code, while generating the response for the query, it is not checked 
> whether the result-set field has actually been set. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20897) TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error

2018-11-28 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701854#comment-16701854
 ] 

Sankar Hariappan commented on HIVE-20897:
-

+1 for 07.patch, pending tests

> TestJdbcDriver2#testSelectExecAsync2 fails with result set not present error
> 
>
> Key: HIVE-20897
> URL: https://issues.apache.org/jira/browse/HIVE-20897
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20897.01.patch, HIVE-20897.02.patch, 
> HIVE-20897.03.patch, HIVE-20897.04.patch, HIVE-20897.05.patch, 
> HIVE-20897.06.patch, HIVE-20897.07.patch
>
>
> If async prepare is enabled, control is returned to the client before the 
> driver can record whether the query has a result set. But in the current 
> code, while generating the response for the query, it is not checked 
> whether the result-set field has actually been set. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20884) Bootstrap of tables to target with hive.strict.managed.tables enabled.

2018-11-22 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20884:

Description: 
Hive2 supports replication of managed tables. But in Hive3, some of these 
managed tables are converted to ACID or MM tables. Also, some of them are 
converted to external tables based on below rules. 
 # Avro format with external schema, Storage handlers, List bucketed tables are 
converted to external tables.
 # Location not owned by "hive" user are converted to external table.
 # Hive owned ORC format are converted to full ACID transactional table.
 # Hive owned Non-ORC format are converted to MM transactional table.

REPL LOAD should apply these rules during bootstrap and convert the tables 
accordingly.

  was:
Hive2 supports replication of managed tables. But in Hive3, some of these 
managed tables are converted to ACID or MM tables. Also, some of them are 
converted to external tables based on below rules. 
 # Avro format, Storage handlers, List bucketed tables are converted to 
external tables.
 # Location not owned by "hive" user are converted to external table.
 # Hive owned ORC format are converted to full ACID transactional table.
 # Hive owned Non-ORC format are converted to MM transactional table.

REPL LOAD should apply these rules during bootstrap and convert the tables 
accordingly.


> Bootstrap of tables to target with hive.strict.managed.tables enabled.
> --
>
> Key: HIVE-20884
> URL: https://issues.apache.org/jira/browse/HIVE-20884
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR
>
> Hive2 supports replication of managed tables. But in Hive3, some of these 
> managed tables are converted to ACID or MM tables. Also, some of them are 
> converted to external tables based on below rules. 
>  # Avro format with external schema, Storage handlers, List bucketed tables 
> are converted to external tables.
>  # Location not owned by "hive" user are converted to external table.
>  # Hive owned ORC format are converted to full ACID transactional table.
>  # Hive owned Non-ORC format are converted to MM transactional table.
> REPL LOAD should apply these rules during bootstrap and convert the tables 
> accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19701) getDelegationTokenFromMetaStore doesn't need to be synchronized

2018-11-13 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685534#comment-16685534
 ] 

Sankar Hariappan commented on HIVE-19701:
-

Thanks [~thejas] for the review!
01.patch is committed to master!

> getDelegationTokenFromMetaStore doesn't need to be synchronized
> ---
>
> Key: HIVE-19701
> URL: https://issues.apache.org/jira/browse/HIVE-19701
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Thejas M Nair
>Assignee: Sankar Hariappan
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-19701.01.patch
>
>
> CLIService.getDelegationTokenFromMetaStore just invokes metastore api via 
> thread local Hive object. So, it doesn't have to be synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19701) getDelegationTokenFromMetaStore doesn't need to be synchronized

2018-11-13 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-19701:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

> getDelegationTokenFromMetaStore doesn't need to be synchronized
> ---
>
> Key: HIVE-19701
> URL: https://issues.apache.org/jira/browse/HIVE-19701
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Thejas M Nair
>Assignee: Sankar Hariappan
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-19701.01.patch
>
>
> CLIService.getDelegationTokenFromMetaStore just invokes metastore api via 
> thread local Hive object. So, it doesn't have to be synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19701) getDelegationTokenFromMetaStore doesn't need to be synchronized

2018-11-13 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685316#comment-16685316
 ] 

Sankar Hariappan commented on HIVE-19701:
-

[~thejas]
Can you please review the patch?

> getDelegationTokenFromMetaStore doesn't need to be synchronized
> ---
>
> Key: HIVE-19701
> URL: https://issues.apache.org/jira/browse/HIVE-19701
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Thejas M Nair
>Assignee: Sankar Hariappan
>Priority: Major
> Attachments: HIVE-19701.01.patch
>
>
> CLIService.getDelegationTokenFromMetaStore just invokes metastore api via 
> thread local Hive object. So, it doesn't have to be synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-13 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in *HiveSessionImpl* 
> class when we open a new session for a client connection and by default all 
> queries from this connection shares the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred via thread local hiveDb) if  
> {{Hive.isCompatible}} returns false and sets new Hive object in thread local 
> HiveDb but doesn't change the sessionHive object in the session. Whereas, 
> *asynchronous* query execution via async threads never closes the sessionHive 
> object and it just creates a new one if needed and sets it as their thread 
> local hiveDb.
> So, the problem can happen in the case where an *asynchronous* query is being 
> executed by async threads refers to sessionHive object and the master thread 
> receives a *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread local hiveDb object to 
> sessionHive object which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object could be shared by multiple threads and so it 
> shouldn't be allowed to be closed by any query execution threads when they 
> re-create the Hive object due to changes in Hive configurations. But the Hive 
> objects created by query execution threads should be closed when the thread 
> exits.
> So, it is proposed to have an *isAllowClose* flag (default: *true*) in Hive 
> object which should be set to *false* for *sessionHive* and would be 
> forcefully closed when the session is closed or released.
> Also, when we reset *sessionHive* object with new one due to changes in 
> *sessionConf*, the old one should be closed when no async thread is referring 
> to it. This can be done using "*finalize*" method of Hive object where we can 
> close HMS connection when Hive object is garbage collected.
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-13 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685041#comment-16685041
 ] 

Sankar Hariappan commented on HIVE-20682:
-

Thanks [~anishek] for the review!
The 06.patch is committed to master!

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in *HiveSessionImpl* 
> class when we open a new session for a client connection and by default all 
> queries from this connection shares the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred via thread local hiveDb) if  
> {{Hive.isCompatible}} returns false and sets new Hive object in thread local 
> HiveDb but doesn't change the sessionHive object in the session. Whereas, 
> *asynchronous* query execution via async threads never closes the sessionHive 
> object and it just creates a new one if needed and sets it as their thread 
> local hiveDb.
> So, the problem can happen in the case where an *asynchronous* query is being 
> executed by async threads refers to sessionHive object and the master thread 
> receives a *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread local hiveDb object to 
> sessionHive object which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object could be shared by multiple threads and so it 
> shouldn't be allowed to be closed by any query execution threads when they 
> re-create the Hive object due to changes in Hive configurations. But the Hive 
> objects created by query execution threads should be closed when the thread 
> exits.
> So, it is proposed to have an *isAllowClose* flag (default: *true*) in Hive 
> object which should be set to *false* for *sessionHive* and would be 
> forcefully closed when the session is closed or released.
> Also, when we reset *sessionHive* object with new one due to changes in 
> *sessionConf*, the old one should be closed when no async thread is referring 
> to it. This can be done using "*finalize*" method of Hive object where we can 
> close HMS connection when Hive object is garbage collected.
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19701) getDelegationTokenFromMetaStore doesn't need to be synchronized

2018-11-13 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-19701:

Status: Patch Available  (was: Open)

> getDelegationTokenFromMetaStore doesn't need to be synchronized
> ---
>
> Key: HIVE-19701
> URL: https://issues.apache.org/jira/browse/HIVE-19701
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Thejas M Nair
>Assignee: Sankar Hariappan
>Priority: Major
> Attachments: HIVE-19701.01.patch
>
>
> CLIService.getDelegationTokenFromMetaStore just invokes metastore api via 
> thread local Hive object. So, it doesn't have to be synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-19701) getDelegationTokenFromMetaStore doesn't need to be synchronized

2018-11-13 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-19701:
---

Assignee: Sankar Hariappan

> getDelegationTokenFromMetaStore doesn't need to be synchronized
> ---
>
> Key: HIVE-19701
> URL: https://issues.apache.org/jira/browse/HIVE-19701
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Sankar Hariappan
>Priority: Major
>
> or so it seems



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19701) getDelegationTokenFromMetaStore doesn't need to be synchronized

2018-11-13 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-19701:

Description: CLIService.getDelegationTokenFromMetaStore just invokes 
metastore api via thread local Hive object. So, it doesn't have to be 
synchronized.  (was: or so it seems)

> getDelegationTokenFromMetaStore doesn't need to be synchronized
> ---
>
> Key: HIVE-19701
> URL: https://issues.apache.org/jira/browse/HIVE-19701
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Sankar Hariappan
>Priority: Major
>
> CLIService.getDelegationTokenFromMetaStore just invokes metastore api via 
> thread local Hive object. So, it doesn't have to be synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19701) getDelegationTokenFromMetaStore doesn't need to be synchronized

2018-11-13 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-19701:

Component/s: HiveServer2

> getDelegationTokenFromMetaStore doesn't need to be synchronized
> ---
>
> Key: HIVE-19701
> URL: https://issues.apache.org/jira/browse/HIVE-19701
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Thejas M Nair
>Assignee: Sankar Hariappan
>Priority: Major
> Attachments: HIVE-19701.01.patch
>
>
> CLIService.getDelegationTokenFromMetaStore just invokes metastore api via 
> thread local Hive object. So, it doesn't have to be synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19701) getDelegationTokenFromMetaStore doesn't need to be synchronized

2018-11-13 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-19701:

Attachment: HIVE-19701.01.patch

> getDelegationTokenFromMetaStore doesn't need to be synchronized
> ---
>
> Key: HIVE-19701
> URL: https://issues.apache.org/jira/browse/HIVE-19701
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Thejas M Nair
>Assignee: Sankar Hariappan
>Priority: Major
> Attachments: HIVE-19701.01.patch
>
>
> CLIService.getDelegationTokenFromMetaStore just invokes metastore api via 
> thread local Hive object. So, it doesn't have to be synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19701) getDelegationTokenFromMetaStore doesn't need to be synchronized

2018-11-13 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-19701:

Affects Version/s: 4.0.0

> getDelegationTokenFromMetaStore doesn't need to be synchronized
> ---
>
> Key: HIVE-19701
> URL: https://issues.apache.org/jira/browse/HIVE-19701
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Thejas M Nair
>Assignee: Sankar Hariappan
>Priority: Major
> Attachments: HIVE-19701.01.patch
>
>
> CLIService.getDelegationTokenFromMetaStore just invokes metastore api via 
> thread local Hive object. So, it doesn't have to be synchronized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-12 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683869#comment-16683869
 ] 

Sankar Hariappan commented on HIVE-20682:
-

Thanks for the review [~maheshk114]!

[~daijy], [~pvary], [~anishek], [~thejas]
Can you please review and +1 the patch?

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in *HiveSessionImpl* 
> class when we open a new session for a client connection and by default all 
> queries from this connection shares the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred via thread local hiveDb) if  
> {{Hive.isCompatible}} returns false and sets new Hive object in thread local 
> HiveDb but doesn't change the sessionHive object in the session. Whereas, 
> *asynchronous* query execution via async threads never closes the sessionHive 
> object and it just creates a new one if needed and sets it as their thread 
> local hiveDb.
> So, the problem can happen in the case where an *asynchronous* query is being 
> executed by async threads refers to sessionHive object and the master thread 
> receives a *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread local hiveDb object to 
> sessionHive object which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object could be shared by multiple threads and so it 
> shouldn't be allowed to be closed by any query execution threads when they 
> re-create the Hive object due to changes in Hive configurations. But the Hive 
> objects created by query execution threads should be closed when the thread 
> exits.
> So, it is proposed to have an *isAllowClose* flag (default: *true*) in Hive 
> object which should be set to *false* for *sessionHive* and would be 
> forcefully closed when the session is closed or released.
> Also, when we reset *sessionHive* object with new one due to changes in 
> *sessionConf*, the old one should be closed when no async thread is referring 
> to it. This can be done using "*finalize*" method of Hive object where we can 
> close HMS connection when Hive object is garbage collected.
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-11 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

Attachment: HIVE-20682.06.patch

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in the *HiveSessionImpl* 
> class when we open a new session for a client connection, and by default all 
> queries from this connection share the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred to via the thread-local hiveDb) if 
> {{Hive.isCompatible}} returns false and sets a new Hive object in the 
> thread-local hiveDb, but it doesn't change the sessionHive object in the 
> session. Whereas *asynchronous* query execution via async threads never closes 
> the sessionHive object; it just creates a new one if needed and sets it as the 
> thread-local hiveDb.
> So, the problem can happen when an *asynchronous* query being executed by 
> async threads refers to the sessionHive object while the master thread 
> receives a *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread-local hiveDb object with the 
> sessionHive object, which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object can be shared by multiple threads, so it 
> shouldn't be closed by any query execution thread when that thread 
> re-creates the Hive object due to changes in Hive configurations. But the Hive 
> objects created by query execution threads should be closed when those threads 
> exit.
> So, it is proposed to add an *isAllowClose* flag (default: *true*) to the Hive 
> object, which would be set to *false* for *sessionHive*; that object would be 
> forcefully closed only when the session is closed or released.
> Also, when we replace the *sessionHive* object with a new one due to changes in 
> *sessionConf*, the old one should be closed once no async thread is referring 
> to it. This can be done in the "*finalize*" method of the Hive object, where we 
> can close the HMS connection when the Hive object is garbage collected.
> cc [~pvary]
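The proposed fix above can be sketched roughly as follows. This is a minimal illustration of the isAllowClose guard, not the actual Hive codebase API: the class shape, the `closed` field, and the method signatures are hypothetical, chosen only to show how a shared sessionHive would ignore ordinary close attempts while still being force-closeable on session release.

```java
// Hypothetical sketch of the proposed isAllowClose guard; names and
// structure are illustrative, not the real org.apache.hadoop.hive.ql
// metadata Hive class.
class Hive {
    private boolean isAllowClose = true; // default: per-thread Hive objects may close
    boolean closed = false;              // stands in for the HMS connection state

    void setAllowClose(boolean allow) {
        this.isAllowClose = allow;       // sessionHive would call setAllowClose(false)
    }

    // Query execution threads call close(false) when re-creating their
    // thread-local Hive object; for the shared sessionHive this is a
    // no-op. The session itself calls close(true) when it is closed or
    // released.
    void close(boolean forceClose) {
        if ((isAllowClose || forceClose) && !closed) {
            closed = true;               // here the real code would close the HMS connection
        }
    }
}
```

Under this sketch, an async thread holding a reference to sessionHive can no longer have its metastore connection pulled out from under it by a synchronous query on the master thread.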



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-11 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

Status: Patch Available  (was: Open)

Re-attaching the same patch, as the failed test was flaky and passes locally.

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-11 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

Attachment: (was: HIVE-20682.06.patch)

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-11 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

Status: Open  (was: Patch Available)

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-07 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

Status: Open  (was: Patch Available)

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch
>
>
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-07 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

Status: Patch Available  (was: Open)

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-07 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

Attachment: (was: HIVE-20682.06.patch)

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch
>
>
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

