[jira] [Commented] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2019-07-30 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896492#comment-16896492
 ] 

Karthik Manamcheri commented on HIVE-19994:
---

Adding the foreign-key line to package.jdo should not affect functionality. It 
could, however, affect performance, because DataNucleus cannot optimize as effectively.

> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CDH 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Assignee: Karthik Manamcheri
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-19994.1.patch, metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as a part of ETL process executed every 
> hour. Most of the time it works fine, the issue is not reproducible at will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov
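
For context on the constraint in the stack trace: rows in COLUMNS_V2 reference 
their parent row in CDS via CD_ID, so the child rows must be deleted before the 
parent. A minimal JDBC sketch of the required order (the connection URL and 
CD_ID value are illustrative, not the metastore's actual code):
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class FkDeleteOrder {
  public static void main(String[] args) throws Exception {
    // Illustrative JDBC URL; the metastore's connection is configured elsewhere.
    try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/hivemetastore")) {
      conn.setAutoCommit(false);
      long cdId = 757869L; // example CD_ID, taken from a later comment in this thread
      // Delete the child rows first: COLUMNS_V2.CD_ID references CDS.CD_ID.
      try (PreparedStatement stmt = conn.prepareStatement("DELETE FROM COLUMNS_V2 WHERE CD_ID = ?")) {
        stmt.setLong(1, cdId);
        stmt.executeUpdate();
      }
      // Only then delete the parent row; issuing these in the reverse order
      // violates COLUMNS_V2_FK1 and raises a BatchUpdateException.
      try (PreparedStatement stmt = conn.prepareStatement("DELETE FROM CDS WHERE CD_ID = ?")) {
        stmt.setLong(1, cdId);
        stmt.executeUpdate();
      }
      conn.commit();
    }
  }
}
{code}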



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2019-07-29 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895535#comment-16895535
 ] 

Karthik Manamcheri commented on HIVE-19994:
---

[~Symious] You are correct that the name of the foreign-key does matter. Can 
you try adding
{code:xml}
<!-- reconstructed from context; the original element was stripped from the archived message -->
<foreign-key name="COLUMNS_V2_FK1"/>
{code}
to your package.jdo?

> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CDH 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Assignee: Karthik Manamcheri
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-19994.1.patch, metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as a part of ETL process executed every 
> hour. Most of the time it works fine, the issue is not reproducible at will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2019-05-29 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850999#comment-16850999
 ] 

Karthik Manamcheri commented on HIVE-19994:
---

Code Review by [~ngangam] and [~holleyism]. 

Naveen, can you merge this? Thanks.

> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CDH 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-19994.1.patch, metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as a part of ETL process executed every 
> hour. Most of the time it works fine, the issue is not reproducible at will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2019-05-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-19994:
--
Attachment: HIVE-19994.1.patch

> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CDH 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-19994.1.patch, metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as a part of ETL process executed every 
> hour. Most of the time it works fine, the issue is not reproducible at will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2019-05-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-19994:
--
Status: Patch Available  (was: In Progress)

> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CDH 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-19994.1.patch, metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as a part of ETL process executed every 
> hour. Most of the time it works fine, the issue is not reproducible at will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2019-05-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-19994 started by Karthik Manamcheri.
-
> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CDH 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as a part of ETL process executed every 
> hour. Most of the time it works fine, the issue is not reproducible at will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2019-05-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reassigned HIVE-19994:
-

Assignee: Karthik Manamcheri

> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CDH 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as a part of ETL process executed every 
> hour. Most of the time it works fine, the issue is not reproducible at will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19994) Impala "drop table" fails with Hive Metastore exception

2019-05-28 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850147#comment-16850147
 ] 

Karthik Manamcheri commented on HIVE-19994:
---

I think I understand what is happening, but I don't know why it is happening! 
Basically, in my logs I can see that the table delete proceeds normally 
until it tries to remove the MColumnDescriptor. Removing an MColumnDescriptor 
takes two SQL statements:
 * DELETE FROM `COLUMNS_V2` WHERE `CD_ID`=757869
 * DELETE FROM `CDS` WHERE `CD_ID`=757869

COLUMNS_V2 and CDS are related by a foreign key, and the error happens here.

My team's current thought is that the problem resides in the DataNucleus layer 
and its handling of persistence during the delete. One of the fixes we tried, 
and it seems to help, is to add the foreign-key element to package.jdo; this 
forces the DataNucleus layer to deal with the foreign-key constraint itself. If 
you don't have this in package.jdo, DataNucleus delegates the foreign-key 
checks to the underlying database.

> Impala "drop table" fails with Hive Metastore exception
> ---
>
> Key: HIVE-19994
> URL: https://issues.apache.org/jira/browse/HIVE-19994
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
> Environment: Hadoop distribution: CDH 5.14.2
> Hive version:  1.1.0-cdh5.14.2
> Impala version: 2.11.0
> Kudu version: 1.6.0
>  
>Reporter: Rodion Myronov
>Priority: Major
> Attachments: metastore_exception.txt
>
>
> "drop table" statement in Impala shell fails with the following exception:
> {{ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore: 
> CAUSED BY: MetaException: One or more instances could not be deleted}}
>  
> Metastore log file shows that "DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?" 
> statement fails because of foreign key violation (full stacktrace will be 
> added):
> {{Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails 
> ("hivemetastore_emtig3vtq7qp1tiooo07sb70ud"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID"))}}
>  
> The table is created and then dropped as a part of ETL process executed every 
> hour. Most of the time it works fine, the issue is not reproducible at will.
> Table creation script is:
> {{CREATE TABLE IF NOT EXISTS price_advisor_ouput.t_switching_coef_source}}
> {{( }}
> {{...fields here...}}
> {{PRIMARY KEY (...PK field here...)}}
> {{)}}
> {{PARTITION BY HASH(matrix_pcd) PARTITIONS 3}}
> {{STORED AS KUDU;}}
>  
> Not sure how to approach diagnostics and fix, so any input will be really 
> appreciated. 
> Thanks in advance, 
> Rodion Myronov



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21227) HIVE-20776 causes view access regression

2019-02-07 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762887#comment-16762887
 ] 

Karthik Manamcheri commented on HIVE-21227:
---

Can you post a code review on Review Board? Commenting there is easier.

> HIVE-20776 causes view access regression
> 
>
> Key: HIVE-21227
> URL: https://issues.apache.org/jira/browse/HIVE-21227
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Na Li
>Assignee: Na Li
>Priority: Major
> Attachments: HIVE-21227.001.patch
>
>
> HIVE-20776 introduces a change that causes a regression for view access.
> Before the change, a user with SELECT access on a view derived from a 
> partitioned table could get all columns of that view.
> With the change, that user cannot access the view.
> The reason is:
> * When a user accesses columns of a view, Hive needs to get the partitions of 
> the table that the view is derived from. The user name used is that of the 
> user who issues the query against the view.
> * The change in HIVE-20776 checks whether the user has access to the table 
> before getting its partitions. When the user only has access to the view, not 
> to the table itself, this change denies the user access to the view. 
> The solution is, when getting table partitions, not to filter on the table at 
> the HMS client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21227) HIVE-20776 causes view access regression

2019-02-07 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762885#comment-16762885
 ] 

Karthik Manamcheri commented on HIVE-21227:
---

[~LinaAtAustin] can you add tests to ensure that this bug doesn't happen 
again?

> HIVE-20776 causes view access regression
> 
>
> Key: HIVE-21227
> URL: https://issues.apache.org/jira/browse/HIVE-21227
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Na Li
>Assignee: Na Li
>Priority: Major
> Attachments: HIVE-21227.001.patch
>
>
> HIVE-20776 introduces a change that causes a regression for view access.
> Before the change, a user with SELECT access on a view derived from a 
> partitioned table could get all columns of that view.
> With the change, that user cannot access the view.
> The reason is:
> * When a user accesses columns of a view, Hive needs to get the partitions of 
> the table that the view is derived from. The user name used is that of the 
> user who issues the query against the view.
> * The change in HIVE-20776 checks whether the user has access to the table 
> before getting its partitions. When the user only has access to the view, not 
> to the table itself, this change denies the user access to the view. 
> The solution is, when getting table partitions, not to filter on the table at 
> the HMS client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2019-02-05 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761064#comment-16761064
 ] 

Karthik Manamcheri commented on HIVE-20977:
---

Can we merge this [~pvary]?

> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-20977.1.patch, HIVE-20977.2.patch, 
> HIVE-20977.3.patch, HIVE-20977.4.patch
>
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. However, even 
> that is unnecessary if the event listener is not interested in the 
> read-table event.
> For example, the TransactionalValidationListener's onEvent looks like this:
> {code:java}
> @Override
> public void onEvent(PreEventContext context) throws MetaException, 
> NoSuchObjectException,
> InvalidOperationException {
>   switch (context.getEventType()) {
> case CREATE_TABLE:
>   handle((PreCreateTableEvent) context);
>   break;
> case ALTER_TABLE:
>   handle((PreAlterTableEvent) context);
>   break;
> default:
>   //no validation required..
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table call is evaluated when creating the PreReadTableEvent, only for the 
> result to be ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated irrespective 
> of whether the listener uses it.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String 
> tblName)
> throws MetaException, NoSuchObjectException {
>   if(preListeners.size() > 0) {
> // do this only if there is a pre event listener registered (avoid 
> unnecessary
> // metastore api call)
> Table t = getMS().getTable(catName, dbName, tblName);
> if (t == null) {
>   throw new NoSuchObjectException(TableName.getQualified(catName, dbName, 
> tblName)
>   + " table not found");
> }
> firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> when needed (once, the first time it is called, and memoized after that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This affects performance since it fetches the table even 
> if it is not being used. This change will improve performance on the 
> get_partition calls.
>  
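
A minimal sketch of the memoizing-{{Supplier}} idea described above (an 
illustration, not the actual patch; the real event and method signatures may 
differ):
{code:java}
import java.util.function.Supplier;

// Wraps an expensive lookup so it runs at most once, and only if get() is called.
final class MemoizingSupplier<T> implements Supplier<T> {
  private final Supplier<T> delegate;
  private T value;
  private boolean evaluated;

  MemoizingSupplier(Supplier<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public synchronized T get() {
    if (!evaluated) {
      value = delegate.get();
      evaluated = true;
    }
    return value;
  }
}
{code}
With such a wrapper, {{fireReadTablePreEvent}} could hand the event a 
{{Supplier<Table>}} instead of an eagerly fetched {{Table}}, so listeners that 
ignore read-table events never pay for the {{getMS().getTable(..)}} call.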



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-02-04 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.branch-3.patch, 
> HIVE-21044.2.patch, HIVE-21044.3.patch, HIVE-21044.4.patch, 
> HIVE-21044.branch-3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. Currently 
> we support JMX, JSON, and Console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.
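
For reference, a minimal sketch of wiring up the Dropwizard {{Slf4jReporter}} 
(the logger name and reporting period are illustrative, not the metastore's 
actual configuration):
{code:java}
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Slf4jReporter;
import java.util.concurrent.TimeUnit;
import org.slf4j.LoggerFactory;

public class Slf4jReporterSketch {
  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();
    // Report every metric in the registry to an SLF4J logger once a minute.
    Slf4jReporter reporter = Slf4jReporter.forRegistry(registry)
        .outputTo(LoggerFactory.getLogger("hive.metastore.metrics"))
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build();
    reporter.start(60, TimeUnit.SECONDS);
  }
}
{code}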



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-30 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756695#comment-16756695
 ] 

Karthik Manamcheri commented on HIVE-21044:
---

[~pvary] Looks like the branch-3 tests are currently not able to run because of 
HIVE-21180. This change touches code in standalone-metastore, and I was able to 
successfully run all the unit tests under standalone-metastore on my 
development machine. Can we merge to branch-3? Thanks.

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.branch-3.patch, 
> HIVE-21044.2.patch, HIVE-21044.3.patch, HIVE-21044.4.patch, 
> HIVE-21044.branch-3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. Currently 
> we support JMX, JSON, and Console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-30 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756684#comment-16756684
 ] 

Karthik Manamcheri commented on HIVE-21045:
---

[~ngangam] [~ychena] Looks like the branch-3 tests are currently not able to run 
because of HIVE-21180. This change touches code in standalone-metastore, and I 
was able to successfully run all the unit tests under standalone-metastore on 
my development machine. Can we merge to branch-3? Thanks.

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.branch-3.patch, 
> HIVE-21045.2.patch, HIVE-21045.3.patch, HIVE-21045.4.patch, 
> HIVE-21045.5.patch, HIVE-21045.6.patch, HIVE-21045.7.patch, 
> HIVE-21045.branch-3.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.
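
A minimal sketch of both ideas using the Dropwizard metrics API (the metric 
name, JDBC URL, and pool wiring are illustrative assumptions, not the actual 
patch):
{code:java}
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class HmsMetricsSketch {
  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();

    // 1. API call duration: wrap each handler in a per-method timer, from which
    //    Dropwizard derives mean and percentile durations over a sliding window.
    Timer apiTimer = registry.timer("api.get_partitions");
    try (Timer.Context ignored = apiTimer.time()) {
      // ... handle the get_partitions call ...
    }

    // 2. Connection pool stats: HikariCP can publish its own pool metrics
    //    (active connections, connection wait time, etc.) into the same registry.
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:mysql://localhost/hivemetastore"); // illustrative URL
    config.setMetricRegistry(registry);
    try (HikariDataSource ds = new HikariDataSource(config)) {
      // ... hand the data source to the persistence layer ...
    }
  }
}
{code}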



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2019-01-30 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-20977:
--
Fix Version/s: 4.0.0

> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-20977.1.patch, HIVE-20977.2.patch, 
> HIVE-20977.3.patch, HIVE-20977.4.patch
>
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. However, even 
> that is unnecessary if the event listener is not interested in the 
> read-table event.
> For example, the TransactionalValidationListener's onEvent looks like this:
> {code:java}
> @Override
> public void onEvent(PreEventContext context) throws MetaException, 
> NoSuchObjectException,
> InvalidOperationException {
>   switch (context.getEventType()) {
> case CREATE_TABLE:
>   handle((PreCreateTableEvent) context);
>   break;
> case ALTER_TABLE:
>   handle((PreAlterTableEvent) context);
>   break;
> default:
>   //no validation required..
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table call is evaluated when creating the PreReadTableEvent, only for the 
> result to be ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated irrespective 
> of whether the listener uses it.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String 
> tblName)
> throws MetaException, NoSuchObjectException {
>   if(preListeners.size() > 0) {
> // do this only if there is a pre event listener registered (avoid 
> unnecessary
> // metastore api call)
> Table t = getMS().getTable(catName, dbName, tblName);
> if (t == null) {
>   throw new NoSuchObjectException(TableName.getQualified(catName, dbName, 
> tblName)
>   + " table not found");
> }
> firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> when needed (once, the first time it is called, and memoized after that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This affects performance since it fetches the table even 
> if it is not being used. This change will improve performance on the 
> get_partition calls.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-30 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Attachment: HIVE-21045.2.branch-3.patch

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.branch-3.patch, 
> HIVE-21045.2.patch, HIVE-21045.3.patch, HIVE-21045.4.patch, 
> HIVE-21045.5.patch, HIVE-21045.6.patch, HIVE-21045.7.patch, 
> HIVE-21045.branch-3.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-30 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Attachment: HIVE-21044.2.branch-3.patch

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.branch-3.patch, 
> HIVE-21044.2.patch, HIVE-21044.3.patch, HIVE-21044.4.patch, 
> HIVE-21044.branch-3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. Currently 
> we support JMX, JSON, and Console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-30 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Fix Version/s: 4.0.0

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch, 
> HIVE-21045.3.patch, HIVE-21045.4.patch, HIVE-21045.5.patch, 
> HIVE-21045.6.patch, HIVE-21045.7.patch, HIVE-21045.branch-3.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-30 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Fix Version/s: 3.2.0

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch, 
> HIVE-21044.3.patch, HIVE-21044.4.patch, HIVE-21044.branch-3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. Currently 
> we support JMX, JSON, and Console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-30 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Status: Patch Available  (was: Reopened)

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Fix For: 4.0.0
>
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch, 
> HIVE-21044.3.patch, HIVE-21044.4.patch, HIVE-21044.branch-3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. Currently 
> we support JMX, JSON, and Console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-30 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reopened HIVE-21044:
---

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Fix For: 4.0.0
>
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch, 
> HIVE-21044.3.patch, HIVE-21044.4.patch, HIVE-21044.branch-3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. Currently 
> we support JMX, JSON, and Console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-26 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Attachment: HIVE-21045.branch-3.patch

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch, 
> HIVE-21045.3.patch, HIVE-21045.4.patch, HIVE-21045.5.patch, 
> HIVE-21045.6.patch, HIVE-21045.7.patch, HIVE-21045.branch-3.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-26 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753203#comment-16753203
 ] 

Karthik Manamcheri commented on HIVE-21045:
---

Thanks, [~ychena]. I have attached the branch-3 patch as well. How do I kick off 
the tests for that? 

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch, 
> HIVE-21045.3.patch, HIVE-21045.4.patch, HIVE-21045.5.patch, 
> HIVE-21045.6.patch, HIVE-21045.7.patch, HIVE-21045.branch-3.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-26 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753178#comment-16753178
 ] 

Karthik Manamcheri commented on HIVE-21044:
---

Hi [~pvary]. I have backported this to branch-3 as well. Please commit it to 
branch-3. Thanks!

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Fix For: 4.0.0
>
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch, 
> HIVE-21044.3.patch, HIVE-21044.4.patch, HIVE-21044.branch-3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. Currently 
> we support JMX, JSON, and Console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-26 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Attachment: HIVE-21044.branch-3.patch

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Fix For: 4.0.0
>
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch, 
> HIVE-21044.3.patch, HIVE-21044.4.patch, HIVE-21044.branch-3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. Currently 
> we support JMX, JSON, and Console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-25 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752675#comment-16752675
 ] 

Karthik Manamcheri commented on HIVE-21045:
---

Thanks for the suggestion, [~ngangam]. I filed HIVE-21169 as a follow-up.

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch, 
> HIVE-21045.3.patch, HIVE-21045.4.patch, HIVE-21045.5.patch, 
> HIVE-21045.6.patch, HIVE-21045.7.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-25 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Attachment: HIVE-21045.7.patch

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch, 
> HIVE-21045.3.patch, HIVE-21045.4.patch, HIVE-21045.5.patch, 
> HIVE-21045.6.patch, HIVE-21045.7.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-24 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Attachment: HIVE-21045.6.patch

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch, 
> HIVE-21045.3.patch, HIVE-21045.4.patch, HIVE-21045.5.patch, HIVE-21045.6.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-23 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Attachment: HIVE-21045.5.patch

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch, 
> HIVE-21045.3.patch, HIVE-21045.4.patch, HIVE-21045.5.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-23 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Attachment: HIVE-21045.4.patch

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch, 
> HIVE-21045.3.patch, HIVE-21045.4.patch
>
>
> There are two key metrics that I think we lack and that would really help 
> with scaling visibility in HMS.
> *Total API call duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over a recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *Connection pool stats*
> We can use different connection pooling libraries, such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and we are saturating it.
> These metrics would help catch problems with HMS resource contention before 
> they actually cause job failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-8133) Support Postgres via DirectSQL

2019-01-11 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740839#comment-16740839
 ] 

Karthik Manamcheri commented on HIVE-8133:
--

[~damien.carol] [~brocknoland] Is this JIRA even relevant anymore? There is no 
description. Postgres DirectSQL is supported in Hive 2.x releases, right? If 
so, can we close this ticket as Done?

> Support Postgres via DirectSQL
> --
>
> Key: HIVE-8133
> URL: https://issues.apache.org/jira/browse/HIVE-8133
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-11 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Attachment: HIVE-21045.3.patch

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch, 
> HIVE-21045.3.patch
>
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Total API calls duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over some recent window of time. Such a metric would give us insight into 
> whether load on the server is increasing the average API response time.
>  
> *Connection Pool stats*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-11 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Attachment: HIVE-21045.2.patch

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch, HIVE-21045.2.patch
>
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Total API calls duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over some recent window of time. Such a metric would give us insight into 
> whether load on the server is increasing the average API response time.
>  
> *Connection Pool stats*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-04 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Description: 
There are two key metrics which I think we lack and which would greatly improve 
visibility into HMS scaling.

*Total API calls duration stats*
We already compute and log the duration of API calls in the {{PerfLogger}}. We 
don't have any gauge or timer showing the average duration of an API call over 
some recent window of time. Such a metric would give us insight into whether 
load on the server is increasing the average API response time.
 
*Connection Pool stats*
We can use different connection pooling libraries such as BoneCP or HikariCP. 
These pool managers expose statistics such as the average time spent waiting 
for a connection, the number of active connections, etc. We should expose these 
as metrics so that we can track whether the configured connection pool size is 
too small and the pool is saturating.

These metrics would help catch HMS resource contention problems before jobs 
actually start failing.

  was:
There are two key metrics which I think we lack and which would greatly improve 
visibility into HMS scaling.

*Average API duration for the past 'n' minutes*
We already compute and log the duration of API calls in the {{PerfLogger}}. We 
don't have any gauge showing the average duration of an API call over some 
recent window of time. Such a metric would give us insight into whether load on 
the server is increasing the average API response time.
 
*RDBMS Connection wait time*
We can use different connection pooling libraries such as BoneCP or HikariCP. 
These pool managers expose statistics such as the average time spent waiting 
for a connection, the number of active connections, etc. We should expose these 
as metrics so that we can track whether the configured connection pool size is 
too small and the pool is saturating.

These metrics would help catch HMS resource contention problems before jobs 
actually start failing.


> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch
>
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Total API calls duration stats*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge or timer showing the average duration of an API call 
> over some recent window of time. Such a metric would give us insight into 
> whether load on the server is increasing the average API response time.
>  
> *Connection Pool stats*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-04 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Status: Patch Available  (was: In Progress)

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch
>
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Average API duration for the past 'n' minutes*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge showing the average duration of an API call over 
> some recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *RDBMS Connection wait time*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-04 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Attachment: HIVE-21045.1.patch

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-21045.1.patch
>
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Average API duration for the past 'n' minutes*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge showing the average duration of an API call over 
> some recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *RDBMS Connection wait time*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2019-01-04 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734700#comment-16734700
 ] 

Karthik Manamcheri commented on HIVE-20977:
---

[~alangates] [~thejas] I want to hear your thoughts on a particular issue 
regarding this change. In this ticket, I am changing the behavior of the bulk 
partition-listing APIs. Previously, if you issued a getPartitionsByNames (or one 
of the similar APIs) with an empty database, we threw an exception. After this 
change, we will return an empty list of partitions instead. This behavior is 
similar to what happens if you issue a getTablesByNames call (an empty list of 
tables is returned).

Here are my thoughts on why this is an acceptable change. In the master branch 
(without my change), if you remove the TransactionalValidationListener from the 
list of pre-listeners, the GetPartitions API tests fail! The behavior of the 
bulk get-partitions APIs actually depends on the absence or presence of a 
pre-event listener. 
If that is the case, then clients already expect different return behavior 
(throwing an exception or returning an empty list). So the API contract itself 
is that it can throw an exception (OR return an empty list). However, we 
committed HIVE-12064 which made it so that there will *always* be a pre-event 
listener (the TransactionalValidationListener is added in code 
unconditionally). The unit tests were written around the fact that there is 
always a pre-event listener. I don't think that is a good assumption to make. 
Listeners are supposed to be plugins and hence we shouldn't design API 
contracts around there being a listener. 

My change will revert to the original behavior (before HIVE-12064), which 
is that the list-partitions APIs can throw an exception (or return an empty 
list) if an invalid table/db name is specified. We should also fix the tests so 
that they don't depend on the existence (or non-existence) of listeners or 
plugins.

Thoughts? 

> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-20977.1.patch, HIVE-20977.2.patch, 
> HIVE-20977.3.patch, HIVE-20977.4.patch
>
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. Even then, 
> the event is unnecessary if the listener is not interested in the read-table 
> event.
> For example, the TransactionalValidationListener's onEvent looks like this:
> {code:java}
> @Override
> public void onEvent(PreEventContext context) throws MetaException, 
> NoSuchObjectException,
> InvalidOperationException {
>   switch (context.getEventType()) {
> case CREATE_TABLE:
>   handle((PreCreateTableEvent) context);
>   break;
> case ALTER_TABLE:
>   handle((PreAlterTableEvent) context);
>   break;
> default:
>   //no validation required..
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table call is evaluated when creating the PreReadTableEvent, only to be 
> ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated regardless of 
> whether the listener uses it.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String 
> tblName)
> throws MetaException, NoSuchObjectException {
>   if(preListeners.size() > 0) {
> // do this only if there is a pre event listener registered (avoid 
> unnecessary
> // metastore api call)
> Table t = getMS().getTable(catName, dbName, tblName);
> if (t == null) {
>   throw new NoSuchObjectException(TableName.getQualified(catName, dbName, 
> tblName)
>   + " table not found");
> }
> firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> only when needed (computed the first time it is called, memoized after that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This hurts performance since it fetches the table even 
> when it is not used. This change will improve the performance of 
> get_partition calls.
>  
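To make the lazy-evaluation idea concrete, here is a minimal hand-rolled memoizing wrapper (a sketch only; the actual patch may well use an existing helper such as Guava's {{Suppliers.memoize}} instead):

{code:java}
import java.util.function.Supplier;

// Defers an expensive lookup until first use and runs it at most once.
final class MemoizingSupplier<T> implements Supplier<T> {
  private final Supplier<T> delegate;
  private T value; // computed on the first get(), cached afterwards

  MemoizingSupplier(Supplier<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public synchronized T get() {
    if (value == null) {
      value = delegate.get(); // the getMS().getTable(..) call happens only here
    }
    return value;
  }
}
{code}

With fireReadTablePreEvent handing such a supplier to PreReadTableEvent, a listener that ignores read-table events never triggers the underlying getTable call.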



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21045) Add HMS total api count stats and connection pool stats to metrics

2019-01-03 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21045:
--
Summary: Add HMS total api count stats and connection pool stats to metrics 
 (was: Add connection pool info and rolling performance info to the metrics 
system)

> Add HMS total api count stats and connection pool stats to metrics
> --
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Average API duration for the past 'n' minutes*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge showing the average duration of an API call over 
> some recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *RDBMS Connection wait time*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-21045) Add connection pool info and rolling performance info to the metrics system

2019-01-03 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21045 started by Karthik Manamcheri.
-
> Add connection pool info and rolling performance info to the metrics system
> ---
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Average API duration for the past 'n' minutes*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge showing the average duration of an API call over 
> some recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *RDBMS Connection wait time*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-20986) Add TransactionalValidationListener to HMS preListeners only when ACID support is enabled

2019-01-03 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri resolved HIVE-20986.
---
Resolution: Won't Fix

> Add TransactionalValidationListener to HMS preListeners only when ACID 
> support is enabled
> -
>
> Key: HIVE-20986
> URL: https://issues.apache.org/jira/browse/HIVE-20986
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Adam Holley
>Priority: Major
>
> We add the TransactionalValidationListener to the preListeners in HMS 
> unconditionally.
> {code:java}
> public void init() throws MetaException {
>   ..
>   preListeners.add(0, new TransactionalValidationListener(conf));
>   ..
> }{code}
> This causes some performance issues because the listener is called even when 
> not needed. Let's add a condition around this and add the listener only if 
> transactional support is enabled.
>  
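A sketch of the guard this ticket proposed; the config key below is hypothetical (as the comment further down notes, there is no single existing ACID flag, which is part of why this approach was dropped):

{code:java}
public void init() throws MetaException {
  ..
  // "metastore.acid.validation.enabled" is a made-up key for illustration.
  if (conf.getBoolean("metastore.acid.validation.enabled", true)) {
    preListeners.add(0, new TransactionalValidationListener(conf));
  }
  ..
}
{code}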



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20986) Add TransactionalValidationListener to HMS preListeners only when ACID support is enabled

2019-01-03 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733514#comment-16733514
 ] 

Karthik Manamcheri commented on HIVE-20986:
---

This can alternatively be addressed by HIVE-20977 instead. There is no single 
flag indicating whether ACID support is enabled in Hive. 

> Add TransactionalValidationListener to HMS preListeners only when ACID 
> support is enabled
> -
>
> Key: HIVE-20986
> URL: https://issues.apache.org/jira/browse/HIVE-20986
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Adam Holley
>Priority: Major
>
> We add the TransactionalValidationListener to the preListeners in HMS 
> unconditionally.
> {code:java}
> public void init() throws MetaException {
>   ..
>   preListeners.add(0, new TransactionalValidationListener(conf));
>   ..
> }{code}
> This causes some performance issues because the listener is called even when 
> not needed. Let's add a condition around this and add the listener only if 
> transactional support is enabled.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2019-01-02 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-20977:
--
Attachment: HIVE-20977.4.patch

> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-20977.1.patch, HIVE-20977.2.patch, 
> HIVE-20977.3.patch, HIVE-20977.4.patch
>
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. Even then, 
> the event is unnecessary if the listener is not interested in the read-table 
> event.
> For example, the TransactionalValidationListener's onEvent looks like this:
> {code:java}
> @Override
> public void onEvent(PreEventContext context) throws MetaException, 
> NoSuchObjectException,
> InvalidOperationException {
>   switch (context.getEventType()) {
> case CREATE_TABLE:
>   handle((PreCreateTableEvent) context);
>   break;
> case ALTER_TABLE:
>   handle((PreAlterTableEvent) context);
>   break;
> default:
>   //no validation required..
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table call is evaluated when creating the PreReadTableEvent, only to be 
> ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated regardless of 
> whether the listener uses it.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String 
> tblName)
> throws MetaException, NoSuchObjectException {
>   if(preListeners.size() > 0) {
> // do this only if there is a pre event listener registered (avoid 
> unnecessary
> // metastore api call)
> Table t = getMS().getTable(catName, dbName, tblName);
> if (t == null) {
>   throw new NoSuchObjectException(TableName.getQualified(catName, dbName, 
> tblName)
>   + " table not found");
> }
> firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> only when needed (computed the first time it is called, memoized after that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This hurts performance since it fetches the table even 
> when it is not used. This change will improve the performance of 
> get_partition calls.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-02 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732524#comment-16732524
 ] 

Karthik Manamcheri commented on HIVE-21044:
---

Finally, all tests passed! [~pvary] can you merge this change? Thank you.

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch, 
> HIVE-21044.3.patch, HIVE-21044.4.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.
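For reference, wiring up the Dropwizard {{Slf4jReporter}} looks roughly like this (a sketch; the logger name and one-minute interval are illustrative choices, not necessarily what the patch configures):

{code:java}
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Slf4jReporter;
import java.util.concurrent.TimeUnit;
import org.slf4j.LoggerFactory;

public class Slf4jReporterSketch {
  public static Slf4jReporter startReporter(MetricRegistry registry) {
    Slf4jReporter reporter = Slf4jReporter.forRegistry(registry)
        .outputTo(LoggerFactory.getLogger("org.apache.hadoop.hive.metastore.metrics"))
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build();
    reporter.start(1, TimeUnit.MINUTES); // emit a metrics snapshot every minute
    return reporter;
  }
}
{code}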



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-02 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Attachment: HIVE-21044.4.patch

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch, 
> HIVE-21044.3.patch, HIVE-21044.4.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2019-01-02 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-20977:
--
Attachment: HIVE-20977.3.patch

> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-20977.1.patch, HIVE-20977.2.patch, 
> HIVE-20977.3.patch
>
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. Even then, 
> the event is unnecessary if the listener is not interested in the read-table 
> event.
> For example, the TransactionalValidationListener's onEvent looks like this:
> {code:java}
> @Override
> public void onEvent(PreEventContext context) throws MetaException, 
> NoSuchObjectException,
> InvalidOperationException {
>   switch (context.getEventType()) {
> case CREATE_TABLE:
>   handle((PreCreateTableEvent) context);
>   break;
> case ALTER_TABLE:
>   handle((PreAlterTableEvent) context);
>   break;
> default:
>   //no validation required..
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table call is evaluated when creating the PreReadTableEvent, only to be 
> ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated regardless of 
> whether the listener uses it.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String 
> tblName)
> throws MetaException, NoSuchObjectException {
>   if(preListeners.size() > 0) {
> // do this only if there is a pre event listener registered (avoid 
> unnecessary
> // metastore api call)
> Table t = getMS().getTable(catName, dbName, tblName);
> if (t == null) {
>   throw new NoSuchObjectException(TableName.getQualified(catName, dbName, 
> tblName)
>   + " table not found");
> }
> firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> only when needed (computed the first time it is called, memoized after that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This hurts performance since it fetches the table even 
> when it is not used. This change will improve the performance of 
> get_partition calls.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2019-01-02 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-20977:
--
Attachment: HIVE-20977.2.patch

> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-20977.1.patch, HIVE-20977.2.patch
>
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. Even then, 
> the event is unnecessary if the listener is not interested in the read-table 
> event.
> For example, the TransactionalValidationListener's onEvent looks like this:
> {code:java}
> @Override
> public void onEvent(PreEventContext context) throws MetaException, 
> NoSuchObjectException,
> InvalidOperationException {
>   switch (context.getEventType()) {
> case CREATE_TABLE:
>   handle((PreCreateTableEvent) context);
>   break;
> case ALTER_TABLE:
>   handle((PreAlterTableEvent) context);
>   break;
> default:
>   //no validation required..
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table call is evaluated when creating the PreReadTableEvent, only to be 
> ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated regardless of 
> whether the listener uses it.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String 
> tblName)
> throws MetaException, NoSuchObjectException {
>   if(preListeners.size() > 0) {
> // do this only if there is a pre event listener registered (avoid 
> unnecessary
> // metastore api call)
> Table t = getMS().getTable(catName, dbName, tblName);
> if (t == null) {
>   throw new NoSuchObjectException(TableName.getQualified(catName, dbName, 
> tblName)
>   + " table not found");
> }
> firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> only when needed (computed the first time it is called, memoized after that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This hurts performance since it fetches the table even 
> when it is not used. This change will improve the performance of 
> get_partition calls.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-02 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732220#comment-16732220
 ] 

Karthik Manamcheri commented on HIVE-21044:
---

[~pvary] Nothing but comments changed in the second patch! I am re-uploading the 
same patch as a third one to re-trigger the tests.

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch, 
> HIVE-21044.3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2019-01-02 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Attachment: HIVE-21044.3.patch

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch, 
> HIVE-21044.3.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2018-12-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-20977:
--
Status: Patch Available  (was: In Progress)

> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-20977.1.patch
>
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. Even then, 
> the event is unnecessary if the listener is not interested in the read-table 
> event.
> For example, the TransactionalValidationListener's onEvent looks like this:
> {code:java}
> @Override
> public void onEvent(PreEventContext context) throws MetaException, 
> NoSuchObjectException,
> InvalidOperationException {
>   switch (context.getEventType()) {
> case CREATE_TABLE:
>   handle((PreCreateTableEvent) context);
>   break;
> case ALTER_TABLE:
>   handle((PreAlterTableEvent) context);
>   break;
> default:
>   //no validation required..
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table call is evaluated when creating the PreReadTableEvent, only to be 
> ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated regardless of 
> whether the listener uses it.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String 
> tblName)
> throws MetaException, NoSuchObjectException {
>   if(preListeners.size() > 0) {
> // do this only if there is a pre event listener registered (avoid 
> unnecessary
> // metastore api call)
> Table t = getMS().getTable(catName, dbName, tblName);
> if (t == null) {
>   throw new NoSuchObjectException(TableName.getQualified(catName, dbName, 
> tblName)
>   + " table not found");
> }
> firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> only when needed (computed the first time it is called, memoized after that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This hurts performance since it fetches the table even 
> when it is not used. This change will improve the performance of 
> get_partition calls.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2018-12-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-20977:
--
Attachment: HIVE-20977.1.patch

> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
> Attachments: HIVE-20977.1.patch
>
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. Even then, 
> the event is unnecessary if the listener is not interested in the read-table 
> event.
> For example, the TransactionalValidationListener's onEvent looks like this:
> {code:java}
> @Override
> public void onEvent(PreEventContext context) throws MetaException, 
> NoSuchObjectException,
> InvalidOperationException {
>   switch (context.getEventType()) {
> case CREATE_TABLE:
>   handle((PreCreateTableEvent) context);
>   break;
> case ALTER_TABLE:
>   handle((PreAlterTableEvent) context);
>   break;
> default:
>   //no validation required..
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table call is evaluated when creating the PreReadTableEvent, only to be 
> ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated regardless of 
> whether the listener uses it.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String 
> tblName)
> throws MetaException, NoSuchObjectException {
>   if(preListeners.size() > 0) {
> // do this only if there is a pre event listener registered (avoid 
> unnecessary
> // metastore api call)
> Table t = getMS().getTable(catName, dbName, tblName);
> if (t == null) {
>   throw new NoSuchObjectException(TableName.getQualified(catName, dbName, 
> tblName)
>   + " table not found");
> }
> firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> only when needed (computed the first time it is called, memoized after that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This hurts performance since it fetches the table even 
> when it is not used. This change will improve the performance of 
> get_partition calls.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work stopped] (HIVE-21045) Add connection pool info and rolling performance info to the metrics system

2018-12-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21045 stopped by Karthik Manamcheri.
-
> Add connection pool info and rolling performance info to the metrics system
> ---
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Average API duration for the past 'n' minutes*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge showing the average duration of an API call over 
> some recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *RDBMS Connection wait time*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2018-12-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-20977 started by Karthik Manamcheri.
-
> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. Even then, 
> the event is unnecessary if the listener is not interested in the read-table 
> event.
> For example, the TransactionalValidationListener's onEvent looks like this:
> {code:java}
> @Override
> public void onEvent(PreEventContext context) throws MetaException, 
> NoSuchObjectException,
> InvalidOperationException {
>   switch (context.getEventType()) {
> case CREATE_TABLE:
>   handle((PreCreateTableEvent) context);
>   break;
> case ALTER_TABLE:
>   handle((PreAlterTableEvent) context);
>   break;
> default:
>   //no validation required..
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table call is evaluated when creating the PreReadTableEvent, only to be 
> ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated regardless of 
> whether the listener uses it.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String 
> tblName)
> throws MetaException, NoSuchObjectException {
>   if(preListeners.size() > 0) {
> // do this only if there is a pre event listener registered (avoid 
> unnecessary
> // metastore api call)
> Table t = getMS().getTable(catName, dbName, tblName);
> if (t == null) {
>   throw new NoSuchObjectException(TableName.getQualified(catName, dbName, 
> tblName)
>   + " table not found");
> }
> firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> only when needed (computed the first time it is called, memoized after that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This hurts performance since it fetches the table even 
> when it is not used. This change will improve the performance of 
> get_partition calls.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2018-12-28 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730383#comment-16730383
 ] 

Karthik Manamcheri commented on HIVE-21075:
---

Could we add an index on the CD_ID column? Any reason why we don't have an index 
on it? [~pvary] [~ychena]?

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Priority: Major
>
> To work around a performance issue caused by Oracle not supporting the limit 
> statement, HIVE-9447 makes every backend DB run select count(1) 
> from SDS where SDS.CD_ID=? to check whether the specific CD_ID is referenced 
> in the SDS table before dropping a partition. This select count(1) statement 
> does not scale well in Postgres, and there is no index on the CD_ID column in 
> the SDS table.
> For an SDS table with 1.5 million rows, select count(1) averages 700ms 
> without an index, versus 10-20ms with one. The statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-21045) Add connection pool info and rolling performance info to the metrics system

2018-12-20 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21045 started by Karthik Manamcheri.
-
> Add connection pool info and rolling performance info to the metrics system
> ---
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Average API duration for the past 'n' minutes*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge showing the average duration of an API call over 
> some recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *RDBMS Connection wait time*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2018-12-20 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Attachment: HIVE-21044.2.patch

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Attachments: HIVE-21044.1.patch, HIVE-21044.2.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2018-12-19 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Component/s: (was: HiveServer2)

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2018-12-19 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Attachment: HIVE-21044.1.patch

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Attachments: HIVE-21044.1.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2018-12-19 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Status: Patch Available  (was: In Progress)

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
> Attachments: HIVE-21044.1.patch
>
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metastore metrics system

2018-12-19 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Summary: Add SLF4J reporter to the metastore metrics system  (was: Add 
SLF4J reporter to the metrics system)

> Add SLF4J reporter to the metastore metrics system
> --
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-21044) Add SLF4J reporter to the metrics system

2018-12-19 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21044 started by Karthik Manamcheri.
-
> Add SLF4J reporter to the metrics system
> 
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20776) Run HMS filterHooks on server-side in addition to client-side

2018-12-17 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reassigned HIVE-20776:
-

Assignee: (was: Karthik Manamcheri)

> Run HMS filterHooks on server-side in addition to client-side
> -
>
> Key: HIVE-20776
> URL: https://issues.apache.org/jira/browse/HIVE-20776
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Priority: Major
>
> In HMS, I noticed that all the filter hooks are applied on the client side 
> (in HiveMetaStoreClient.java). Is there any reason why we can't apply the 
> filters on the server-side?
> Motivation: Some newer Apache projects such as Kudu use HMS for metadata 
> storage. Kudu is not completely Java-based and there are interaction points 
> where they have C++ clients. In such cases, it would be ideal to have 
> consistent behavior from the HMS side as far as filters, etc. are concerned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21045) Add connection pool info and rolling performance info to the metrics system

2018-12-14 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reassigned HIVE-21045:
-


> Add connection pool info and rolling performance info to the metrics system
> ---
>
> Key: HIVE-21045
> URL: https://issues.apache.org/jira/browse/HIVE-21045
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>
> There are two key metrics which I think we lack and which would greatly 
> improve visibility into HMS scaling.
> *Average API duration for the past 'n' minutes*
> We already compute and log the duration of API calls in the {{PerfLogger}}. 
> We don't have any gauge showing the average duration of an API call over 
> some recent window of time. Such a metric would give us insight into whether 
> load on the server is increasing the average API response time.
>  
> *RDBMS Connection wait time*
> We can use different connection pooling libraries such as BoneCP or HikariCP. 
> These pool managers expose statistics such as the average time spent waiting 
> for a connection, the number of active connections, etc. We should expose 
> these as metrics so that we can track whether the configured connection pool 
> size is too small and the pool is saturating.
> These metrics would help catch HMS resource contention problems before jobs 
> actually start failing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21044) Add SLF4J reporter to the metrics system

2018-12-14 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21044:
--
Affects Version/s: (was: 3.1.0)

> Add SLF4J reporter to the metrics system
> 
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.
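
For reference, wiring up the Dropwizard {{Slf4jReporter}} is only a few lines; 
a sketch assuming a Dropwizard {{MetricRegistry}} (the logger name and 
reporting interval are illustrative):
{code:java}
import java.util.concurrent.TimeUnit;
import org.slf4j.LoggerFactory;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Slf4jReporter;

public class MetricsToSlf4j {
  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();
    registry.counter("open.connections").inc();

    Slf4jReporter reporter = Slf4jReporter.forRegistry(registry)
        .outputTo(LoggerFactory.getLogger("org.apache.hadoop.hive.metastore.metrics"))
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build();
    reporter.start(1, TimeUnit.MINUTES); // log all registered metrics once a minute
  }
}
{code}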



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21044) Add SLF4J reporter to the metrics system

2018-12-14 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reassigned HIVE-21044:
-


> Add SLF4J reporter to the metrics system
> 
>
> Key: HIVE-21044
> URL: https://issues.apache.org/jira/browse/HIVE-21044
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, Standalone Metastore
>Affects Versions: 3.1.0
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>  Labels: metrics
>
> Let's add an SLF4J reporter as an option in the metrics reporting system. 
> Currently we support JMX, JSON, and console reporting.
> We will add a new option to {{hive.service.metrics.reporter}} called SLF4J. 
> We can use the 
> {{[Slf4jReporter|https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Slf4jReporter.html]}}
>  class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-14 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721639#comment-16721639
 ] 

Karthik Manamcheri commented on HIVE-21028:
---

[~ngangam] I have attached the branch-3 patch. Thank you for the merge.

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21028.1.patch, HIVE-21028.2.patch, 
> HIVE-21028.3.patch, HIVE-21028.4.patch, HIVE-21028.5.patch, 
> HIVE-21028.branch-3.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}
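
Putting the three steps together, a sketch only (method and parameter names 
are illustrative, generics and error handling simplified): scoping the group 
with try/finally ensures a query failure never leaks the fetch group into 
later calls on the same PersistenceManager.
{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;
import javax.jdo.Query;
import org.apache.hadoop.hive.metastore.model.MTable;

public class FetchPlanSketch {
  static List<String> getTableDbNames(PersistenceManagerFactory pmf,
      PersistenceManager pm, String filter, Object[] params) {
    pmf.getFetchGroup(MTable.class, "mtable_db_fetch_group").addMember("database");
    pm.getFetchPlan().addGroup("mtable_db_fetch_group");
    try {
      Query query = pm.newQuery(MTable.class, filter);
      @SuppressWarnings("unchecked")
      Collection<MTable> tables = (Collection<MTable>) query.executeWithArray(params);
      List<String> names = new ArrayList<>();
      for (MTable table : tables) {
        // the database object was fetched eagerly with each table, so no
        // lazy second lookup happens here, avoiding the race described above
        names.add(table.getDatabase().getName());
      }
      return names;
    } finally {
      pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
    }
  }
}
{code}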



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-14 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Affects Version/s: 3.0.0

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21028.1.patch, HIVE-21028.2.patch, 
> HIVE-21028.3.patch, HIVE-21028.4.patch, HIVE-21028.5.patch, 
> HIVE-21028.branch-3.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-14 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Attachment: HIVE-21028.branch-3.patch

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21028.1.patch, HIVE-21028.2.patch, 
> HIVE-21028.3.patch, HIVE-21028.4.patch, HIVE-21028.5.patch, 
> HIVE-21028.branch-3.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-14 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Component/s: Standalone Metastore

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.0.0
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21028.1.patch, HIVE-21028.2.patch, 
> HIVE-21028.3.patch, HIVE-21028.4.patch, HIVE-21028.5.patch, 
> HIVE-21028.branch-3.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-14 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Fix Version/s: 3.2.0
   4.0.0

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-21028.1.patch, HIVE-21028.2.patch, 
> HIVE-21028.3.patch, HIVE-21028.4.patch, HIVE-21028.5.patch, 
> HIVE-21028.branch-3.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-13 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Attachment: HIVE-21028.5.patch

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-21028.1.patch, HIVE-21028.2.patch, 
> HIVE-21028.3.patch, HIVE-21028.4.patch, HIVE-21028.5.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-13 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Attachment: HIVE-21028.4.patch

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-21028.1.patch, HIVE-21028.2.patch, 
> HIVE-21028.3.patch, HIVE-21028.4.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-13 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Attachment: HIVE-21028.3.patch

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-21028.1.patch, HIVE-21028.2.patch, 
> HIVE-21028.3.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-13 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Attachment: HIVE-21028.2.patch

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-21028.1.patch, HIVE-21028.2.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-12 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Attachment: HIVE-21028.1.patch

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-21028.1.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-12 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Status: Patch Available  (was: In Progress)

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-21028.1.patch
>
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan

2018-12-10 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Issue Type: Bug  (was: Improvement)

> get_table_meta should use a fetch plan
> --
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Priority: Major
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-10 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21028 started by Karthik Manamcheri.
-
> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException

2018-12-10 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Summary: get_table_meta should use a fetch plan to avoid race conditions 
ending up in NucleusObjectNotFoundException  (was: get_table_meta should use a 
fetch plan to avoid race conditions ending up in JDOObjectNotFoundException)

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> NucleusObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-21028) get_table_meta should use a fetch plan

2018-12-10 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715880#comment-16715880
 ] 

Karthik Manamcheri edited comment on HIVE-21028 at 12/11/18 12:54 AM:
--

Thoughts, [~vihangk1] [~ngangam] [~pvary]? I also don't know how to test this 
in a unit test! I have manually tested this by slowing down the get_table_meta 
call and then dropping the database from another HMS.


was (Author: karthik.manamcheri):
Thoughts, [~vihangk1] [~ngangam] [~pvary]? I also don't know how to test this 
in a unit test! I can manually test this by slowing down the get_table_meta 
call and then dropping the database from another HMS.

> get_table_meta should use a fetch plan
> --
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21028) get_table_meta should use a fetch plan

2018-12-10 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715880#comment-16715880
 ] 

Karthik Manamcheri commented on HIVE-21028:
---

Thoughts, [~vihangk1] [~ngangam] [~pvary]? I also don't know how to test this 
in a unit test! I can manually test this by slowing down the get_table_meta 
call and then dropping the database from another HMS.

> get_table_meta should use a fetch plan
> --
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21028) get_table_meta should use a fetch plan to avoid race conditions ending up in JDOObjectNotFoundException

2018-12-10 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-21028:
--
Summary: get_table_meta should use a fetch plan to avoid race conditions 
ending up in JDOObjectNotFoundException  (was: get_table_meta should use a 
fetch plan)

> get_table_meta should use a fetch plan to avoid race conditions ending up in 
> JDOObjectNotFoundException
> ---
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21028) get_table_meta should use a fetch plan

2018-12-10 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reassigned HIVE-21028:
-

Assignee: Karthik Manamcheri

> get_table_meta should use a fetch plan
> --
>
> Key: HIVE-21028
> URL: https://issues.apache.org/jira/browse/HIVE-21028
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> The {{getTableMeta}} call retrieves the tables, loops through the tables and 
> during this loop it retrieves the database object to get the containing 
> database name. DataNucleus does a lazy retrieval and so, when the first call 
> to get all the tables is done, it does not retrieve the database objects.
> When this query is executed
> {code}query = pm.newQuery(MTable.class, filterBuilder.toString());
> {code}
> it loads all the tables, and when you do
> {code}
> table.getDatabase().getName()
> {code}
> it then goes and retrieves the database object.
> *However*, there could be another thread which actually has deleted the 
> database!! If this happens, we end up with exceptions such as
> {code}
> 2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: 
> [pool-7-thread-191]: Object with id 
> "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
> 2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: 
> Exception thrown by StateManager.isLoaded
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
> {code}
> We see this happen especially with calls which retrieve all the tables in all 
> the databases (basically a call to get_table_meta with dbNames="\*" and 
> tableNames="\*").
> To avoid this, we can define a custom fetch plan and activate it only for the 
> get_table_meta query. This fetch plan would fetch the database object along 
> with the MTable object.
> We would first create a fetch plan on the pmf
> {code}
> pmf.getFetchGroup(MTable.class, 
> "mtable_db_fetch_group").addMember("database");
> {code}
> Then we use it just before calling the query
> {code}
> pm.getFetchPlan().addGroup("mtable_db_fetch_group");
> query = pm.newQuery(MTable.class, filterBuilder.toString());
> Collection tables = (Collection) query.executeWithArray(...);
> ...
> {code}
> Before the API call ends, we can remove the fetch plan by
> {code}
> pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20992) Split the config "hive.metastore.dbaccess.ssl.properties" into more meaningful configs

2018-11-30 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705404#comment-16705404
 ] 

Karthik Manamcheri commented on HIVE-20992:
---

Would this break current use cases where users have other custom SSL 
properties set? Should we keep the existing ssl.properties as a deprecated 
option?

Thoughts, [~vihangk1] [~pvary]?

> Split the config "hive.metastore.dbaccess.ssl.properties" into more 
> meaningful configs
> --
>
> Key: HIVE-20992
> URL: https://issues.apache.org/jira/browse/HIVE-20992
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Security, Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Morio Ramdenbourg
>Assignee: Morio Ramdenbourg
>Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> HIVE-13044 brought in the ability to enable TLS encryption from the HMS 
> Service to the HMSDB by configuring the following two properties:
>  # _javax.jdo.option.ConnectionURL_: JDBC connect string for a JDBC 
> metastore. To use SSL to encrypt/authenticate the connection, provide a 
> database-specific SSL flag in the connection URL. (E.g. 
> "jdbc:postgresql://myhost/db?ssl=true")
> _hive.metastore.dbaccess.ssl.properties_: Comma-separated SSL properties 
> for the metastore to use when accessing the database via the JDO connection 
> URL. (E.g. 
> javax.net.ssl.trustStore=/tmp/truststore,javax.net.ssl.trustStorePassword=pwd)
> However, the latter configuration option is opaque and poses some problems. 
> The most glaring of these is that it accepts _any_ 
> [java.lang.System|https://docs.oracle.com/javase/7/docs/api/java/lang/System.html]
>  system property, whether it is 
> [TLS-related|https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#InstallationAndCustomization]
>  or not. This can cause some unintended side-effects for other components of 
> the HMS, especially if it overrides an already-set system property. If the 
> user truly wishes to add an unrelated Java property, setting it statically 
> using the "-D" option of the _java_ command is more appropriate. Secondly, 
> the truststore password is stored in plain text. We should add Hadoop Shims 
> back to the HMS to prevent exposing these passwords, but this effort can be 
> done after this ticket.
> I propose we split _hive.metastore.dbaccess.ssl.properties_ into the 
> following properties:
>  * *_hive.metastore.dbaccess.ssl.use.SSL_*
>  ** Set this to true to use TLS encryption from the HMS Service to the HMSDB
>  * *_hive.metastore.dbaccess.ssl.truststore.path_*
>  ** TLS truststore file location
>  ** Java property: _javax.net.ssl.trustStore_
>  ** E.g. _/tmp/truststore_
>  * *_hive.metastore.dbaccess.ssl.truststore.password_*
>  ** Password of the truststore file
>  ** Java property: _javax.net.ssl.trustStorePassword_
>  ** E.g. _pwd_
>  * _*hive.metastore.dbaccess.ssl.truststore.type*_
>  ** Type of the truststore file
>  ** Java property: _javax.net.ssl.trustStoreType_
>  ** E.g. _JKS_
> We should guide the user towards an easier TLS configuration experience. This 
> is the minimum configuration necessary to configure TLS to the HMSDB. If we 
> need other options, such as the keystore location/password for 
> dual-authentication, then we can add those on afterwards.
> Also, document these changes - 
> [javax.jdo.option.ConnectionURL|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-javax.jdo.option.ConnectionURL]
>  does not have up-to-date documentation, and these new parameters will need 
> documentation as well.
> Note "TLS" refers to both SSL and TLS. TLS is simply the successor of SSL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18415) Lower "Updating Partition Stats" Logging Level

2018-11-30 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705234#comment-16705234
 ] 

Karthik Manamcheri commented on HIVE-18415:
---

+1. [~pvary], any thoughts on my suggestion of making it an INFO log, similar 
to the other logs?

> Lower "Updating Partition Stats" Logging Level
> --
>
> Key: HIVE-18415
> URL: https://issues.apache.org/jira/browse/HIVE-18415
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.2, 2.2.0, 3.0.0, 2.3.2
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-18415.1.patch, HIVE-18415.2.patch
>
>
> {code:title=org.apache.hadoop.hive.metastore.utils.MetaStoreUtils}
> LOG.warn("Updating partition stats fast for: " + part.getTableName());
> ...
> LOG.warn("Updated size to " + params.get(StatsSetupConst.TOTAL_SIZE));
> {code}
> This logging produces many lines of WARN log messages in my log file and it's 
> not clear to me what the issue is here.  Why is this a warning and how should 
> I respond to address this warning?
> DEBUG is probably more appropriate for a utility class.  Please lower.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18415) Lower "Updating Partition Stats" Logging Level

2018-11-30 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704957#comment-16704957
 ] 

Karthik Manamcheri commented on HIVE-18415:
---

[~belugabehr] How about making this INFO level? The other stats logs are at 
INFO level (see updateTableStatsSlow). I agree that this doesn't seem to be a 
WARN level log. If anything, the slow stats computation should have the WARN 
logs (since they are slow, which indicates that HMS performance is affected).
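
A minimal sketch of that change (assuming INFO wins out; slf4j parameterized 
logging also avoids the string concatenation in the current WARN lines):
{code:java}
// Sketch only: the two statements from MetaStoreUtils quoted below, lowered
// to INFO and using slf4j parameterized messages instead of concatenation.
LOG.info("Updating partition stats fast for: {}", part.getTableName());
// ...
LOG.info("Updated size to {}", params.get(StatsSetupConst.TOTAL_SIZE));
{code}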

> Lower "Updating Partition Stats" Logging Level
> --
>
> Key: HIVE-18415
> URL: https://issues.apache.org/jira/browse/HIVE-18415
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.2, 2.2.0, 3.0.0, 2.3.2
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-18415.1.patch
>
>
> {code:title=org.apache.hadoop.hive.metastore.utils.MetaStoreUtils}
> LOG.warn("Updating partition stats fast for: " + part.getTableName());
> ...
> LOG.warn("Updated size to " + params.get(StatsSetupConst.TOTAL_SIZE));
> {code}
> This logging produces many lines of WARN log messages in my log file and it's 
> not clear to me what the issue is here.  Why is this a warning and how should 
> I respond to address this warning?
> DEBUG is probably more appropriate for a utility class.  Please lower.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20986) Add TransactionalValidationListener to HMS preListeners only when ACID support is enabled

2018-11-29 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reassigned HIVE-20986:
-


> Add TransactionalValidationListener to HMS preListeners only when ACID 
> support is enabled
> -
>
> Key: HIVE-20986
> URL: https://issues.apache.org/jira/browse/HIVE-20986
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Adam Holley
>Priority: Major
>
> We add the TransactionalValidationListener to the preListeners in HMS 
> unconditionally.
> {code:java}
> public void init() throws MetaException {
>   ..
>   preListeners.add(0, new TransactionalValidationListener(conf));
>   ..
> }{code}
> This causes some performance issues because the listener is called even when 
> it is not needed. Let's add a condition around this and add the listener only 
> if transactional support is enabled.
>  
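
A minimal sketch of the guard (the config key below is hypothetical, not an 
existing ConfVar; the real patch may key off whatever flag signals ACID 
support):
{code:java}
public void init() throws MetaException {
  // ...
  // Hypothetical flag name, for illustration only.
  if (conf.getBoolean("metastore.acid.validation.enabled", true)) {
    preListeners.add(0, new TransactionalValidationListener(conf));
  }
  // ...
}{code}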



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20977) Lazy evaluate the table object in PreReadTableEvent to improve get_partition performance

2018-11-27 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reassigned HIVE-20977:
-


> Lazy evaluate the table object in PreReadTableEvent to improve get_partition 
> performance
> 
>
> Key: HIVE-20977
> URL: https://issues.apache.org/jira/browse/HIVE-20977
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Minor
>
> The PreReadTableEvent is generated for non-table operations (such as 
> get_partitions), but only if there is an event listener attached. However, 
> even that is unnecessary if the event listener is not interested in the 
> read-table event.
> For example, the TransactionalValidationListener's onEvent looks like this
> {code:java}
> @Override
> public void onEvent(PreEventContext context)
>     throws MetaException, NoSuchObjectException, InvalidOperationException {
>   switch (context.getEventType()) {
>     case CREATE_TABLE:
>       handle((PreCreateTableEvent) context);
>       break;
>     case ALTER_TABLE:
>       handle((PreAlterTableEvent) context);
>       break;
>     default:
>       // no validation required
>   }
> }{code}
>  
> Note that for read-table events it is a no-op. The problem is that the 
> get_table is evaluated when creating the PreReadTableEvent, only to be 
> ignored!
> Look at the code below: {{getMS().getTable(..)}} is evaluated regardless of 
> whether the listener uses it or not.
> {code:java}
> private void fireReadTablePreEvent(String catName, String dbName, String tblName)
>     throws MetaException, NoSuchObjectException {
>   if (preListeners.size() > 0) {
>     // do this only if there is a pre event listener registered
>     // (avoid unnecessary metastore api call)
>     Table t = getMS().getTable(catName, dbName, tblName);
>     if (t == null) {
>       throw new NoSuchObjectException(
>           TableName.getQualified(catName, dbName, tblName) + " table not found");
>     }
>     firePreEvent(new PreReadTableEvent(t, this));
>   }
> }
> {code}
> This can be improved by using a {{Supplier}} and lazily evaluating the table 
> when needed (evaluated once, the first time it is called, and memoized after 
> that).
> *Motivation*
> Whenever a partition call occurs (get_partition, etc.), we fire the 
> PreReadTableEvent. This affects performance since it fetches the table even 
> if it is not being used. This change will improve performance on the 
> get_partition calls.
>  
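
A minimal sketch of the lazy version, assuming Guava's {{Suppliers.memoize}} 
and a hypothetical {{PreReadTableEvent}} overload that accepts a supplier (the 
committed API may differ):
{code:java}
private void fireReadTablePreEvent(String catName, String dbName, String tblName)
    throws MetaException, NoSuchObjectException {
  if (preListeners.size() > 0) {
    // Memoized: getMS().getTable() runs at most once, and only if some
    // listener actually asks for the table.
    Supplier<Table> tableSupplier = Suppliers.memoize(() -> {
      try {
        Table t = getMS().getTable(catName, dbName, tblName);
        if (t == null) {
          throw new RuntimeException(new NoSuchObjectException(
              TableName.getQualified(catName, dbName, tblName) + " table not found"));
        }
        return t;
      } catch (MetaException e) {
        throw new RuntimeException(e);
      }
    });
    // Hypothetical overload taking a Supplier instead of an eagerly fetched Table.
    firePreEvent(new PreReadTableEvent(tableSupplier, this));
  }
}{code}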



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20776) Run HMS filterHooks on server-side in addition to client-side

2018-11-27 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-20776:
--
Summary: Run HMS filterHooks on server-side in addition to client-side  
(was: Move HMS filterHooks from client-side to server-side)

> Run HMS filterHooks on server-side in addition to client-side
> -
>
> Key: HIVE-20776
> URL: https://issues.apache.org/jira/browse/HIVE-20776
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> In HMS, I noticed that all the filter hooks are applied on the client side 
> (in HiveMetaStoreClient.java). Is there any reason why we can't apply the 
> filters on the server-side?
> Motivation: Some newer Apache projects such as Kudu use HMS for metadata 
> storage. Kudu is not completely Java-based, and there are interaction points 
> where they have C++ clients. In such cases, it would be ideal to have 
> consistent behavior from the HMS side as far as filters, etc. are concerned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20776) Move HMS filterHooks from client-side to server-side

2018-10-29 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1971#comment-1971
 ] 

Karthik Manamcheri commented on HIVE-20776:
---

Should we then do the filtering on both the client AND the server side, 
instead of just moving it?
{quote} HMS doesn't really know who the end-user is.
{quote}
HMS uses the Hadoop proxy-user mechanism to identify the end-user, no? This is 
how Spark interacts with HMS: Spark sets the Hadoop proxy user, and HMS learns 
the end-user through that. Why not use the same mechanism from HS2?
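
For reference, a minimal sketch of the Hadoop proxy-user mechanism described 
above (standard UGI API; the user names are placeholders):
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
  public static void main(String[] args) throws Exception {
    // The service (e.g. HS2) authenticates as itself (the "real" user),
    // then impersonates the end-user for downstream calls.
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();
    UserGroupInformation proxyUser =
        UserGroupInformation.createProxyUser("endUser", realUser);
    proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> {
      // HMS client calls made here would carry the end-user identity,
      // letting the server apply filter hooks per end-user.
      return null;
    });
  }
}
{code}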

> Move HMS filterHooks from client-side to server-side
> 
>
> Key: HIVE-20776
> URL: https://issues.apache.org/jira/browse/HIVE-20776
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> In HMS, I noticed that all the filter hooks are applied on the client side 
> (in HiveMetaStoreClient.java). Is there any reason why we can't apply the 
> filters on the server-side?
> Motivation: Some newer Apache projects such as Kudu use HMS for metadata 
> storage. Kudu is not completely Java-based, and there are interaction points 
> where they have C++ clients. In such cases, it would be ideal to have 
> consistent behavior from the HMS side as far as filters, etc. are concerned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20776) Move HMS filterHooks from client-side to server-side

2018-10-23 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661303#comment-16661303
 ] 

Karthik Manamcheri commented on HIVE-20776:
---

[~vihangk1] Hmm, this should work if we use Kerberos and pick up the UGI 
information from the environment, yes?

> Move HMS filterHooks from client-side to server-side
> 
>
> Key: HIVE-20776
> URL: https://issues.apache.org/jira/browse/HIVE-20776
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> In HMS, I noticed that all the filter hooks are applied on the client side 
> (in HiveMetaStoreClient.java). Is there any reason why we can't apply the 
> filters on the server-side?
> Motivation: Some newer Apache projects such as Kudu use HMS for metadata 
> storage. Kudu is not completely Java-based, and there are interaction points 
> where they have C++ clients. In such cases, it would be ideal to have 
> consistent behavior from the HMS side as far as filters, etc. are concerned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20776) Move HMS filterHooks from client-side to server-side

2018-10-19 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656466#comment-16656466
 ] 

Karthik Manamcheri commented on HIVE-20776:
---

[~prasadm] You had worked on 
[HIVE-8612|https://issues.apache.org/jira/browse/HIVE-8612] originally. Any 
reason why the filters were implemented on the client-side and not on the 
server-side?

> Move HMS filterHooks from client-side to server-side
> 
>
> Key: HIVE-20776
> URL: https://issues.apache.org/jira/browse/HIVE-20776
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> In HMS, I noticed that all the filter hooks are applied on the client side 
> (in HiveMetaStoreClient.java). Is there any reason why we can't apply the 
> filters on the server-side?
> Motivation: Some newer Apache projects such as Kudu use HMS for metadata 
> storage. Kudu is not completely Java-based, and there are interaction points 
> where they have C++ clients. In such cases, it would be ideal to have 
> consistent behavior from the HMS side as far as filters, etc. are concerned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20776) Move HMS filterHooks from client-side to server-side

2018-10-19 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-20776:
--
Component/s: (was: Metastore)

> Move HMS filterHooks from client-side to server-side
> 
>
> Key: HIVE-20776
> URL: https://issues.apache.org/jira/browse/HIVE-20776
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> In HMS, I noticed that all the filter hooks are applied on the client side 
> (in HiveMetaStoreClient.java). Is there any reason why we can't apply the 
> filters on the server-side?
> Motivation: Some newer Apache projects such as Kudu use HMS for metadata 
> storage. Kudu is not completely Java-based, and there are interaction points 
> where they have C++ clients. In such cases, it would be ideal to have 
> consistent behavior from the HMS side as far as filters, etc. are concerned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20776) Move HMS filterHooks from client-side to server-side

2018-10-19 Thread Karthik Manamcheri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656461#comment-16656461
 ] 

Karthik Manamcheri commented on HIVE-20776:
---

cc [~vihangk1] [~pvary] [~akolb]. Thoughts?

> Move HMS filterHooks from client-side to server-side
> 
>
> Key: HIVE-20776
> URL: https://issues.apache.org/jira/browse/HIVE-20776
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> In HMS, I noticed that all the filter hooks are applied on the client side 
> (in HiveMetaStoreClient.java). Is there any reason why we can't apply the 
> filters on the server-side?
> Motivation: Some newer Apache projects such as Kudu use HMS for metadata 
> storage. Kudu is not completely Java-based, and there are interaction points 
> where they have C++ clients. In such cases, it would be ideal to have 
> consistent behavior from the HMS side as far as filters, etc. are concerned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20776) Move HMS filterHooks from client-side to server-side

2018-10-19 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reassigned HIVE-20776:
-


> Move HMS filterHooks from client-side to server-side
> 
>
> Key: HIVE-20776
> URL: https://issues.apache.org/jira/browse/HIVE-20776
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Standalone Metastore
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> In HMS, I noticed that all the filter hooks are applied on the client side 
> (in HiveMetaStoreClient.java). Is there any reason why we can't apply the 
> filters on the server-side?
> Motivation: Some newer Apache projects such as Kudu use HMS for metadata 
> storage. Kudu is not completely Java-based, and there are interaction points 
> where they have C++ clients. In such cases, it would be ideal to have 
> consistent behavior from the HMS side as far as filters, etc. are concerned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work stopped] (HIVE-16994) Support connection pooling for HiveMetaStoreClient

2018-10-10 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16994 stopped by Karthik Manamcheri.
-
> Support connection pooling for HiveMetaStoreClient
> --
>
> Key: HIVE-16994
> URL: https://issues.apache.org/jira/browse/HIVE-16994
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Alexander Kolbasov
>Assignee: Karthik Manamcheri
>Priority: Major
>
> The native {{HiveMetaStoreClient}} doesn't support connection pooling. I 
> think it would be a very useful feature, especially in Kerberos environments 
> where connection establishment may be particularly expensive. 
> A similar feature is now supported in Sentry - see SENTRY-1580.
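
A minimal sketch of what client-side pooling could look like with Apache 
Commons Pool 2 (illustrative only, not the design for this ticket):
{code:java}
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class PooledMetaStoreClientSketch {
  static class ClientFactory extends BasePooledObjectFactory<HiveMetaStoreClient> {
    private final HiveConf conf;
    ClientFactory(HiveConf conf) { this.conf = conf; }

    @Override
    public HiveMetaStoreClient create() throws Exception {
      // The expensive step under Kerberos: connection plus ticket negotiation.
      return new HiveMetaStoreClient(conf);
    }

    @Override
    public PooledObject<HiveMetaStoreClient> wrap(HiveMetaStoreClient client) {
      return new DefaultPooledObject<>(client);
    }

    @Override
    public void destroyObject(PooledObject<HiveMetaStoreClient> p) {
      p.getObject().close();
    }
  }

  public static void main(String[] args) throws Exception {
    GenericObjectPool<HiveMetaStoreClient> pool =
        new GenericObjectPool<>(new ClientFactory(new HiveConf()));
    HiveMetaStoreClient client = pool.borrowObject();
    try {
      client.getAllDatabases();
    } finally {
      // Return the still-open connection for reuse instead of closing it.
      pool.returnObject(client);
    }
  }
}
{code}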



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-16994) Support connection pooling for HiveMetaStoreClient

2018-10-10 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri reassigned HIVE-16994:
-

Assignee: Karthik Manamcheri

> Support connection pooling for HiveMetaStoreClient
> --
>
> Key: HIVE-16994
> URL: https://issues.apache.org/jira/browse/HIVE-16994
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Alexander Kolbasov
>Assignee: Karthik Manamcheri
>Priority: Major
>
> The native {{HiveMetaStoreClient}} doesn't support connection pooling. I 
> think it would be a very useful feature, especially in Kerberos environments 
> where connection establishment may be particularly expensive. 
> A similar feature is now supported in Sentry - see SENTRY-1580.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-10296) Cast exception observed when hive runs a multi join query on metastore (postgres), since postgres pushes the filter into the join, and ignores the condition before applyi

2018-09-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-10296:
--
Status: Patch Available  (was: In Progress)

> Cast exception observed when hive runs a multi join query on metastore 
> (postgres), since postgres pushes the filter into the join, and ignores the 
> condition before applying cast
> -
>
> Key: HIVE-10296
> URL: https://issues.apache.org/jira/browse/HIVE-10296
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Yash Datta
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-10296.1.patch
>
>
> Try to drop a partition from hive:
> ALTER TABLE f___edr_bin_source___900_sub_id DROP IF EXISTS PARTITION ( 
> exporttimestamp=1427824800, timestamp=1427824800)
> This triggers a query on the metastore like this :
>  "select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join 
> "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join 
> "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 where ( (((case when 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? then cast("FILTER0"."PART_KEY_VAL" 
> as decimal(21,0)) else null end) = ?) and ((case when "TBLS"."TBL_NAME" = ? 
> and "DBS"."NAME" = ? then cast("FILTER1"."PART_KEY_VAL" as decimal(21,0)) 
> else null end) = ?)) )"
> In some cases, when the internal tables in postgres (metastore) have some 
> amount of data, the query plan pushes the condition down into the join.
> Now, because of DERBY-6358, a CASE WHEN clause is used before the cast, but 
> in this case the cast is evaluated before the condition. So when we have 
> different tables partitioned on string and integer columns, a cast exception 
> is observed!
> 15/04/06 08:41:20 ERROR metastore.ObjectStore: Direct SQL failed, falling 
> back to ORM 
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join 
> "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join 
> "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 where ( (((case when 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? then cast("FILTER0"."PART_KEY_VAL" 
> as decimal(21,0)) else null end) = ?) and ((case when "TBLS"."TBL_NAME" = ? 
> and "DBS"."NAME" = ? then cast("FILTER1"."PART_KEY_VAL" as decimal(21,0)) 
> else null end) = ?)) )". 
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
>  
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321) 
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
>  
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1915)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1909)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1909)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1882)
>  
> org.postgresql.util.PSQLException: ERROR: invalid input syntax for type 
> numeric: "__DEFAULT_BINSRC__" 
> 15/04/06 08:41:20 INFO metastore.ObjectStore: JDO filter pushdown cannot be 
> used: Filtering is supported only on partition keys of type string 
> 15/04/06 08:41:20 ERROR metastore.ObjectStore: 
> javax.jdo.JDOException: Exception thrown when executing query 
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
>  
> at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:275) 
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionNamesNoTxn(ObjectStore.java:1700)
>  
> at 
> 

[jira] [Updated] (HIVE-10296) Cast exception observed when hive runs a multi join query on metastore (postgres), since postgres pushes the filter into the join, and ignores the condition before applyi

2018-09-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri updated HIVE-10296:
--
Attachment: HIVE-10296.1.patch

> Cast exception observed when hive runs a multi join query on metastore 
> (postgres), since postgres pushes the filter into the join, and ignores the 
> condition before applying cast
> -
>
> Key: HIVE-10296
> URL: https://issues.apache.org/jira/browse/HIVE-10296
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Yash Datta
>Assignee: Karthik Manamcheri
>Priority: Major
> Attachments: HIVE-10296.1.patch
>
>
> Try to drop a partition from hive:
> ALTER TABLE f___edr_bin_source___900_sub_id DROP IF EXISTS PARTITION ( 
> exporttimestamp=1427824800, timestamp=1427824800)
> This triggers a query on the metastore like this :
>  "select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join 
> "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join 
> "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 where ( (((case when 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? then cast("FILTER0"."PART_KEY_VAL" 
> as decimal(21,0)) else null end) = ?) and ((case when "TBLS"."TBL_NAME" = ? 
> and "DBS"."NAME" = ? then cast("FILTER1"."PART_KEY_VAL" as decimal(21,0)) 
> else null end) = ?)) )"
> In some cases, when the internal tables in postgres (metastore) have some 
> amount of data, the query plan pushes the condition down into the join.
> Now, because of DERBY-6358, a CASE WHEN clause is used before the cast, but 
> in this case the cast is evaluated before the condition. So when we have 
> different tables partitioned on string and integer columns, a cast exception 
> is observed!
> 15/04/06 08:41:20 ERROR metastore.ObjectStore: Direct SQL failed, falling 
> back to ORM 
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join 
> "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join 
> "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 where ( (((case when 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? then cast("FILTER0"."PART_KEY_VAL" 
> as decimal(21,0)) else null end) = ?) and ((case when "TBLS"."TBL_NAME" = ? 
> and "DBS"."NAME" = ? then cast("FILTER1"."PART_KEY_VAL" as decimal(21,0)) 
> else null end) = ?)) )". 
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
>  
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321) 
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
>  
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1915)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1909)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1909)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1882)
>  
> org.postgresql.util.PSQLException: ERROR: invalid input syntax for type 
> numeric: "__DEFAULT_BINSRC__" 
> 15/04/06 08:41:20 INFO metastore.ObjectStore: JDO filter pushdown cannot be 
> used: Filtering is supported only on partition keys of type string 
> 15/04/06 08:41:20 ERROR metastore.ObjectStore: 
> javax.jdo.JDOException: Exception thrown when executing query 
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
>  
> at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:275) 
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionNamesNoTxn(ObjectStore.java:1700)
>  
> at 
> 

[jira] [Work started] (HIVE-10296) Cast exception observed when hive runs a multi join query on metastore (postgres), since postgres pushes the filter into the join, and ignores the condition before a

2018-09-28 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-10296 started by Karthik Manamcheri.
-
> Cast exception observed when hive runs a multi join query on metastore 
> (postgres), since postgres pushes the filter into the join, and ignores the 
> condition before applying cast
> -
>
> Key: HIVE-10296
> URL: https://issues.apache.org/jira/browse/HIVE-10296
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Yash Datta
>Assignee: Karthik Manamcheri
>Priority: Major
>
> Try to drop a partition from hive:
> ALTER TABLE f___edr_bin_source___900_sub_id DROP IF EXISTS PARTITION ( 
> exporttimestamp=1427824800, timestamp=1427824800)
> This triggers a query on the metastore like this :
>  "select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join 
> "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join 
> "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 where ( (((case when 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? then cast("FILTER0"."PART_KEY_VAL" 
> as decimal(21,0)) else null end) = ?) and ((case when "TBLS"."TBL_NAME" = ? 
> and "DBS"."NAME" = ? then cast("FILTER1"."PART_KEY_VAL" as decimal(21,0)) 
> else null end) = ?)) )"
> In some cases, when the internal tables in postgres (metastore) have some 
> amount of data, the query plan pushes the condition down into the join.
> Now, because of DERBY-6358, a CASE WHEN clause is used before the cast, but 
> in this case the cast is evaluated before the condition. So when we have 
> different tables partitioned on string and integer columns, a cast exception 
> is observed!
> 15/04/06 08:41:20 ERROR metastore.ObjectStore: Direct SQL failed, falling 
> back to ORM 
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join 
> "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join 
> "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 where ( (((case when 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? then cast("FILTER0"."PART_KEY_VAL" 
> as decimal(21,0)) else null end) = ?) and ((case when "TBLS"."TBL_NAME" = ? 
> and "DBS"."NAME" = ? then cast("FILTER1"."PART_KEY_VAL" as decimal(21,0)) 
> else null end) = ?)) )". 
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
>  
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321) 
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
>  
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1915)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1909)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1909)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1882)
>  
> org.postgresql.util.PSQLException: ERROR: invalid input syntax for type 
> numeric: "__DEFAULT_BINSRC__" 
> 15/04/06 08:41:20 INFO metastore.ObjectStore: JDO filter pushdown cannot be 
> used: Filtering is supported only on partition keys of type string 
> 15/04/06 08:41:20 ERROR metastore.ObjectStore: 
> javax.jdo.JDOException: Exception thrown when executing query 
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
>  
> at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:275) 
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionNamesNoTxn(ObjectStore.java:1700)
>  
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionNamesPrunedByExprNoTxn(ObjectStore.java:2003)
>  

[jira] [Resolved] (HIVE-20634) DirectSQL does not retry in ORM mode while getting partitions by filter

2018-09-27 Thread Karthik Manamcheri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Manamcheri resolved HIVE-20634.
---
Resolution: Not A Problem

I realized that this was not a bug: 
{{directSql.generateSqlFilterForPushdown(..)}} never executes SQL; it only 
generates the filter. Closing as Not A Problem.

> DirectSQL does not retry in ORM mode while getting partitions by filter
> ---
>
> Key: HIVE-20634
> URL: https://issues.apache.org/jira/browse/HIVE-20634
> Project: Hive
>  Issue Type: Bug
>Reporter: Karthik Manamcheri
>Assignee: Karthik Manamcheri
>Priority: Major
>
> The code path for getting partitions by filter is as follows:
> {code:java}
>   protected List<Partition> getPartitionsByFilterInternal(..) {
>     ...
>     @Override
>     protected boolean canUseDirectSql(GetHelper<List<Partition>> ctx)
>         throws MetaException {
>       return directSql.generateSqlFilterForPushdown(ctx.getTable(), tree, filter);
>     }
>     ...
>   }
> {code}
> If directSql.generateSqlFilterForPushdown throws an exception, we should be 
> returning false from canUseDirectSql instead of propagating the exception. 
> Propagating the exception causes the whole query to fail instead of retrying 
> with JDO.
> We should have code such as:
> {code:java}
>   @Override
>   protected boolean canUseDirectSql(GetHelper<List<Partition>> ctx)
>       throws MetaException {
>     try {
>       return directSql.generateSqlFilterForPushdown(ctx.getTable(), exprTree, filter);
>     } catch (final MetaException me) {
>       return false;
>     }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

