[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table

2022-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=781173=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781173
 ]

ASF GitHub Bot logged work on HIVE-25879:
-

Author: ASF GitHub Bot
Created on: 14/Jun/22 13:15
Start Date: 14/Jun/22 13:15
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged PR #3348:
URL: https://github.com/apache/hive/pull/3348




Issue Time Tracking
---

Worklog Id: (was: 781173)
Time Spent: 1h 40m  (was: 1.5h)

> MetaStoreDirectSql test query should not query the whole DBS table
> --
>
> Key: HIVE-25879
> URL: https://issues.apache.org/jira/browse/HIVE-25879
> Project: Hive
>  Issue Type: Bug
>Reporter: Miklos Szurap
>Assignee: Miklos Szurap
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The runTestQuery() in the 
> org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query
> {code:java}
> select "DB_ID" from "DBS"{code}
> to determine whether the direct SQL can be used.
> With larger deployments with many (10k+) Hive databases it would be more 
> efficienct to query a small table instead, for example the "VERSION" table 
> should always have a single row only.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table

2022-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=780869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780869
 ]

ASF GitHub Bot logged work on HIVE-25879:
-

Author: ASF GitHub Bot
Created on: 13/Jun/22 15:30
Start Date: 13/Jun/22 15:30
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on code in PR #3348:
URL: https://github.com/apache/hive/pull/3348#discussion_r895859650


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -316,17 +316,20 @@ private boolean ensureDbInit() {
   }
 
   private boolean runTestQuery() {
+boolean doTrace = LOG.isDebugEnabled();
 Transaction tx = pm.currentTransaction();
 boolean doCommit = false;
 if (!tx.isActive()) {
   tx.begin();
   doCommit = true;
 }
 // Run a self-test query. If it doesn't work, we will self-disable. What a 
PITA...
-String selfTestQuery = "select \"DB_ID\" from " + DBS + "";
+String selfTestQuery = "select \"DB_ID\" from " + DBS + " WHERE 
\"DB_ID\"=1";

Review Comment:
   `where 1=0` sounds good to me...or `DB_ID=-1`; but this will be ok as well





Issue Time Tracking
---

Worklog Id: (was: 780869)
Time Spent: 1.5h  (was: 1h 20m)

> MetaStoreDirectSql test query should not query the whole DBS table
> --
>
> Key: HIVE-25879
> URL: https://issues.apache.org/jira/browse/HIVE-25879
> Project: Hive
>  Issue Type: Bug
>Reporter: Miklos Szurap
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The runTestQuery() in the 
> org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query
> {code:java}
> select "DB_ID" from "DBS"{code}
> to determine whether the direct SQL can be used.
> With larger deployments with many (10k+) Hive databases it would be more 
> efficienct to query a small table instead, for example the "VERSION" table 
> should always have a single row only.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table

2022-06-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=779356=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779356
 ]

ASF GitHub Bot logged work on HIVE-25879:
-

Author: ASF GitHub Bot
Created on: 08/Jun/22 08:32
Start Date: 08/Jun/22 08:32
Worklog Time Spent: 10m 
  Work Description: mszurap commented on code in PR #3348:
URL: https://github.com/apache/hive/pull/3348#discussion_r892075811


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -316,17 +316,20 @@ private boolean ensureDbInit() {
   }
 
   private boolean runTestQuery() {
+boolean doTrace = LOG.isDebugEnabled();
 Transaction tx = pm.currentTransaction();
 boolean doCommit = false;
 if (!tx.isActive()) {
   tx.begin();
   doCommit = true;
 }
 // Run a self-test query. If it doesn't work, we will self-disable. What a 
PITA...
-String selfTestQuery = "select \"DB_ID\" from " + DBS + "";
+String selfTestQuery = "select \"DB_ID\" from " + DBS + " WHERE 
\"DB_ID\"=1";

Review Comment:
   Yes, the results do not matter at this part of the code. Actually I did not 
want to introduce RDBMS specific clauses like "LIMIT 0" (which is not 
recognized by Oracle for example), also the "WHERE 1=0" could be a valid 
"no-results" filter, but again, I do not know if all different DBs like that or 
not. 





Issue Time Tracking
---

Worklog Id: (was: 779356)
Time Spent: 1h 20m  (was: 1h 10m)

> MetaStoreDirectSql test query should not query the whole DBS table
> --
>
> Key: HIVE-25879
> URL: https://issues.apache.org/jira/browse/HIVE-25879
> Project: Hive
>  Issue Type: Bug
>Reporter: Miklos Szurap
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The runTestQuery() in the 
> org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query
> {code:java}
> select "DB_ID" from "DBS"{code}
> to determine whether the direct SQL can be used.
> With larger deployments with many (10k+) Hive databases it would be more 
> efficienct to query a small table instead, for example the "VERSION" table 
> should always have a single row only.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table

2022-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=779194=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779194
 ]

ASF GitHub Bot logged work on HIVE-25879:
-

Author: ASF GitHub Bot
Created on: 07/Jun/22 17:15
Start Date: 07/Jun/22 17:15
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on code in PR #3348:
URL: https://github.com/apache/hive/pull/3348#discussion_r891508679


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##
@@ -316,17 +316,20 @@ private boolean ensureDbInit() {
   }
 
   private boolean runTestQuery() {
+boolean doTrace = LOG.isDebugEnabled();
 Transaction tx = pm.currentTransaction();
 boolean doCommit = false;
 if (!tx.isActive()) {
   tx.begin();
   doCommit = true;
 }
 // Run a self-test query. If it doesn't work, we will self-disable. What a 
PITA...
-String selfTestQuery = "select \"DB_ID\" from " + DBS + "";
+String selfTestQuery = "select \"DB_ID\" from " + DBS + " WHERE 
\"DB_ID\"=1";

Review Comment:
   this query may or may not have results... I think both is fine... how about 
changing it to `LIMIT 0`  ? 
   then it will not even need a primary-key scan... :)
   





Issue Time Tracking
---

Worklog Id: (was: 779194)
Time Spent: 1h 10m  (was: 1h)

> MetaStoreDirectSql test query should not query the whole DBS table
> --
>
> Key: HIVE-25879
> URL: https://issues.apache.org/jira/browse/HIVE-25879
> Project: Hive
>  Issue Type: Bug
>Reporter: Miklos Szurap
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The runTestQuery() in the 
> org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query
> {code:java}
> select "DB_ID" from "DBS"{code}
> to determine whether the direct SQL can be used.
> With larger deployments with many (10k+) Hive databases it would be more 
> efficienct to query a small table instead, for example the "VERSION" table 
> should always have a single row only.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table

2022-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=778978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778978
 ]

ASF GitHub Bot logged work on HIVE-25879:
-

Author: ASF GitHub Bot
Created on: 07/Jun/22 10:36
Start Date: 07/Jun/22 10:36
Worklog Time Spent: 10m 
  Work Description: mszurap opened a new pull request, #3348:
URL: https://github.com/apache/hive/pull/3348

   ### What changes were proposed in this pull request?
   Performance optimization: the MetaStoreDirectSql#runTestQuery determines 
whether the direct SQL can be used. Before the fix the code queries the whole 
DBS table without any filters. The proposed change is to add a filter "WHERE 
DB_ID=1" which should be fast as it is the primary key for that table.
   
   
   ### Why are the changes needed?
   Performance improvement. With bigger deployments the test query can run 10x 
faster (~3 ms instead of 30 ms), and this test query runs for every ObjectStore 
initialization.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   In my local dev env I've run the "mvn test" successfully in the 
"standalone-metastore".




Issue Time Tracking
---

Worklog Id: (was: 778978)
Time Spent: 1h  (was: 50m)

> MetaStoreDirectSql test query should not query the whole DBS table
> --
>
> Key: HIVE-25879
> URL: https://issues.apache.org/jira/browse/HIVE-25879
> Project: Hive
>  Issue Type: Bug
>Reporter: Miklos Szurap
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The runTestQuery() in the 
> org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query
> {code:java}
> select "DB_ID" from "DBS"{code}
> to determine whether the direct SQL can be used.
> With larger deployments with many (10k+) Hive databases it would be more 
> efficienct to query a small table instead, for example the "VERSION" table 
> should always have a single row only.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)