[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table
[ https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=781173=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781173 ] ASF GitHub Bot logged work on HIVE-25879: - Author: ASF GitHub Bot Created on: 14/Jun/22 13:15 Start Date: 14/Jun/22 13:15 Worklog Time Spent: 10m Work Description: kgyrtkirk merged PR #3348: URL: https://github.com/apache/hive/pull/3348 Issue Time Tracking --- Worklog Id: (was: 781173) Time Spent: 1h 40m (was: 1.5h) > MetaStoreDirectSql test query should not query the whole DBS table > -- > > Key: HIVE-25879 > URL: https://issues.apache.org/jira/browse/HIVE-25879 > Project: Hive > Issue Type: Bug >Reporter: Miklos Szurap >Assignee: Miklos Szurap >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > The runTestQuery() in the > org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query > {code:java} > select "DB_ID" from "DBS"{code} > to determine whether the direct SQL can be used. > With larger deployments with many (10k+) Hive databases it would be more > efficienct to query a small table instead, for example the "VERSION" table > should always have a single row only. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table
[ https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=780869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780869 ] ASF GitHub Bot logged work on HIVE-25879: - Author: ASF GitHub Bot Created on: 13/Jun/22 15:30 Start Date: 13/Jun/22 15:30 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on code in PR #3348: URL: https://github.com/apache/hive/pull/3348#discussion_r895859650 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java: ## @@ -316,17 +316,20 @@ private boolean ensureDbInit() { } private boolean runTestQuery() { +boolean doTrace = LOG.isDebugEnabled(); Transaction tx = pm.currentTransaction(); boolean doCommit = false; if (!tx.isActive()) { tx.begin(); doCommit = true; } // Run a self-test query. If it doesn't work, we will self-disable. What a PITA... -String selfTestQuery = "select \"DB_ID\" from " + DBS + ""; +String selfTestQuery = "select \"DB_ID\" from " + DBS + " WHERE \"DB_ID\"=1"; Review Comment: `where 1=0` sounds good to me...or `DB_ID=-1`; but this will be ok as well Issue Time Tracking --- Worklog Id: (was: 780869) Time Spent: 1.5h (was: 1h 20m) > MetaStoreDirectSql test query should not query the whole DBS table > -- > > Key: HIVE-25879 > URL: https://issues.apache.org/jira/browse/HIVE-25879 > Project: Hive > Issue Type: Bug >Reporter: Miklos Szurap >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > The runTestQuery() in the > org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query > {code:java} > select "DB_ID" from "DBS"{code} > to determine whether the direct SQL can be used. > With larger deployments with many (10k+) Hive databases it would be more > efficienct to query a small table instead, for example the "VERSION" table > should always have a single row only. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table
[ https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=779356=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779356 ] ASF GitHub Bot logged work on HIVE-25879: - Author: ASF GitHub Bot Created on: 08/Jun/22 08:32 Start Date: 08/Jun/22 08:32 Worklog Time Spent: 10m Work Description: mszurap commented on code in PR #3348: URL: https://github.com/apache/hive/pull/3348#discussion_r892075811 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java: ## @@ -316,17 +316,20 @@ private boolean ensureDbInit() { } private boolean runTestQuery() { +boolean doTrace = LOG.isDebugEnabled(); Transaction tx = pm.currentTransaction(); boolean doCommit = false; if (!tx.isActive()) { tx.begin(); doCommit = true; } // Run a self-test query. If it doesn't work, we will self-disable. What a PITA... -String selfTestQuery = "select \"DB_ID\" from " + DBS + ""; +String selfTestQuery = "select \"DB_ID\" from " + DBS + " WHERE \"DB_ID\"=1"; Review Comment: Yes, the results do not matter at this part of the code. Actually I did not want to introduce RDBMS specific clauses like "LIMIT 0" (which is not recognized by Oracle for example), also the "WHERE 1=0" could be a valid "no-results" filter, but again, I do not know if all different DBs like that or not. Issue Time Tracking --- Worklog Id: (was: 779356) Time Spent: 1h 20m (was: 1h 10m) > MetaStoreDirectSql test query should not query the whole DBS table > -- > > Key: HIVE-25879 > URL: https://issues.apache.org/jira/browse/HIVE-25879 > Project: Hive > Issue Type: Bug >Reporter: Miklos Szurap >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > The runTestQuery() in the > org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query > {code:java} > select "DB_ID" from "DBS"{code} > to determine whether the direct SQL can be used. > With larger deployments with many (10k+) Hive databases it would be more > efficienct to query a small table instead, for example the "VERSION" table > should always have a single row only. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table
[ https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=779194=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779194 ] ASF GitHub Bot logged work on HIVE-25879: - Author: ASF GitHub Bot Created on: 07/Jun/22 17:15 Start Date: 07/Jun/22 17:15 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on code in PR #3348: URL: https://github.com/apache/hive/pull/3348#discussion_r891508679 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java: ## @@ -316,17 +316,20 @@ private boolean ensureDbInit() { } private boolean runTestQuery() { +boolean doTrace = LOG.isDebugEnabled(); Transaction tx = pm.currentTransaction(); boolean doCommit = false; if (!tx.isActive()) { tx.begin(); doCommit = true; } // Run a self-test query. If it doesn't work, we will self-disable. What a PITA... -String selfTestQuery = "select \"DB_ID\" from " + DBS + ""; +String selfTestQuery = "select \"DB_ID\" from " + DBS + " WHERE \"DB_ID\"=1"; Review Comment: this query may or may not have results... I think both is fine... how about changing it to `LIMIT 0` ? then it will not even need a primary-key scan... :) Issue Time Tracking --- Worklog Id: (was: 779194) Time Spent: 1h 10m (was: 1h) > MetaStoreDirectSql test query should not query the whole DBS table > -- > > Key: HIVE-25879 > URL: https://issues.apache.org/jira/browse/HIVE-25879 > Project: Hive > Issue Type: Bug >Reporter: Miklos Szurap >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > The runTestQuery() in the > org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query > {code:java} > select "DB_ID" from "DBS"{code} > to determine whether the direct SQL can be used. > With larger deployments with many (10k+) Hive databases it would be more > efficienct to query a small table instead, for example the "VERSION" table > should always have a single row only. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table
[ https://issues.apache.org/jira/browse/HIVE-25879?focusedWorklogId=778978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778978 ] ASF GitHub Bot logged work on HIVE-25879: - Author: ASF GitHub Bot Created on: 07/Jun/22 10:36 Start Date: 07/Jun/22 10:36 Worklog Time Spent: 10m Work Description: mszurap opened a new pull request, #3348: URL: https://github.com/apache/hive/pull/3348 ### What changes were proposed in this pull request? Performance optimization: the MetaStoreDirectSql#runTestQuery determines whether the direct SQL can be used. Before the fix the code queries the whole DBS table without any filters. The proposed change is to add a filter "WHERE DB_ID=1" which should be fast as it is the primary key for that table. ### Why are the changes needed? Performance improvement. With bigger deployments the test query can run 10x faster (~3 ms instead of 30 ms), and this test query runs for every ObjectStore initialization. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? In my local dev env I've run the "mvn test" successfully in the "standalone-metastore". Issue Time Tracking --- Worklog Id: (was: 778978) Time Spent: 1h (was: 50m) > MetaStoreDirectSql test query should not query the whole DBS table > -- > > Key: HIVE-25879 > URL: https://issues.apache.org/jira/browse/HIVE-25879 > Project: Hive > Issue Type: Bug >Reporter: Miklos Szurap >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > The runTestQuery() in the > org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query > {code:java} > select "DB_ID" from "DBS"{code} > to determine whether the direct SQL can be used. > With larger deployments with many (10k+) Hive databases it would be more > efficienct to query a small table instead, for example the "VERSION" table > should always have a single row only. -- This message was sent by Atlassian Jira (v8.20.7#820007)