[jira] [Commented] (HIVE-13132) Hive should lazily load and cache metastore (permanent) functions

2016-02-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163143#comment-15163143
 ] 

Alan Gates commented on HIVE-13132:
---

Several comments:
# It would be good to test whether HIVE-2573 solves the issue, since there's no 
point in making further changes if it does.
# I see how this code prevents the system from repeatedly downloading the 
functions (since it tracks whether the metastore has been searched) but I don't 
see how it prevents pre-fetching all the functions at startup.
# I don't think using statics in the FunctionRegistry will work.  This will 
cause HiveServer2 to share the function names across sessions, which we don't 
want because there won't be a way to force new functions to be downloaded.  
That is, HS2 will download the set of functions when it first starts, and not 
do so again because the static haveSearchedMetastore will be true.
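
To make the third point concrete, here is a minimal Java sketch of the concern. It is not Hive code: `FakeMetastore`, `StaticRegistry`, and `SessionRegistry` are hypothetical names invented for illustration, and only the flag name `haveSearchedMetastore` is taken from the comment above. It assumes a long-lived JVM (as with HiveServer2) in which a function is created after the first lookup.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the metastore; not a Hive class.
class FakeMetastore {
    private final List<String> functions = new ArrayList<>();

    void create(String qualifiedName) { functions.add(qualifiedName); }

    List<String> listFunctions() { return new ArrayList<>(functions); }
}

// Static state: one flag and one name set shared by every session in the JVM.
class StaticRegistry {
    private static boolean haveSearchedMetastore = false;
    private static final Set<String> names = new HashSet<>();

    static synchronized Set<String> getFunctionNames(FakeMetastore ms) {
        if (!haveSearchedMetastore) {
            names.addAll(ms.listFunctions());
            haveSearchedMetastore = true; // never reset, so no later refresh
        }
        return new HashSet<>(names);
    }
}

// Per-session state: each new session performs its own first lookup.
class SessionRegistry {
    private boolean haveSearchedMetastore = false;
    private final Set<String> names = new HashSet<>();

    Set<String> getFunctionNames(FakeMetastore ms) {
        if (!haveSearchedMetastore) {
            names.addAll(ms.listFunctions());
            haveSearchedMetastore = true;
        }
        return new HashSet<>(names);
    }
}

class StaticFlagDemo {
    public static void main(String[] args) {
        FakeMetastore ms = new FakeMetastore();
        ms.create("db1.f1");
        StaticRegistry.getFunctionNames(ms);   // first session triggers the search
        ms.create("db1.f2");                   // function added afterwards

        // A later session in the same JVM does not see the new function.
        System.out.println(StaticRegistry.getFunctionNames(ms).contains("db1.f2")); // false

        // With per-session state, a new session does see it.
        System.out.println(new SessionRegistry().getFunctionNames(ms).contains("db1.f2")); // true
    }
}
```

Per-session state is only one possible way around the problem; an explicit refresh or invalidation hook on shared state would be another.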

cc [~jdere] and [~sershe] since both of you have done work in this area 
recently.

> Hive should lazily load and cache metastore (permanent) functions
> -
>
> Key: HIVE-13132
> URL: https://issues.apache.org/jira/browse/HIVE-13132
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.13.1
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-13132.1.patch
>
>
> In Hive 0.13.1, we have noticed that as the number of databases increases, 
> the start-up time of the Hive interactive shell increases. This is because 
> during start-up, all databases are iterated over to fetch the permanent 
> functions to display in the {{SHOW FUNCTIONS}} output.
> {noformat:title=FunctionRegistry.java}
>   private static Set<String> getFunctionNames(boolean searchMetastore) {
>     Set<String> functionNames = mFunctions.keySet();
>     if (searchMetastore) {
>       functionNames = new HashSet<String>(functionNames);
>       try {
>         Hive db = getHive();
>         List<String> dbNames = db.getAllDatabases();
>         for (String dbName : dbNames) {
>           List<String> funcNames = db.getFunctions(dbName, "*");
>           for (String funcName : funcNames) {
>             functionNames.add(FunctionUtils.qualifyFunctionName(funcName, dbName));
>           }
>         }
>       } catch (Exception e) {
>         LOG.error(e);
>         // Continue on, we can still return the functions we've gotten to this point.
>       }
>     }
>     return functionNames;
>   }
> {noformat}
> Instead of eagerly loading all metastore functions, we should only load them 
> the first time {{SHOW FUNCTIONS}} is invoked. We should also cache the 
> results.
> Note that this issue may have been fixed by HIVE-2573, though I haven't 
> verified this.
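
The lazy-load-and-cache idea proposed in the description could look roughly like the following Java sketch. This is an illustration under assumptions, not the attached patch: `LazyFunctionNames` and its `invalidate()` method are hypothetical, and the loader passed in stands for whatever metastore lookup would actually be used.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper, not part of Hive: runs the loader on first use only
// and serves the cached result afterwards.
class LazyFunctionNames {
    private final Supplier<Set<String>> loader;
    private Set<String> cached; // null until the first get()

    LazyFunctionNames(Supplier<Set<String>> loader) {
        this.loader = loader;
    }

    synchronized Set<String> get() {
        if (cached == null) {
            cached = Collections.unmodifiableSet(new HashSet<>(loader.get()));
        }
        return cached;
    }

    // Drops the cached names so that the next get() reloads them.
    synchronized void invalidate() {
        cached = null;
    }
}

class LazyFunctionNamesDemo {
    public static void main(String[] args) {
        int[] loads = {0}; // counts how often the (simulated) metastore is hit
        LazyFunctionNames names = new LazyFunctionNames(() -> {
            loads[0]++;
            return new HashSet<>(Arrays.asList("db1.f1", "db2.f2"));
        });

        System.out.println(loads[0]); // 0: nothing is loaded at start-up
        names.get();                  // e.g. the first SHOW FUNCTIONS
        names.get();                  // served from the cache
        System.out.println(loads[0]); // 1
    }
}
```

A cache like this needs some way to pick up functions created later, which is why the sketch includes an explicit `invalidate()`.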



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13132) Hive should lazily load and cache metastore (permanent) functions

2016-02-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163542#comment-15163542
 ] 

Hive QA commented on HIVE-13132:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12789357/HIVE-13132.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7081/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7081/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7081/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-7081/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   74c2f06..f8f2014  branch-2.0 -> origin/branch-2.0
   402fabe..2f73233  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 402fabe HIVE-13064 : Fix passing serde properties from insert 
overwrite directory (Rajat Khandelwal, reviewed by Amareshwari Sriramadasu)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 3 commits, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at 2f73233 HIVE-13128. NullScan fails on a secure setup. (Siddharth 
Seth, reviewed by Sergey Shelukhin)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12789357 - PreCommit-HIVE-TRUNK-Build


[jira] [Commented] (HIVE-13132) Hive should lazily load and cache metastore (permanent) functions

2016-02-25 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167639#comment-15167639
 ] 

Anthony Hsu commented on HIVE-13132:


Thanks for the review, [~alangates].

# I tested and unfortunately, HIVE-2573 does NOT solve this issue. A stacktrace 
in jdb shows all functions are loaded during CliDriver start-up:
{noformat}
main[1] where
  [1] org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions (Hive.java:173)
  [2] org.apache.hadoop.hive.ql.metadata.Hive.<init> (Hive.java:166)
  [3] org.apache.hadoop.hive.ql.session.SessionState.start (SessionState.java:503)
  [4] org.apache.hadoop.hive.cli.CliDriver.run (CliDriver.java:677)
  [5] org.apache.hadoop.hive.cli.CliDriver.main (CliDriver.java:621)
  [6] sun.reflect.NativeMethodAccessorImpl.invoke0 (native method)
  [7] sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
  [8] sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
  [9] java.lang.reflect.Method.invoke (Method.java:483)
  [10] org.apache.hadoop.util.RunJar.run (RunJar.java:221)
  [11] org.apache.hadoop.util.RunJar.main (RunJar.java:136)
{noformat}
# My fix is a bit hacky and only works on the Hive 0.13.1 branch. I basically 
changed the CliDriver initialization code to use 
{{FunctionRegistry.getFunctionNames(String funcPatternStr)}} instead of 
{{FunctionRegistry.getFunctionNames()}}. In the Hive 0.13.1 branch, the former 
does NOT search the metastore while the latter does. This is no longer the case 
in trunk.
# I don't have much experience with HS2 but I will take a look.

I'll work on a cleaner solution that removes the pre-loading of metastore 
functions from the CliDriver initialization code path. Suggestions welcome.



[jira] [Commented] (HIVE-13132) Hive should lazily load and cache metastore (permanent) functions

2016-03-01 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174819#comment-15174819
 ] 

Jason Dere commented on HIVE-13132:
---

It is true that HIVE-2573 still loads all permanent functions from all 
databases during startup. One thing that might improve things is HIVE-10319, 
which rather than calling once for every database, only makes a single call to 
the metastore to retrieve all permanent functions. Can you check if this helps 
at all?
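
The difference between one call per database and a single bulk call can be sketched as follows. This is a toy model, not the HIVE-10319 implementation: `CountingMetastore` and its method signatures are hypothetical and only count simulated round trips, and databases without any functions are not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory metastore that counts round trips; not a Hive API.
class CountingMetastore {
    private final Map<String, List<String>> functionsByDb = new LinkedHashMap<>();
    int calls = 0;

    void add(String db, String fn) {
        functionsByDb.computeIfAbsent(db, k -> new ArrayList<>()).add(fn);
    }

    List<String> getAllDatabases() {
        calls++;
        return new ArrayList<>(functionsByDb.keySet());
    }

    List<String> getFunctions(String db) {
        calls++;
        return new ArrayList<>(functionsByDb.get(db));
    }

    // Bulk variant: every qualified function name in one round trip.
    List<String> getAllFunctions() {
        calls++;
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : functionsByDb.entrySet()) {
            for (String fn : e.getValue()) {
                out.add(e.getKey() + "." + fn);
            }
        }
        return out;
    }
}

class RoundTripDemo {
    // One call to list the databases plus one call per database.
    static List<String> perDatabase(CountingMetastore ms) {
        List<String> out = new ArrayList<>();
        for (String db : ms.getAllDatabases()) {
            for (String fn : ms.getFunctions(db)) {
                out.add(db + "." + fn);
            }
        }
        return out;
    }

    // A single call, regardless of the number of databases.
    static List<String> singleCall(CountingMetastore ms) {
        return ms.getAllFunctions();
    }

    public static void main(String[] args) {
        CountingMetastore ms = new CountingMetastore();
        for (int i = 0; i < 100; i++) {
            ms.add("db" + i, "f");
        }

        perDatabase(ms);
        System.out.println(ms.calls); // 101

        ms.calls = 0;
        singleCall(ms);
        System.out.println(ms.calls); // 1
    }
}
```

In this model the per-database approach grows linearly with the number of databases, which matches the start-up behavior described in the issue.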



[jira] [Commented] (HIVE-13132) Hive should lazily load and cache metastore (permanent) functions

2016-03-04 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181545#comment-15181545
 ] 

Anthony Hsu commented on HIVE-13132:


[~jdere], thanks for pointing out HIVE-10319! I backported it to Hive 0.13.1 
and it reduced my CLI start-up time from 20+ seconds to < 1 second. {{SHOW 
FUNCTIONS}} also now takes < 1 second rather than 20+ seconds. This ticket is 
no longer needed, so I will resolve it.



[jira] [Commented] (HIVE-13132) Hive should lazily load and cache metastore (permanent) functions

2016-03-04 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181558#comment-15181558
 ] 

Anthony Hsu commented on HIVE-13132:


Correction: CLI start-up time was reduced from 20+ seconds to < 5 seconds.
