[ 
https://issues.apache.org/jira/browse/KNOX-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650250#comment-16650250
 ] 

Kevin Risden commented on KNOX-1524:
------------------------------------

h2. Test Case and Reproduction

The following results were obtained with:
 * a single 4-core, 8GB RAM CentOS 7 VM on my MacBook Pro laptop
 * openjdk version "1.8.0_181"
 * Hadoop 3.1.1 single node pseudo distributed
 * Hive 3.1.0 with single HiveServer2 node
 ** 
{code:java}
/opt/apache-hive-3.1.0-bin/bin/hiveserver2 \
  --hiveconf hive.server2.transport.mode=http \
  --hiveconf hive.server2.enable.doAs=false \
  --hiveconf fs.hdfs.impl.disable.cache=true \
  --hiveconf fs.file.impl.disable.cache=true{code}

 ** Enabling or disabling the filesystem cache did not change the results
 * Knox 1.1.0 without SSL
 * data set - [http://stat-computing.org/dataexpo/2009/the-data.html] - 1990.csv - 486MB
 * query is "select *" from a table with a single column
 * results limited to the first 1 million rows

Create table
{code:java}
CREATE TABLE tbl (a string) STORED AS TEXTFILE LOCATION '/tmp/1990';{code}
Testing commands
 * HDFS native
 ** 
{code:java}
time hdfs dfs -text /tmp/1990/1990.csv | head -n 1000000 > /dev/null{code}

 * Hive binary
 ** 
{code:java}
time /opt/apache-hive-3.1.0-bin/bin/beeline \
  -u 'jdbc:hive2://hive.vagrant:10000/' \
  -n admin -p admin-password \
  -e 'select * from tbl limit 1000000' > /dev/null{code}

 * Hive HTTP
 ** 
{code:java}
time /opt/apache-hive-3.1.0-bin/bin/beeline \
  -u 'jdbc:hive2://hive.vagrant:10001/;transportMode=http;httpPath=cliservice' \
  -n admin -p admin-password \
  -e 'select * from tbl limit 1000000' > /dev/null{code}

 * Hive Knox
 ** 
{code:java}
time /opt/apache-hive-3.1.0-bin/bin/beeline \
  -u 'jdbc:hive2://hive.vagrant:8443/;transportMode=http;httpPath=gateway/sandbox/hive' \
  -n admin -p admin-password \
  -e 'select * from tbl limit 1000000' > /dev/null{code}
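
To repeat any of the testing commands several times and compare wall-clock times, a small wrapper along these lines may help. This is only a sketch: {{time_runs}} is a hypothetical helper name, the run count is arbitrary (the setup above does not state how many runs were made), and it assumes bash, since it relies on the bash {{time}} keyword and the {{TIMEFORMAT}} variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original test setup).
# Runs a command N times and prints the wall-clock seconds of each run.
# Usage: time_runs <runs> <command> [args...]
time_runs() {
  local runs=$1; shift
  local TIMEFORMAT='%R'   # make the bash `time` keyword print only real seconds
  local i elapsed
  for ((i = 1; i <= runs; i++)); do
    # The command's stdout and stderr are discarded; only the timing line,
    # which bash writes to the group's stderr, is captured.
    elapsed=$( { time "$@" > /dev/null 2>&1; } 2>&1 )
    printf 'run %d: %s s\n' "$i" "$elapsed"
  done
}

# Example invocations (commands taken from the list above; 3 runs is arbitrary).
# A pipeline has to be wrapped in `bash -c` so it is timed as one command:
#   time_runs 3 bash -c 'hdfs dfs -text /tmp/1990/1990.csv | head -n 1000000'
#   time_runs 3 /opt/apache-hive-3.1.0-bin/bin/beeline \
#     -u 'jdbc:hive2://hive.vagrant:10000/' -n admin -p admin-password \
#     -e 'select * from tbl limit 1000000'
```

Each run still includes JVM startup, so the first assumption listed below applies here as well.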

Assumptions
 * JVM startup time is approximately the same for each run
 * Hive is using native Hadoop libraries (checked with ps aux | grep native)

> Hive "select *" performance evaluation
> --------------------------------------
>
>                 Key: KNOX-1524
>                 URL: https://issues.apache.org/jira/browse/KNOX-1524
>             Project: Apache Knox
>          Issue Type: Task
>            Reporter: Kevin Risden
>            Assignee: Kevin Risden
>            Priority: Major
>             Fix For: 1.2.0
>
>
> While looking at WebHDFS performance in KNOX-1221, I decided to look a bit 
> more into performance for common use cases. Hive performance is another area 
> that could use some research.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
