Re: Need urgent help on hive query performance

2014-05-30 Thread kulkarni.swar...@gmail.com
I feel it's pretty hard to answer this without understanding the following:

1. What exactly are you trying to query? CSV? Avro?
2. Where is your data? HDFS? HBase? Local filesystem?
3. What version of Hive are you using?
4. What is an example of a query that is slow? Some queries, such as joins,
are inherently slower than simpler ones (though they can be optimized).
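
If it helps to gather these details, here is a minimal sketch of two standard
Hive statements that show the storage format, the data location and the query
plan. The table name my_table is a placeholder of mine, not something from
this thread.

```sql
-- Hypothetical table name: replace my_table with the table being queried.
-- The output includes the InputFormat/SerDe (storage format) and the
-- Location of the data (HDFS, S3, ...).
DESCRIBE FORMATTED my_table;

-- Prints the query plan without running the query, which shows the stages
-- and joins Hive intends to execute.
EXPLAIN
SELECT COUNT(*) FROM my_table;
```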

Thanks,

-- 
Swarnim


On Fri, May 30, 2014 at 5:32 PM, shouvanik.hal...@accenture.com wrote:

  Can you please give a specific example or a blog post to refer to? I did
 not understand.



 *From:* Ashish Garg [mailto:gargcreation1...@gmail.com]
 *Sent:* Friday, May 30, 2014 3:31 PM
 *To:* user@hive.apache.org
 *Subject:* Re: Need urgent help on hive query performance



 Try partitioning the table and running queries that filter on the
 partition columns. Hope this helps.

 Thanks and Regards,

 Ashish Garg.



 On Fri, May 30, 2014 at 6:05 PM, shouvanik.hal...@accenture.com wrote:

 Hi,



 Can anybody help urgently with optimizing Hive query performance? I am
 looking at this more from a Hadoop tuning point of view. Currently, even a
 small table takes a long time to query.



 We are running an EMR cluster with 1 MASTER node, 2 Core Nodes and Task
 Nodes.



 Quick help is much appreciated.



 Thanks,

 Shouvanik


  --


 This message is for the designated recipient only and may contain
 privileged, proprietary, or otherwise confidential information. If you have
 received it in error, please notify the sender immediately and delete the
 original. Any other use of the e-mail by you is prohibited. Where allowed
 by local law, electronic communications with Accenture and its affiliates,
 including e-mail and instant messaging (including content), may be scanned
 by our systems for the purposes of information security and assessment of
 internal compliance with Accenture policy.

 __

 www.accenture.com








Re: Need urgent help on hive query performance

2014-05-30 Thread Ashish Garg
hive> CREATE EXTERNAL TABLE Emp (
        id INT,
        name STRING,
        Salary INT)
      PARTITIONED BY (Country STRING, State STRING)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      LOCATION '/user/data/';

Now load the data for a specific partition. For example,

hive> LOAD DATA LOCAL INPATH '---'
      OVERWRITE INTO TABLE Emp
      PARTITION (Country='US', State='NJ');

Now try running queries like

hive> SELECT COUNT(*), MAX(Salary) FROM Emp WHERE Country='US' AND
      State='NJ';

Because the query filters on the partition columns, Hive reads only the
matching partition instead of scanning the whole table, which should
improve query performance.
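
To check that the partition filter is really being applied, something along
these lines may help. It is only a sketch against the Emp table from the
example above; the exact plan output differs between Hive versions, so treat
the comments as what I would expect rather than a guarantee.

```sql
-- List the partitions Hive has registered for the Emp table.
SHOW PARTITIONS Emp;

-- Inspect the plan without running the query. With the filter on Country
-- and State, the input paths in the plan should only mention the
-- Country=US/State=NJ partition.
EXPLAIN EXTENDED
SELECT COUNT(*), MAX(Salary)
FROM Emp
WHERE Country = 'US' AND State = 'NJ';
```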



RE: Need urgent help on hive query performance

2014-05-30 Thread shouvanik.haldar
Please find the answers below.



From: kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
Sent: Friday, May 30, 2014 3:34 PM
To: user@hive.apache.org
Subject: Re: Need urgent help on hive query performance

I feel it's pretty hard to answer this without understanding the following:


1.  What exactly are you trying to query? CSV? Avro?
Hive table

2.  Where is your data? HDFS? HBase? Local filesystem?
Data is in S3

3.  What version of Hive are you using?
Hive 0.12

4.  What is an example of a query that is slow? Some queries, such as joins,
are inherently slower than simpler ones (though they can be optimized).
It has a large number of joins. Since it is a client-specific query, you will
understand that I cannot share it. Sorry about that.



Re: Need urgent help on hive query performance

2014-05-30 Thread kulkarni.swar...@gmail.com
> It has a large number of joins. Since it is a client-specific query, you
> will understand that I cannot share it. Sorry about that.

Like I said, joins are slow and, if not done correctly, can have terrible
performance. Which techniques help depends on how exactly you are trying to
perform the join. For instance, if you are joining a smaller table to a
larger one, a map join could work well: the smaller table is kept in memory
while the join is performed. Also, if you are able to break your tables down
into smaller buckets, you might be able to use a bucketed map join. The
following links should be helpful [1][2].
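
As a rough sketch of what the two techniques look like in HiveQL: the tables
fact_sales (large) and dim_store (small), their columns and the bucket counts
are all made up for illustration, so adapt them to your schema. The SET
properties are standard Hive settings, but their defaults vary by version.

```sql
-- Hypothetical tables: fact_sales (large) and dim_store (small).

-- Option 1: let Hive convert the join to a map join automatically when the
-- smaller table is below the size threshold (in bytes).
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;

SELECT s.store_name, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_store s ON f.store_id = s.store_id
GROUP BY s.store_name;

-- Option 2: ask for the map join explicitly with a hint. Recent versions
-- ignore the hint by default, so it has to be re-enabled first.
SET hive.ignore.mapjoin.hint=false;

SELECT /*+ MAPJOIN(s) */ s.store_name, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_store s ON f.store_id = s.store_id
GROUP BY s.store_name;

-- Bucketed map join: both tables are bucketed on the join key, with bucket
-- counts that are multiples of each other (32 and 8 are arbitrary here).
CREATE TABLE fact_sales_bucketed (store_id INT, amount DOUBLE)
CLUSTERED BY (store_id) INTO 32 BUCKETS;

CREATE TABLE dim_store_bucketed (store_id INT, store_name STRING)
CLUSTERED BY (store_id) INTO 8 BUCKETS;

-- Make inserts honour the bucket definitions, then populate the tables.
SET hive.enforce.bucketing=true;
INSERT OVERWRITE TABLE fact_sales_bucketed
SELECT store_id, amount FROM fact_sales;
INSERT OVERWRITE TABLE dim_store_bucketed
SELECT store_id, store_name FROM dim_store;

-- Enable the bucketed map join and run the hinted query on the bucketed tables.
SET hive.optimize.bucketmapjoin=true;

SELECT /*+ MAPJOIN(s) */ s.store_name, SUM(f.amount) AS total
FROM fact_sales_bucketed f
JOIN dim_store_bucketed s ON f.store_id = s.store_id
GROUP BY s.store_name;
```

With the automatic conversion in option 1, the query text itself does not
change, which is convenient when the query cannot be shared or edited easily.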

Hope this helps.

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
[2] http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables


-- 
Swarnim


Re: Need urgent help on hive query performance

2014-05-30 Thread Bala Krishna Gangisetty
Another dimension:

Try storing the Hive table in ORC format. In my experience, it significantly
improves performance compared to other formats.
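
One possible way to try this out is to copy an existing table into a new
ORC-backed one. The sketch below reuses the Emp table from the partitioning
example earlier in the thread; the name emp_orc is just a placeholder. ORC
has been available since Hive 0.11, so it should work on 0.12.

```sql
-- Hypothetical target table emp_orc; Emp is the text-format table from the
-- earlier example. A table created with CREATE TABLE ... AS SELECT cannot be
-- partitioned, so Country and State become ordinary columns in this copy.
CREATE TABLE emp_orc
STORED AS ORC
AS
SELECT id, name, Salary, Country, State
FROM Emp;

-- Run the same query against both tables to compare timings.
SELECT COUNT(*), MAX(Salary) FROM emp_orc WHERE Country = 'US' AND State = 'NJ';
```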

Since you mentioned join queries, on a side note, as a long-term goal you
probably want to explore Hive with Tez.
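
For reference, switching engines is a per-session setting, roughly as below.
Note that Tez support arrived in Hive 0.13, so on Hive 0.12 this would need
an upgrade first, and Tez has to be installed on the cluster.

```sql
-- Run subsequent queries in this session on Tez (Hive 0.13 or later).
SET hive.execution.engine=tez;

-- Switch back to classic MapReduce if needed.
SET hive.execution.engine=mr;
```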

--Bala G.

