Re: Need urgent help on hive query performance
I feel it's pretty hard to answer this without understanding the following:

1. What exactly are you trying to query? CSV? Avro?
2. Where is your data? HDFS? HBase? Local filesystem?
3. What version of Hive are you using?
4. What is an example of a query that is slow? Some queries, such as joins, are inherently slower than simpler ones (though they can be optimized).

Thanks,
-- Swarnim

On Fri, May 30, 2014 at 5:32 PM, shouvanik.hal...@accenture.com wrote:
> Can you please give a specific example or blog to refer to? I did not understand.
>
> From: Ashish Garg [mailto:gargcreation1...@gmail.com]
> Sent: Friday, May 30, 2014 3:31 PM
> To: user@hive.apache.org
> Subject: Re: Need urgent help on hive query performance
>
> Try partitioning the table and running queries that are partition specific. Hope this helps.
>
> Thanks and Regards,
> Ashish Garg.
>
> On Fri, May 30, 2014 at 6:05 PM, shouvanik.hal...@accenture.com wrote:
>> Hi,
>>
>> Can anybody help urgently with optimizing Hive query performance? I am looking at this more from a Hadoop tuning point of view. Currently, querying even a small table takes a long time. We are running an EMR cluster with 1 master node, 2 core nodes and task nodes. Quick help is much appreciated.
>>
>> Thanks,
>> Shouvanik
>>
>> --
>> This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
>> www.accenture.com
Re: Need urgent help on hive query performance
First create a partitioned table:

hive> CREATE EXTERNAL TABLE Emp (
          id INT,
          name STRING,
          Salary INT)
      PARTITIONED BY (Country STRING, State STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/user/data/';

Now load the data that is partition specific. For example:

hive> LOAD DATA LOCAL INPATH '---' OVERWRITE INTO TABLE Emp PARTITION (Country='US', State='NJ');

Now try running queries like:

hive> SELECT COUNT(*), MAX(Salary) FROM Emp WHERE Country='US' AND State='NJ';

Because the WHERE clause filters on the partition columns, Hive reads only the matching partition instead of scanning the whole table, which should improve query performance.
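One way to check that a query really benefits from the partitioning described in this message is to list the partitions and look at the query plan. The following is a minimal sketch, assuming the Emp table from the example above exists; it is an illustration and was not part of the original reply.

```sql
-- List the partitions Hive knows about for the Emp table from the example.
SHOW PARTITIONS Emp;

-- Inspect the plan: with a filter on the partition columns, the plan should
-- reference only the Country=US/State=NJ partition directory.
EXPLAIN EXTENDED
SELECT COUNT(*), MAX(Salary)
FROM Emp
WHERE Country = 'US' AND State = 'NJ';

-- For comparison: without a partition filter, every partition is scanned.
EXPLAIN EXTENDED
SELECT COUNT(*), MAX(Salary)
FROM Emp;
```

If the first plan still lists every partition, the filter is probably not being applied to the partition columns directly (for example, because they are wrapped in an expression).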
RE: Need urgent help on hive query performance
Please find the answers inline below.

From: kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
Sent: Friday, May 30, 2014 3:34 PM
To: user@hive.apache.org
Subject: Re: Need urgent help on hive query performance

> I feel it's pretty hard to answer this without understanding the following:
>
> 1. What exactly are you trying to query? CSV? Avro?

A Hive table.

> 2. Where is your data? HDFS? HBase? Local filesystem?

The data is in S3.

> 3. What version of Hive are you using?

Hive 0.12.

> 4. What is an example of a query that is slow? Some queries, such as joins, are inherently slower than simpler ones (though they can be optimized).

It has innumerable joins. Since it is a client-specific query, you understand I cannot share it. Sorry about that.
Re: Need urgent help on hive query performance
> It has innumerable joins. Since it is a client-specific query, you understand I cannot share it. Sorry about that.

Like I said, joins are slow and, if not done correctly, can have terrible performance. Which techniques help depends on how exactly you are performing the join. For instance, if you are joining a smaller table to a larger one, a map join could work well for you: the smaller table is kept in memory while the join is performed. Also, if you are able to break your tables down into smaller buckets, you may be able to use a bucketed map join. The following links should be helpful [1][2].

Hope this helps.

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
[2] http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables

-- Swarnim
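The map join and bucketed map join mentioned in this message can be sketched roughly as follows. The table and column names (orders, customers, orders_bucketed, customers_bucketed, customer_id) are hypothetical, since the real query was not shared, and the threshold and bucket counts are example values only. The settings shown are standard Hive properties, but their defaults differ between versions, so treat this as a starting point to verify against your Hive 0.12 setup.

```sql
-- Hypothetical tables: orders (large) and customers (small).

-- 1. Automatic map join: Hive converts a common join into a map join when
--    the small table is below the size threshold (in bytes).
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;  -- example threshold

SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON (o.customer_id = c.customer_id);

-- 2. Bucketed map join: both tables are bucketed on the join key, and the
--    bucket count of one table is a multiple of the other's.
SET hive.enforce.bucketing=true;  -- applies when populating the bucketed tables

CREATE TABLE orders_bucketed (order_id INT, customer_id INT)
CLUSTERED BY (customer_id) INTO 32 BUCKETS;

CREATE TABLE customers_bucketed (customer_id INT, name STRING)
CLUSTERED BY (customer_id) INTO 8 BUCKETS;

-- The MAPJOIN hint is ignored unless this is set to false.
SET hive.ignore.mapjoin.hint=false;
SET hive.optimize.bucketmapjoin=true;

SELECT /*+ MAPJOIN(c) */ o.order_id, c.name
FROM orders_bucketed o
JOIN customers_bucketed c ON (o.customer_id = c.customer_id);
```

With the bucketed variant, each mapper only needs to load the matching buckets of the smaller table rather than the whole table, which is what makes it usable when the small side does not fit in memory as a whole.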
Re: Need urgent help on hive query performance
Another dimension: try storing the Hive table in ORC format. In my experience, it significantly improves performance compared to other formats.

Since you mentioned join queries: on a side note, as a long-term goal, you probably want to explore Hive with Tez.

--Bala G.
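As a rough illustration of the ORC suggestion in this message, an existing text-format table can be copied into an ORC-backed table. This sketch reuses the Emp table from the partitioning example earlier in the thread; emp_orc is a hypothetical name. Note that the Tez execution engine setting requires Hive 0.13 or later, so it would not apply to the Hive 0.12 installation mentioned in the thread.

```sql
-- Hypothetical ORC copy of the Emp table from the partitioning example.
CREATE TABLE emp_orc (
    id INT,
    name STRING,
    salary INT)
PARTITIONED BY (country STRING, state STRING)
STORED AS ORC;

-- Allow Hive to create the partitions from the data being inserted.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Partition columns go last in the SELECT list for a dynamic partition insert.
INSERT OVERWRITE TABLE emp_orc PARTITION (country, state)
SELECT id, name, salary, country, state
FROM Emp;

-- Hive 0.13+ only: run subsequent queries on Tez instead of MapReduce.
SET hive.execution.engine=tez;
```

Queries then target emp_orc instead of Emp; the partition filter from the earlier example works the same way on the ORC table.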