Re: Want to improve the performance for execution of Hive Jobs.
Hi Bhavesh

For the two properties you mentioned:
mapred.map.tasks - the number of map tasks is determined from the input split and the input format.
mapred.reduce.tasks - your Hive job may not require a reduce task, hence Hive sets the number of reducers to zero.
For the other parameters, I'm not sure why they are not even reflected in job.xml.

Regards
Bejoy KS

From: Bhavesh Shah bhavesh25s...@gmail.com
To: user@hive.apache.org; bejoy...@yahoo.com
Sent: Tuesday, May 8, 2012 6:16 PM
Subject: Re: Want to improve the performance for execution of Hive Jobs.

Thanks Bejoy for your reply. Yes, I saw that a new XML is created for every job. In it, the values differ from whatever variables I set. For example, I have set mapred.map.tasks=10 and mapred.reduce.tasks=2, yet in every job XML the value for map is 1 and for reduce is 0. The same is true of the other parameters. Why is that?

On Tue, May 8, 2012 at 5:32 PM, Bejoy KS bejoy...@yahoo.com wrote:

Hi Bhavesh

On a job level, if you set/override some properties they won't go into mapred-site.xml. Check the corresponding job.xml to get the values. Also confirm from the task logs that there are no warnings with respect to overriding those properties. If these two are good, then you can confirm that the properties you supplied are actually used for the job.

Disclaimer: I'm not an EMR guy, so I can't comment on some specifics there. My responses relate to generic Hadoop behavior. :)

Regards
Bejoy KS
Sent from handheld, please excuse typos.

From: Bhavesh Shah bhavesh25s...@gmail.com
Date: Tue, 8 May 2012 17:15:44 +0530
To: user@hive.apache.org; Bejoy KS bejoy...@yahoo.com
Reply-To: user@hive.apache.org
Subject: Re: Want to improve the performance for execution of Hive Jobs.

Hello Bejoy KS,
I did it the same way, by executing hive -f filename on Amazon EMR, and when I examined mapred-site.xml, all the variables I had set in the file above still showed their default values. I didn't see my set values, and the performance is slow too.
I have tried this on my local cluster by setting these values, and I saw some boost in performance.

On Tue, May 8, 2012 at 4:23 PM, Bejoy Ks bejoy...@yahoo.com wrote:

Hi Bhavesh

I'm not sure about AWS, but from a quick reading, cluster-wide settings like the HDFS block size can be set in hdfs-site.xml through bootstrap actions. Since you are changing the HDFS block size, set the min and max split sizes across the cluster using bootstrap actions as well. The rest of the properties can be set on a per-job level. Doesn't AWS provide an option to use hive -f? If so, just provide all the properties required for tuning the query, followed by the queries (in order), in a file and simply execute it using hive -f <filename>.

Regards
Bejoy KS

From: Bhavesh Shah bhavesh25s...@gmail.com
To: user@hive.apache.org; Bejoy Ks bejoy...@yahoo.com
Sent: Tuesday, May 8, 2012 3:33 PM
Subject: Re: Want to improve the performance for execution of Hive Jobs.

Thanks Bejoy KS for your reply,
I want to ask one thing: if I want to set these parameters on Amazon Elastic MapReduce, how can I set variables like the following?
SET mapred.min.split.size=m;
SET mapred.max.split.size=m+n;
set dfs.block.size=128
set mapred.compress.map.output=true
set io.sort.mb=400
etc.
For all this, do I need to write a shell script to set these variables at the particular path (/home/hadoop/hive/bin/hive -e 'set ...'), or pass all these steps in bootstrap actions? I found this link for passing bootstrap actions: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/Bootstrap.html#BootstrapPredefined
What should I do in such a case?

On Tue, May 8, 2012 at 2:55 PM, Bejoy Ks bejoy...@yahoo.com wrote:

Hi Bhavesh

In Sqoop you can optimize the performance by using --direct mode for the import and by increasing the number of mappers used for the import. When you increase the number of mappers, you need to ensure that the RDBMS connection pool will handle that number of connections gracefully.
Also, use an evenly distributed column as --split-by; that'll ensure that all mappers are roughly equally loaded. The min split size and max split size can be set on a job level. But there is a chance of a slight loss in data locality if you increase these values. By increasing them you increase the data volume processed per mapper, and hence use fewer mappers; you now need to see whether that gets you substantial performance gains. I haven't seen much gain when I tried those out on some of my workflows in the past. A better approach would be to increase the HDFS block size itself, if your cluster deals with relatively larger files. If you change the HDFS block size, then change the min split and max split values accordingly. You can set the min and max split sizes on a per-job level.
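The behavior puzzling Bhavesh above (setting mapred.map.tasks=10 yet getting 1 mapper) follows from the split computation, which the thread describes but never shows: the number of mappers comes from input size and split size, not from the mapred.map.tasks hint. A rough sketch of the arithmetic, using the newer FileInputFormat formula; all sizes below are example values, not numbers from this thread:

```shell
# Sketch: mappers ≈ ceil(input_size / split_size), where
# split_size = max(min_split, min(max_split, block_size)).
# Example values only (assumed defaults for min/max split).
input_size=$((1024 * 1024 * 1024))       # 1 GiB of input
block_size=$((64 * 1024 * 1024))         # 64 MiB HDFS block
min_split=1                              # assumed mapred.min.split.size default
max_split=9223372036854775807            # assumed mapred.max.split.size default

split_size=$block_size
[ "$max_split" -lt "$split_size" ] && split_size=$max_split
[ "$min_split" -gt "$split_size" ] && split_size=$min_split
mappers=$(( (input_size + split_size - 1) / split_size ))
echo "mappers=$mappers"                  # mappers=16

# Raising the minimum split size to 256 MiB cuts the mapper count,
# which is the per-job lever Bejoy describes:
min_split=$((256 * 1024 * 1024))
split_size=$block_size
[ "$min_split" -gt "$split_size" ] && split_size=$min_split
mappers=$(( (input_size + split_size - 1) / split_size ))
echo "mappers=$mappers"                  # mappers=4
```

This also explains why mapred.reduce.tasks showed 0: a map-only Hive query has no reduce phase, so the setting is simply not consulted.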
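The Sqoop-side suggestions (--direct mode, more mappers, an evenly distributed --split-by column) can be sketched as a single import command. This is a hedged sketch, not a command from the thread: the JDBC URL, credentials, table, and column names are all hypothetical placeholders.

```shell
# Hypothetical Sqoop import tuned per the advice above:
# --direct uses the database's native dump path where supported,
# --split-by should be an evenly distributed column so the 8 mappers
# get roughly equal slices, and the RDBMS must tolerate 8 connections.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl -P \
  --table transactions \
  --direct \
  --split-by transaction_id \
  --num-mappers 8 \
  --target-dir /user/hive/warehouse/transactions
```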
does hive 0.9 supported hbase 0.90?
In the 0.8.1 version it's OK, but in 0.9, when I run "select * from tbl", it failed:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.mapred.TableMapReduceUtil.initCredentials(Lorg/apache/hadoop/mapred/JobConf;)V
    at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:419)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:281)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:320)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:154)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1377)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I see in the 0.9 code that this call was added in HiveHBaseTableInputFormat.getSplits:
TableMapReduceUtil.initCredentials(jobConf);
Should I change to another file input format?

Best regards
Ransom.
Re: does hive 0.9 supported hbase 0.90?
Hi Ransom,

Hive 0.9 requires HBase 0.92 to work correctly.

Thanks,
Ashutosh

On Wed, May 9, 2012 at 12:47 AM, Hezhiqiang (Ransom) ransom.hezhiqi...@huawei.com wrote:
In 0.8.1 version it's OK, but in 0.9, when I run "select * from tbl", it failed with java.lang.NoSuchMethodError: org.apache.hadoop.hbase.mapred.TableMapReduceUtil.initCredentials(Lorg/apache/hadoop/mapred/JobConf;)V [...]
RE: does hive 0.9 supported hbase 0.90?
Thank you Ashutosh.
Where is the JIRA issue? I couldn't find it in JIRA or the 0.90 release notes.

Best regards
Ransom.

From: Ashutosh Chauhan [mailto:hashut...@apache.org]
Sent: Wednesday, May 09, 2012 4:29 PM
To: user@hive.apache.org
Cc: Wenzaohua
Subject: Re: does hive 0.9 supported hbase 0.90?

Hi Ransom,
Hive 0.9 requires HBase 0.92 to work correctly.
Thanks,
Ashutosh

On Wed, May 9, 2012 at 12:47 AM, Hezhiqiang (Ransom) ransom.hezhiqi...@huawei.com wrote:
In 0.8.1 version it's OK, but in 0.9, when I run "select * from tbl", it failed. [...]
Re: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MapRedTask
Re-sending this since I didn't get a response. Any pointers would be much appreciated! Thank you!
Mark

----- Original Message -----
From: Mark Grover mgro...@oanda.com
To: user@hive.apache.org
Sent: Monday, May 7, 2012 5:03:02 PM
Subject: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MapRedTask

Hi all,
I wanted to see if anyone has seen this error before:

Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MapRedTask; nested exception is java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MapRedTask

I issued the query (that failed) from a Java client using the Hive JDBC driver. There was already a Hive query running on the cluster that had been issued from the Hive CLI. The queries are correct, and I have run them separately without any problem. To me, it seems like a timeout error that happens when the query issued through the JDBC driver doesn't get scheduled within a stipulated amount of time.

Has anyone seen a similar error before? How did you fix it?

Thank you in advance,
Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
Re: does hive 0.9 supported hbase 0.90?
Hi Ransom,

Jira which bumped the HBase version: https://issues.apache.org/jira/browse/HIVE-2748

Ashutosh

On Wed, May 9, 2012 at 1:43 AM, Hezhiqiang (Ransom) ransom.hezhiqi...@huawei.com wrote:
Thank you Ashutosh. Where is the JIRA issue? I couldn't find it in JIRA or the 0.90 release notes. [...]
Re: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MapRedTask
This thread might give you an idea: http://osdir.com/ml/hive-user-hadoop-apache/2010-06/msg00018.html

On Wed, May 9, 2012 at 7:28 PM, Mark Grover mgro...@oanda.com wrote:
Re-sending this since I didn't get a response. Any pointers would be much appreciated! [...]

--
Nitin Pawar
Need urgent suggestion on the below issue
Hi All,

I changed the namenode from one server to another after a hardware crash. After configuring the new namenode, when I execute a Hive query the error below is shown:

bin/hive -e "insert overwrite table pokes select a.* from invites a where a.ds='2008-08-15';"
Hive history file=/tmp/Bhavesh.Shah/hive_job_log_Bhavesh.Shah_201112021007_2120318983.txt
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201112011620_0004, Tracking URL = http://x.x.x.b:50030/jobdetails.jsp?jobid=job_201112011620_0004
Kill Command = C:\cygwin\home\Bhavesh.Shah\hadoop-0.20.2\/bin/hadoop job -Dmapred.job.tracker=localhost:9101 -kill job_201112011620_0004
2011-12-02 10:07:30,777 Stage-1 map = 0%, reduce = 0%
2011-12-02 10:07:57,796 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201112011620_0004 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

I have noticed that it is trying to communicate with the old host. I am unable to troubleshoot where I went wrong in setting up the new Hadoop namenode. Can you please suggest why Hive is not able to communicate with the new namenode?

--
Regards,
Varun Kumar.P
Re: Need urgent suggestion on the below issue
A Hive table stores the full HDFS URI to the table, such as hdfs://hostname:9120/user/hive/warehouse. You likely restored your namenode to a different hostname, and now Hive is not able to locate it. You might be able to create a DNS CNAME to resolve this. Hindsight is 20/20, but I would have restored the namenode to the same host/IP. If the CNAME does not work, you will have to alter all your tables to the correct path.

Edward

On Wed, May 9, 2012 at 12:14 PM, varun kumar varun@gmail.com wrote:
Hi All, I have changed the namenode from one server to another when there was a crash of hardware. After configuring the Namenode, when I am executing a Hive query the below error is shown. [...]
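The "alter all your tables to the correct path" route Edward mentions can be sketched as below. This is a hedged sketch only: the host "new-nn" and the table names are hypothetical, and every table (and every partition, for partitioned tables) has to be repointed individually.

```shell
# Hypothetical repointing of Hive tables at the new namenode URI.
hive -e "
ALTER TABLE pokes
  SET LOCATION 'hdfs://new-nn:9120/user/hive/warehouse/pokes';
ALTER TABLE invites PARTITION (ds='2008-08-15')
  SET LOCATION 'hdfs://new-nn:9120/user/hive/warehouse/invites/ds=2008-08-15';
"
```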
Re: Need urgent suggestion on the below issue
Varun – So yes, Hive stores the full URI to the NameNode in the metadata for every table and partition. From my experience you're best off modifying the metadata to point to the new NN, as opposed to trying to manipulate DNS. Fortunately, this is fairly straightforward, since there's mainly one column you need to modify, and assuming you're using something like MySQL it will only require a global search-and-replace on the URI in this column. I don't remember the exact table that contains this info, but if you browse the metastore tables you should find a LOCATION column which contains the NN URI that you need to change.

On Wed, May 9, 2012 at 11:14 AM, varun kumar varun@gmail.com wrote:
Hi All, I have changed the namenode from one server to another when there was a crash of hardware. After configuring the Namenode, when I am executing a Hive query the below error is shown. [...]
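Assuming a MySQL metastore, the search-and-replace Carl describes could be sketched as below. This is an assumption-laden sketch: in stock metastore schemas the column in question is SDS.LOCATION, but the table name, database name, and both host URIs here are placeholders. Verify against your own schema, and back up the metastore database, before running anything like this.

```shell
# Hypothetical metastore fix: rewrite the namenode URI stored for every
# table/partition. "metastore", "old-nn", and "new-nn" are placeholders.
mysql metastore <<'EOF'
UPDATE SDS
   SET LOCATION = REPLACE(LOCATION, 'hdfs://old-nn:9120', 'hdfs://new-nn:9120');
EOF
```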
Hive-Hadoop compatibility
Hi all,

Does anyone know if Hive 0.7 or 0.8 can work with Hadoop 0.21.0 or 0.22.0?

Thanks,
Avrilia