Re: Hcatalog for Hadoop2
You may have to build it yourself.
On Fri, Dec 13, 2013 at 12:56 PM, Sathwik B P sath...@apache.org wrote:
Hi, I have Hive 0.12 and Hadoop 2.2. The HCatalog that is packaged with Hive 0.12 is built for Hadoop 1. Is there a distribution of HCatalog built for Hadoop 2?
regards, sathwik
-- Nitin Pawar
Re: Hcatalog for Hadoop2
Hi Nitin, where can I find the instructions to build from source? I would like to build the current Hive trunk. regards, sathwik
Re: Hcatalog for Hadoop2
Here are the instructions: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-BuildingHivefromSource In the build command you can specify the Hadoop version you want to use. -- Nitin Pawar
Re: Hcatalog for Hadoop2
The wiki is probably no longer valid, as the build is based on Maven now. Could you kindly provide the details for building the Hive distribution?
Re: Hcatalog for Hadoop2
Sure. Once you check out trunk, the pom contains a profile called hadoop-2, which currently resolves to Hadoop version 2.2.0. If you want a different version, change this line in pom.xml:
<hadoop-23.version>2.2.0</hadoop-23.version>
After that, you can just do a normal build:
mvn clean install -DskipTests -Phadoop-2
-- Nitin Pawar
Re: Hcatalog for Hadoop2
Found this document: https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Building
Re: handling joins in Hive 0.11.0
Hello,
I found out that the dependency graph among task stages is incorrect for the skew-join-optimized plan. In particular, the conditional task in the optimized plan maintains no dependency on the child tasks of the common join task in the original plan. The conditional task contains the map join task, which has all these dependencies, but when the map join task is filtered out, all these dependencies are removed with it. Hence, all the other task stages of the query are skipped.
The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, in the processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set. I have currently fixed the issue by adding dependencies between the ConditionalTask and all the child tasks of the common join task of the original plan. From the original design I see that only tasks included in the ConditionalTask are allowed to have dependencies, so I am wondering what the correct alternative implementation would be. Maybe adding a no-op task inside the ConditionalTask (in addition to the map join task), so that the dependencies are maintained when the map join task is filtered out?
Thanks, Adrian
On 11/15/2013 10:20 PM, Adrian Popescu wrote:
2. In my experiments I also evaluate skewed joins. I enable skew joins through hive.optimize.skewjoin and run the same TPC-H query 5. The skew join is not actually triggered, as the number of rows with the same key is less than hive.skewjoin.key. Hence, the map join corresponding to the skewed join is filtered out at runtime, but unfortunately all the other stages are also filtered out. Thus, no result is actually generated. If I disable the skew join optimization, the query, running only with common joins, returns the result correctly. I believe this is a bug that occurs when the skew join optimization is enabled but not triggered. Has anyone experienced the same problem with skew joins on queries with multiple map-reduce joins? I attach the explain plan. Essentially only stages 6 and 22 are executed. Everything else is skipped silently, with no output result being generated and no error in hive.log. Similar behaviour is observed for other TPC-H queries.
Many thanks, Adrian
-- Adrian
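The dependency problem described in this message can be illustrated with a toy model. The sketch below is not Hive code: Task, ConditionalTask, run() and build_plan() are hypothetical names invented for illustration, and the scheduler is a simple queue walk that ignores real scheduling details. It only assumes what the message states, namely that the downstream stages hang off the map join task inside the conditional task, so they become unreachable when that task is filtered out.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical stand-in for a plan stage; not Hive's Task class.
    name: str
    children: list = field(default_factory=list)


@dataclass
class ConditionalTask(Task):
    # Candidate tasks that run only if the runtime condition selects them.
    candidates: list = field(default_factory=list)


def run(root, skew_triggered):
    """Walk the plan from root and return the names of executed tasks."""
    executed = []
    queue = [root]
    while queue:
        task = queue.pop(0)
        if task.name in executed:
            continue
        executed.append(task.name)
        # Candidates are scheduled only when the skew condition fires.
        if isinstance(task, ConditionalTask) and skew_triggered:
            queue.extend(task.candidates)
        queue.extend(task.children)
    return executed


def build_plan(fix):
    """Build common-join -> conditional(map-join) -> downstream-stage."""
    downstream = Task("downstream-stage")
    map_join = Task("skew-map-join", children=[downstream])
    cond = ConditionalTask("conditional", candidates=[map_join])
    if fix:
        # Workaround from the message: the conditional task itself also
        # depends on the child tasks of the original common join.
        cond.children.append(downstream)
    return Task("common-join", children=[cond])


if __name__ == "__main__":
    print(run(build_plan(fix=False), skew_triggered=False))
    print(run(build_plan(fix=True), skew_triggered=False))
```

Without the extra dependency, the untriggered case stops after the conditional task and the downstream stage never runs, which mirrors the silently skipped stages reported above. With the extra dependency, the downstream stage runs whether or not the map join is selected.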
Question about running Hive on Tez
Hi,
Excuse me, may I ask a question about running Hive on Tez? I've installed Hive on Tez and am running a simple query from the Hive CLI:
hive> set hive.optimize.tez=true;
hive> select * from table order by title_id limit 5;
Each time, I can see from the TezJobMonitor that all the map tasks are done, but the reducer never gets started, and the job keeps running forever. I have tried a number of times, and each time the same failure (the job hangs) happens again. Has anyone successfully run queries using Hive on Tez? Are there any tips or suggestions?
Here is my job log:
13/12/13 20:57:31 INFO client.TezSession: Submitting dag to TezSession, sessionName=HIVE-365b35bc-2461-4e2f-83f9-8da1fa356a86, applicationId=application_1386896881353_0027
13/12/13 20:57:33 INFO client.TezSession: Submitted dag to TezSession, sessionName=HIVE-365b35bc-2461-4e2f-83f9-8da1fa356a86, applicationId=application_1386896881353_0027, dagId=dag_1386896881353_0027_1
13/12/13 20:57:33 INFO client.RMProxy: Connecting to ResourceManager at /10.183.195.180:9022
13/12/13 20:57:33 INFO log.PerfLogger: /PERFLOG method=TezSubmitDag start=1386968251250 end=1386968253338 duration=2088 from=org.apache.hadoop.hive.ql.exec.tez.TezTask
13/12/13 20:57:33 INFO tez.TezJobMonitor:
13/12/13 20:57:33 INFO log.PerfLogger: PERFLOG method=TezRunDag from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
13/12/13 20:57:33 INFO log.PerfLogger: PERFLOG method=TezSubmitToRunningDag from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
13/12/13 20:57:33 INFO log.PerfLogger: /PERFLOG method=TezSubmitToRunningDag start=1386968253341 end=1386968253402 duration=61 from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
Status: Running (application id: application_1386896881353_0027)
13/12/13 20:57:33 INFO tez.TezJobMonitor: Status: Running (application id: application_1386896881353_0027)
13/12/13 20:57:33 INFO log.PerfLogger: PERFLOG method=TezRunVertex.Reducer 2 from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
13/12/13 20:57:33 INFO log.PerfLogger: PERFLOG method=TezRunVertex.Map 1 from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
Map 1: -/- Reducer 2: -/-
13/12/13 20:57:33 INFO tez.TezJobMonitor: Map 1: -/- Reducer 2: -/-
Map 1: -/- Reducer 2: 0/1
13/12/13 20:57:33 INFO tez.TezJobMonitor: Map 1: -/- Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:34 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:37 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:40 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:43 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:46 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:49 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:52 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:55 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 1/16 Reducer 2: 0/1
13/12/13 20:57:56 INFO tez.TezJobMonitor: Map 1: 1/16 Reducer 2: 0/1
Map 1: 2/16 Reducer 2: 0/1
13/12/13 20:57:58 INFO tez.TezJobMonitor: Map 1: 2/16 Reducer 2: 0/1
Map 1: 3/16 Reducer 2: 0/1
13/12/13 20:57:58 INFO tez.TezJobMonitor: Map 1: 3/16 Reducer 2: 0/1
Map 1: 5/16 Reducer 2: 0/1
13/12/13 20:57:59 INFO tez.TezJobMonitor: Map 1: 5/16 Reducer 2: 0/1
Map 1: 8/16 Reducer 2: 0/1
13/12/13 20:57:59 INFO tez.TezJobMonitor: Map 1: 8/16 Reducer 2: 0/1
Map 1: 12/16 Reducer 2: 0/1
13/12/13 20:57:59 INFO tez.TezJobMonitor: Map 1: 12/16 Reducer 2: 0/1
Map 1: 15/16 Reducer 2: 0/1
13/12/13 20:58:00 INFO tez.TezJobMonitor: Map 1: 15/16 Reducer 2: 0/1
13/12/13 20:58:00 INFO log.PerfLogger: /PERFLOG method=TezRunVertex.Map 1 start=1386968253402 end=1386968280223 duration=26821 from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:00 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:03 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:06 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:09 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:12 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:15 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:18 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:21 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:24 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:27 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:30 INFO
hive monitoring
Hi All, could anyone help me identify the data points for monitoring the Hive server and metastore, or suggest any tool that could help? I saw a tool named HAWK on SlideShare, but could not find anywhere that its source code has been shared. Thanks, Biswajit
Re: Question about running Hive on Tez
(dev on bcc)
Zhenxiao,
Cool that you got it set up. The query runs a full order by before the limit. Are you sure it's not just still running? Hive on Tez prints total tasks/completed tasks, so no update may mean none of the reduce tasks have finished yet.
If not, it'd be great to see the yarn logs (yarn logs -applicationId) and get more info about the table you're using (size, file format, etc.). If the logs are really big, you might want to consider opening a jira (issues.apache.org) and attaching them to it, or sending them directly to me.
There are a bunch of settings that might be of interest to you (in general, not just for this query). I've attached a text doc with some details.
Thanks, Gunther.
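To check whether a run like the one in this thread is still making progress, the TezJobMonitor status lines can be parsed and compared over time. The following is a minimal sketch, not part of Hive or Tez: parse_progress() and pending() are hypothetical helper names, and reading each pair as completed/total is an assumption inferred from the posted log, where Map 1 climbs from 0/16 to 16/16.

```python
import re

# Matches vertex status fragments such as "Map 1: 16/16" or "Reducer 2: -/-".
VERTEX = re.compile(r"(\w+ \d+): (\d+|-)/(\d+|-)")


def parse_progress(line):
    """Return {vertex name: (completed, total)}; None where '-' was printed."""
    def num(text):
        return None if text == "-" else int(text)

    return {
        name: (num(done), num(total))
        for name, done, total in VERTEX.findall(line)
    }


def pending(progress):
    """Names of vertices that are not yet reported as fully complete."""
    return [
        name
        for name, (done, total) in progress.items()
        if done is None or total is None or done < total
    ]


if __name__ == "__main__":
    line = ("13/12/13 20:58:03 INFO tez.TezJobMonitor: "
            "Map 1: 16/16 Reducer 2: 0/1")
    status = parse_progress(line)
    print(status)
    print(pending(status))
```

For the sample line, only Reducer 2 is reported as pending. A monitor that shows the same pending set for a long stretch does not by itself prove a hang, since a single reducer doing a full sort shows 0/1 until it finishes, so the YARN logs remain the place to confirm what the reducer is doing.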