Re: Hcatalog for Hadoop2

2013-12-13 Thread Nitin Pawar
You may have to build it yourself.


On Fri, Dec 13, 2013 at 12:56 PM, Sathwik B P sath...@apache.org wrote:

 Hi,

 I have Hive 0.12 and Hadoop 2.2. The HCatalog packaged with Hive 0.12 is
 built for Hadoop 1.

 Is there a distribution of HCatalog built for Hadoop 2?

 regards,
 sathwik




-- 
Nitin Pawar


Re: Hcatalog for Hadoop2

2013-12-13 Thread Sathwik B P
Hi Nitin,

Where can I find the instructions to build from source? I would like to
build the current Hive trunk.

regards,
sathwik



Re: Hcatalog for Hadoop2

2013-12-13 Thread Nitin Pawar
Here are the instructions:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-BuildingHivefromSource


In the build command, you can specify the Hadoop version you want to use.




-- 
Nitin Pawar


Re: Hcatalog for Hadoop2

2013-12-13 Thread Sathwik B P
The wiki is probably no longer valid, as the build is now based on Maven.
Could you kindly provide the details for building the Hive distribution?



Re: Hcatalog for Hadoop2

2013-12-13 Thread Nitin Pawar
Sure.

Once you check out trunk, the pom contains a profile called hadoop-2, which
currently resolves to Hadoop version 2.2.0.

To build against a different version, change this line in pom.xml:
<hadoop-23.version>2.2.0</hadoop-23.version>

After that, you can run a normal build:
mvn clean install -DskipTests -Phadoop-2
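
The version change described in this message can also be scripted. The following is a minimal Python sketch, not part of the Hive build. It assumes the pom text contains exactly one hadoop-23.version property, as quoted in the message. The function name set_hadoop2_version, the sample pom fragment, and the target version 2.3.0 are illustrative assumptions.

```python
import re

# Matches the property element named in the message; the version text in the
# middle is whatever currently sits between the opening and closing tags.
_PROPERTY = re.compile(r"(<hadoop-23\.version>)[^<]*(</hadoop-23\.version>)")


def set_hadoop2_version(pom_text: str, new_version: str) -> str:
    """Return pom_text with the hadoop-23.version property set to new_version."""
    updated, count = _PROPERTY.subn(
        lambda m: m.group(1) + new_version + m.group(2), pom_text
    )
    if count != 1:
        # Refuse to guess if the property is missing or appears more than once.
        raise ValueError(f"expected one hadoop-23.version property, found {count}")
    return updated


# Hypothetical pom fragment, used only to exercise the function.
sample_pom = """<project>
  <properties>
    <hadoop-23.version>2.2.0</hadoop-23.version>
  </properties>
</project>"""

# 2.3.0 is an arbitrary example version, not a recommendation.
print(set_hadoop2_version(sample_pom, "2.3.0"))
```

After rewriting the real pom.xml this way, the mvn command from the message would be run unchanged.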


-- 
Nitin Pawar


Re: Hcatalog for Hadoop2

2013-12-13 Thread Sathwik B P
I found this document:
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Building




Re: handling joins in Hive 0.11.0

2013-12-13 Thread Adrian Popescu


Hello,

I found out that the dependency graph among task stages is incorrect for
the skewed join optimized plan.

In particular, the conditional task in the optimized plan maintains no
dependency on the child tasks of the common join task in the original plan.
The conditional task is composed of the map join task, which has all these
dependencies, but when the map join task is filtered out, all these
dependencies are removed. Hence, all the other task stages of the query are
skipped.

The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, in the
processSkewJoin() function, immediately after the ConditionalTask is created
and its dependencies are set.

I have currently fixed the issue by adding dependencies between the
ConditionalTask and all the child tasks of the common join task of the
original plan.

From the original design I see that only tasks included in the
ConditionalTask are allowed to have dependencies, so I am wondering what the
correct alternative implementation would be. Maybe adding a no-op task inside
the ConditionalTask (in addition to the map join task), so that the
dependencies are maintained when the map join task is filtered out?
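
To make the failure mode concrete, here is a toy Python model of the task graph as described in this message. It is only a sketch under stated assumptions: the classes Task and ConditionalTask, the functions run and build_plan, and the stage names are all hypothetical and do not reproduce Hive's actual classes or the code in GenMRSkewJoinProcessor.java.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    children: list = field(default_factory=list)


@dataclass
class ConditionalTask(Task):
    # Candidate tasks; at runtime only the resolved ones are executed.
    candidates: list = field(default_factory=list)


def run(task, resolved, executed):
    """Walk the graph depth-first and record which tasks execute.

    `resolved` holds the names of candidates that survive runtime filtering.
    """
    if isinstance(task, ConditionalTask):
        for cand in task.candidates:
            if cand.name in resolved:
                run(cand, resolved, executed)
    else:
        executed.append(task.name)
    for child in task.children:
        if child.name not in executed:
            run(child, resolved, executed)
    return executed


def build_plan(fixed):
    """Toy optimized plan: common join -> conditional(map join) -> downstream."""
    downstream = Task("downstream_stage")
    map_join = Task("skew_map_join", children=[downstream])
    cond = ConditionalTask("conditional", candidates=[map_join])
    if fixed:
        # The workaround from the message: the conditional task itself also
        # depends on the downstream stage.
        cond.children.append(downstream)
    return Task("common_join", children=[cond])


# Skew join enabled but not triggered: the map join is filtered out.
print(run(build_plan(fixed=False), resolved=set(), executed=[]))
print(run(build_plan(fixed=True), resolved=set(), executed=[]))
```

In this model the unfixed plan executes only the common join when the map join is filtered out, while the fixed plan still reaches the downstream stage. When the map join does run, the downstream stage is executed once, because the walk skips children that have already run.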

Thanks,
Adrian



On 11/15/2013 10:20 PM, Adrian Popescu wrote:


2. In my experiments I also evaluate skewed joins. I enable skew joins
through hive.optimize.skewjoin and run the same TPC-H query 5. The skew join
is not actually triggered, as the number of rows with the same key is less
than hive.skewjoin.key. Hence, the map join corresponding to the skewed join
is filtered out at runtime, but unfortunately all the other stages are also
filtered out. Thus, no result is actually generated. If I disable the skew
join optimization, the query, running only with common joins, returns the
result correctly.

I believe this is a bug that occurs when the skew join operator is enabled
but not triggered. Has anyone experienced the same problem with skew joins on
queries with multiple map-reduce joins? I attach the explain plan.
Essentially, only stages 6 and 22 are executed. Everything else is skipped
silently, with no output result generated and no error in hive.log. Similar
behaviour is observed for other TPC-H queries.

Many thanks,
Adrian





--
Adrian



Question about running Hive on Tez

2013-12-13 Thread Zhenxiao Luo
Hi,

Excuse me. May I ask a question about running Hive on Tez?

I've installed Hive on Tez and am running a simple query from the Hive CLI:

hive> set hive.optimize.tez=true;
hive> select * from table order by title_id limit 5;

Each time, I can see from the TezJobMonitor that all the map tasks are done,
but the reducer never gets started, and the job keeps running forever.

I have tried a number of times, and the same failure (the job hangs) happens
every time.
Has anyone successfully run queries using Hive on Tez? Are there any tips or
suggestions?

Here is my job log:

13/12/13 20:57:31 INFO client.TezSession: Submitting dag to
TezSession, sessionName=HIVE-365b35bc-2461-4e2f-83f9-8da1fa356a86,
applicationId=application_1386896881353_0027
13/12/13 20:57:33 INFO client.TezSession: Submitted dag to TezSession,
sessionName=HIVE-365b35bc-2461-4e2f-83f9-8da1fa356a86,
applicationId=application_1386896881353_0027,
dagId=dag_1386896881353_0027_1
13/12/13 20:57:33 INFO client.RMProxy: Connecting to ResourceManager
at /10.183.195.180:9022
13/12/13 20:57:33 INFO log.PerfLogger: /PERFLOG method=TezSubmitDag
start=1386968251250 end=1386968253338 duration=2088
from=org.apache.hadoop.hive.ql.exec.tez.TezTask


13/12/13 20:57:33 INFO tez.TezJobMonitor:

13/12/13 20:57:33 INFO log.PerfLogger: PERFLOG method=TezRunDag
from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
13/12/13 20:57:33 INFO log.PerfLogger: PERFLOG
method=TezSubmitToRunningDag
from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
13/12/13 20:57:33 INFO log.PerfLogger: /PERFLOG
method=TezSubmitToRunningDag start=1386968253341 end=1386968253402
duration=61 from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
Status: Running (application id: application_1386896881353_0027)

13/12/13 20:57:33 INFO tez.TezJobMonitor: Status: Running (application
id: application_1386896881353_0027)

13/12/13 20:57:33 INFO log.PerfLogger: PERFLOG
method=TezRunVertex.Reducer 2
from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
13/12/13 20:57:33 INFO log.PerfLogger: PERFLOG
method=TezRunVertex.Map 1
from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
Map 1: -/- Reducer 2: -/-
13/12/13 20:57:33 INFO tez.TezJobMonitor: Map 1: -/- Reducer 2: -/-
Map 1: -/- Reducer 2: 0/1
13/12/13 20:57:33 INFO tez.TezJobMonitor: Map 1: -/- Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:34 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:37 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:40 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:43 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:46 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:49 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:52 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 0/16 Reducer 2: 0/1
13/12/13 20:57:55 INFO tez.TezJobMonitor: Map 1: 0/16 Reducer 2: 0/1
Map 1: 1/16 Reducer 2: 0/1
13/12/13 20:57:56 INFO tez.TezJobMonitor: Map 1: 1/16 Reducer 2: 0/1
Map 1: 2/16 Reducer 2: 0/1
13/12/13 20:57:58 INFO tez.TezJobMonitor: Map 1: 2/16 Reducer 2: 0/1
Map 1: 3/16 Reducer 2: 0/1
13/12/13 20:57:58 INFO tez.TezJobMonitor: Map 1: 3/16 Reducer 2: 0/1
Map 1: 5/16 Reducer 2: 0/1
13/12/13 20:57:59 INFO tez.TezJobMonitor: Map 1: 5/16 Reducer 2: 0/1
Map 1: 8/16 Reducer 2: 0/1
13/12/13 20:57:59 INFO tez.TezJobMonitor: Map 1: 8/16 Reducer 2: 0/1
Map 1: 12/16 Reducer 2: 0/1
13/12/13 20:57:59 INFO tez.TezJobMonitor: Map 1: 12/16 Reducer 2: 0/1
Map 1: 15/16 Reducer 2: 0/1
13/12/13 20:58:00 INFO tez.TezJobMonitor: Map 1: 15/16 Reducer 2: 0/1
13/12/13 20:58:00 INFO log.PerfLogger: /PERFLOG
method=TezRunVertex.Map 1 start=1386968253402 end=1386968280223
duration=26821 from=org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:00 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:03 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:06 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:09 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:12 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:15 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:18 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:21 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:24 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:27 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1
Map 1: 16/16 Reducer 2: 0/1
13/12/13 20:58:30 INFO 

hive monitoring

2013-12-13 Thread Biswajit Nayak
Hi All,

Could anyone help me identify the data points for monitoring the
Hive server and metastore, or suggest a tool that could help? I saw a tool
named HAWK on SlideShare, but could not find anywhere that its source code
has been shared.

Thanks
Biswajit


Re: Question about running Hive on Tez

2013-12-13 Thread Gunther Hagleitner
dev on bcc

Zhenxiao,

Cool that you got it set up.

The query runs a full order by before the limit - are you sure it's not
just still running? Hive on Tez prints completed tasks/total tasks, so no
update may mean none of the reduce tasks have finished yet.
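
As an illustration of reading those counters, the sketch below parses TezJobMonitor status lines of the form seen in the job log in this thread and lists the vertices that have not finished. It is a minimal Python example, not a Hive or Tez tool. The names parse_progress and unfinished are hypothetical, and the regular expression assumes vertex names of the form "Map N" or "Reducer N".

```python
import re

# One "<vertex>: <completed>/<total>" pair; "-" means not yet known.
_VERTEX = re.compile(r"(Map \d+|Reducer \d+): (\d+|-)/(\d+|-)")


def parse_progress(line):
    """Return {vertex: (completed, total)} for one status line.

    A "-" counter is reported as None.
    """
    progress = {}
    for name, done, total in _VERTEX.findall(line):
        progress[name] = (
            None if done == "-" else int(done),
            None if total == "-" else int(total),
        )
    return progress


def unfinished(progress):
    """List vertices whose counters are unknown or show completed < total."""
    return [
        name
        for name, (done, total) in progress.items()
        if done is None or total is None or done < total
    ]


# A status line copied from the job log in this thread.
line = "13/12/13 20:58:00 INFO tez.TezJobMonitor: Map 1: 16/16 Reducer 2: 0/1"
print(unfinished(parse_progress(line)))
```

For this line the only unfinished vertex is Reducer 2, which is consistent with a single reduce task that has not completed yet rather than one that has not been scheduled.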

If not, it'd be great to see the YARN logs (yarn logs -applicationId) and
get more info about the table you're using (size, file format, etc.). If the
logs are really big, you might want to consider attaching them to a
JIRA (issues.apache.org), or sending them directly to me.

There are a bunch of settings that might be of interest to you (in general
not just for this query) - I've attached a text doc with some details.

Thanks,
Gunther.



