Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil

2013-03-10 Thread john smith
Hi,

This is clearly a classpath issue. When you run select * from tab, it
works because Hive just fetches the data from HDFS using a FetchTask and
doesn't start any MR job. (It probably uses the JSON jar in your local Hive
lib directory to deserialize the rows and apply the limit of 5, so no error
is raised.)

The error you are seeing in the second query, which does involve an MR
job, occurs because the mapper (running on a remote machine that doesn't
have the JSON jar on its classpath) is unable to locate your JSON jar. Did
you try what Dean suggested?

Thanks

On Sun, Mar 10, 2013 at 1:49 PM, Sai Sai saigr...@yahoo.in wrote:

 Just wondering if anyone has any suggestions:

 This executes successfully:

 hive> select * from twitter limit 5;

 This does not work:

 hive> select tweet_id from twitter limit 5; // I have given the exception
 info below:

 Here is the output of this:

 hive> select * from twitter limit 5;
 OK

 tweet_id | created_at | text | user_id | user_screen_name | user_lang
 122106088022745088 | Fri Oct 07 00:28:54 + 2011 | wkwkw -_- ayo saja mba RT @yullyunet: Sepupuuu, kita lanjalan yok.. Kita karokoe-an.. Ajak mas galih jg kalo dia mau.. @Dindnf: doremifas | 124735434 | Dindnf | en
 122106088018558976 | Fri Oct 07 00:28:54 + 2011 | @egg486 특별히 준비했습니다! | 252828803 | CocaCola_Korea | ko
 122106088026939392 | Fri Oct 07 00:28:54 + 2011 | My offer of free gobbies for all if @amityaffliction play Blair snitch project still stands. | 168590073 | SarahYoungBlood | en
 122106088035328001 | Fri Oct 07 00:28:54 + 2011 | the girl nxt to me in the lib got her headphones in dancing and singing loud af like she the only one here haha | 267296295 | MONEYyDREAMS_ | en
 122106088005971968 | Fri Oct 07 00:28:54 + 2011 | @KUnYoong_B2UTY Bị lsao đấy | 269182160 | b2st_b2utyhp | en
 Time taken: 0.154 seconds

 This does not work:

 hive> select tweet_id from twitter limit 5;


 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201303050432_0094, Tracking URL =
 http://ubuntu:50030/jobdetails.jsp?jobid=job_201303050432_0094
 Kill Command = /home/satish/work/hadoop-1.0.4/libexec/../bin/hadoop job
 -kill job_201303050432_0094
 Hadoop job information for Stage-1: number of mappers: 1; number of
 reducers: 0
 2013-03-10 00:14:44,509 Stage-1 map = 0%,  reduce = 0%
 2013-03-10 00:15:14,613 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201303050432_0094 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL:
 http://ubuntu:50030/jobdetails.jsp?jobid=job_201303050432_0094
 Examining task ID: task_201303050432_0094_m_02 (and more) from job
 job_201303050432_0094

 Task with the most failures(4):
 -
 Task ID:
   task_201303050432_0094_m_00

 URL:

 http://ubuntu:50030/taskdetails.jsp?jobid=job_201303050432_0094&tipid=task_201303050432_0094_m_00
 -
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: Error in configuring object
     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:416)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
     at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.reflect.InvocationTargetException
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:616)
     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
     ... 9 more
 Caused by: java.lang.RuntimeException: Error in configuring object
     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
     ... 14 more
 Caused by: java.lang.reflect.InvocationTargetException
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at 

Re: Hive Reducers hanging - interesting problem - skew ?

2011-12-06 Thread john smith
Hi Mark,

Thanks for your response. I tried the skew join optimization and also
watched the video by Lin and Namit. From what I understand, instead of
doing the join in a single pass, a skew join is split into two stages.

Stage 1
Join the non-skewed pairs and write the skewed pairs into temporary files
on HDFS.

Stage 2
Do a map join of those files by copying the smaller file to the mappers
of the larger file.

I have a doubt here. How can they be so sure that the map join works in
stage 2? The files can be so large that they do not fit into memory, which
makes the join impossible. Am I wrong?
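
To make the two stages concrete, here is a minimal in-memory sketch of the idea as I understand it. It is not Hive's implementation: the function names and the skew_threshold knob are made up for illustration, and plain Python lists stand in for the temporary files on HDFS. It does show where the memory concern comes from, since stage 2 builds a hash table from the smaller spilled side.

```python
from collections import defaultdict


def group_by_key(rows):
    """Group (key, value) pairs into {key: [values]}."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def two_stage_skew_join(left, right, skew_threshold):
    """Inner-join two lists of (key, value) pairs in two stages.

    skew_threshold is a hypothetical knob: a key with at least this many
    rows on either side is treated as skewed.
    """
    left_groups = group_by_key(left)
    right_groups = group_by_key(right)
    joined = []
    spilled_left, spilled_right = [], []  # stand-ins for temp files on HDFS

    # Stage 1: join ordinary keys right away, set skewed keys aside.
    for key in left_groups.keys() & right_groups.keys():
        l_vals, r_vals = left_groups[key], right_groups[key]
        if max(len(l_vals), len(r_vals)) >= skew_threshold:
            spilled_left.extend((key, v) for v in l_vals)
            spilled_right.extend((key, v) for v in r_vals)
        else:
            joined.extend((key, l, r) for l in l_vals for r in r_vals)

    # Stage 2: map join. The smaller spilled side is loaded into an
    # in-memory hash table and the larger side is streamed past it.
    left_is_small = len(spilled_left) <= len(spilled_right)
    if left_is_small:
        small, large = spilled_left, spilled_right
    else:
        small, large = spilled_right, spilled_left
    hash_table = group_by_key(small)  # this is the part that must fit in memory
    for key, big_val in large:
        for small_val in hash_table.get(key, []):
            if left_is_small:
                joined.append((key, small_val, big_val))
            else:
                joined.append((key, big_val, small_val))
    return joined
```

If every key is skewed, stage 1 joins nothing and the whole of both inputs lands in stage 2, where the "smaller" side is no longer small.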

I also ran the query with the skew optimization enabled and, as expected,
none of the pairs got joined in stage 1; all of them were written to
HDFS. (They are huge.)

Now in stage 2, Hive is trying to perform a map join on these large
tables. My map phase in stage 2 was stuck at 0.13% after 6 hours and 2 of
my machines went down. I finally had to kill the job.

Each table is just 2GB, which is far smaller than what the Hadoop
ecosystem can handle.

So is there any way I can join these tables in Hive? Any thoughts?


Thanks,
jS



On Tue, Dec 6, 2011 at 3:39 AM, Mark Grover mgro...@oanda.com wrote:

 jS,
 Check out if this helps:

 http://search-hadoop.com/m/l1usr1MAHX32&subj=Re+Severely+hit+by+curse+of+last+reducer+



 Mark Grover, Business Intelligence Analyst
 OANDA Corporation

 www: oanda.com www: fxtrade.com
 e: mgro...@oanda.com

 Best Trading Platform - World Finance's Forex Awards 2009.
 The One to Watch - Treasury Today's Adam Smith Awards 2009.


 - Original Message -
 From: john smith js1987.sm...@gmail.com
 To: user@hive.apache.org
 Sent: Monday, December 5, 2011 4:38:14 PM
 Subject: Hive Reducers hanging - interesting problem - skew ?

 Hi list,

 I am trying to run a Join query on my 10 node cluster. My query looks as
 follows

 select * from A JOIN B on (A.a = B.b)

 size of A = 15 million rows
 size of B = 1 million rows

 The problem is that A.a and B.b each have only around 25-30 distinct
 values, so every join key matches a very large number of rows and the
 reducers are heavy.
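
A rough back-of-the-envelope sketch of what those numbers imply for the size of the join output. The row counts come from the message above; the even spread of rows over keys is an assumption made only to keep the arithmetic simple, and the helper names are made up.

```python
def join_output_rows(key_counts_a, key_counts_b):
    """Rows produced by an inner equi-join, given per-key row counts."""
    return sum(n * key_counts_b.get(key, 0) for key, n in key_counts_a.items())


def uniform_counts(total_rows, distinct_keys):
    """Spread total_rows evenly over keys 0..distinct_keys-1 (an assumption)."""
    base, extra = divmod(total_rows, distinct_keys)
    return {key: base + (1 if key < extra else 0) for key in range(distinct_keys)}


for distinct in (25, 30):
    a = uniform_counts(15_000_000, distinct)  # size of A
    b = uniform_counts(1_000_000, distinct)   # size of B
    print(distinct, "distinct keys ->", join_output_rows(a, b), "output rows")
```

Under that assumption the join emits on the order of 5 to 6 hundred billion rows, shared among at most 25-30 reduce keys.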

 The performance hit is so bad that ALL my reducers hang at 75% for 6
 hours and don't move any further.

 The only thing the log shows for all this time is a huge number of
 messages of the kind "Join operator - forwarding rows". What does this
 mean?
 There is no swapping happening and the CPU usage stays around 40% the
 whole time (observed through Ganglia).

 Is there any way I can solve this problem? Can anyone help me with this?

 Thanks,
 jS





Attaching YourKit profiler with Hive

2011-12-04 Thread john smith
Hi folks,

Can we get a shared license key for YourKit and use it with the Hive
project? The wiki page has no information about this. Can any dev help me
with this?

Thanks,
jS


Profiling Hive / Metrics

2011-11-16 Thread john smith
Hey devs,

My Hive reducers are running for too long. I want to profile Hive and
collect metrics to find where most of the execution time is spent.
Can anyone tell me where to start? Are any profilers attached to Hive by
default?

Any help is appreciated.

Thanks,
jS


Re: hive runs slowly

2011-10-21 Thread john smith
Hi list,

I am also facing the same problem. My reducers hang at this point and it
takes hours to complete a single reduce task. Can any Hive guru help us out
with this issue?

Thanks,
jS

2011/10/21 bangbig lizhongliangg...@163.com

 Hi all,

 Hive runs too slowly when it is doing this (see the log below). What's
 the problem? Is it because I'm joining two large tables?

 It runs pretty fast at first. When the job reaches 95%, it begins to slow
 down.

 --

 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 104400 rows
 2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 104500 rows
 2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 104600 rows
 2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 104700 rows
 2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 104800 rows
 2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 104900 rows
 2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105000 rows
 2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105100 rows
 2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105200 rows
 2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105300 rows
 2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105400 rows
 2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105500 rows
 2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105600 rows
 2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105700 rows
 2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105800 rows
 2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 105900 rows
 2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 106000 rows
 2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 106100 rows
 2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 106200 rows
 2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
 forwarding 106300 rows
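
The log excerpt above can be turned into a rough throughput figure. The sketch below assumes the counter in each message is the running number of rows the join operator has forwarded, and that each log entry sits on one line (the archive wraps them). The helper name is made up for illustration.

```python
import re
from datetime import datetime

# Matches lines like:
# 2011-10-21 16:55:57,427 INFO ...JoinOperator: 4 forwarding 104500 rows
LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S+JoinOperator: "
    r"\d+ forwarding (\d+) rows"
)


def forwarding_rate(log_text):
    """Rows per second between the first and last timestamped log line."""
    samples = [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f"), int(rows))
        for ts, rows in LINE.findall(log_text)
    ]
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    return (r1 - r0) / (t1 - t0).total_seconds()


# First and last timestamped entries from the excerpt, unwrapped.
excerpt = """\
2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 104500 rows
2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 106300 rows
"""
print(round(forwarding_rate(excerpt)), "rows/s")
```

For this excerpt that is 1800 rows in about 2.17 seconds, so the reducer is making progress, just slowly relative to the amount of output a large join can produce.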






Re: hive runs slowly

2011-10-21 Thread john smith
You mean select a,b from a inner join b on (a.id=b.id)? Or do those
brackets make some difference? I ask because the inner keyword is not
mentioned anywhere in the language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

Any hints?




On Fri, Oct 21, 2011 at 8:47 PM, Edward Capriolo edlinuxg...@gmail.com wrote:



 On Fri, Oct 21, 2011 at 10:21 AM, john smith js1987.sm...@gmail.com wrote:

 Hi Edward,

 Thanks for replying. I have been using the query

 select a,b from a,b where a.id=b.id. As far as I understand Hive, it
 reads the data of both A and B, emits (join_key, rowid/required row data)
 pairs as map outputs, and then performs a cartesian join on the reduce
 side among rows with the same join_key.

 Is this the cartesian join you are referring to? Or is it the cartesian
 product of the whole tables (as in SQL)? Or am I missing something?
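
The distinction being asked about can be sketched with a generic reduce-side equi-join. This is a toy model of the description above, not Hive's actual code path; the function names and the dictionary-based shuffle are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_name, rows, key_of):
    """Emit (join_key, (table_name, row)) for every input row."""
    for row in rows:
        yield key_of(row), (table_name, row)


def reduce_side_join(a_rows, b_rows, key_of_a, key_of_b):
    """Yield (a_row, b_row) pairs whose join keys match."""
    # Shuffle: collect all map outputs that share a join key.
    shuffled = defaultdict(lambda: {"a": [], "b": []})
    for key, (tag, row) in map_phase("a", a_rows, key_of_a):
        shuffled[key][tag].append(row)
    for key, (tag, row) in map_phase("b", b_rows, key_of_b):
        shuffled[key][tag].append(row)

    # Reduce: cross product of the two sides, but only within one key.
    for sides in shuffled.values():
        for a_row, b_row in product(sides["a"], sides["b"]):
            yield a_row, b_row
```

With 3 rows in a and 2 rows in b, a full cartesian product would always produce 6 pairs, while the per-key cross product produces only the pairs whose ids match. The per-key product still grows quickly when a single key has many rows on both sides.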

 Can you please shed some light on what mapred.mode=strict does?

 Thanks,
 jS

 On Fri, Oct 21, 2011 at 7:29 PM, Edward Capriolo
 edlinuxg...@gmail.com wrote:



 On Fri, Oct 21, 2011 at 9:22 AM, john smith js1987.sm...@gmail.com wrote:

 Hi list,

 I am also facing the same problem. My reducers hang at this point and
 it takes hours to complete a single reduce task. Can any Hive guru help us
 out with this issue?

 Thanks,
 jS

 2011/10/21 bangbig lizhongliangg...@163.com

 Hi all,

 Hive runs too slowly when it is doing this (see the log below). What's
 the problem? Is it because I'm joining two large tables?

 It runs pretty fast at first. When the job reaches 95%, it begins to
 slow down.

 --

 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 104400 
 rows
 2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 104500 rows
 2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 104600 rows
 2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 104700 rows
 2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 104800 rows
 2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 104900 rows
 2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105000 rows
 2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105100 rows
 2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105200 rows
 2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105300 rows
 2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105400 rows
 2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105500 rows
 2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105600 rows
 2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105700 rows
 2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105800 rows
 2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 105900 rows
 2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 106000 rows
 2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 106100 rows
 2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 106200 rows
 2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
 4 forwarding 106300 rows





 It is hard to say without seeing the query, the table definition, and the
 explain output. Please send the query. I do have a theory, though:

 This query is not good:
 select a,b from a,b where a.id=b.id
 It does a Cartesian join.

 This query is better:
 select a,b from a inner join b on (a.id=b.id)

 Consider setting this in your hive-site.xml:

 hive.mapred.mode=strict

 It can prevent you from running dangerous queries.



 To be clear:

 Do NOT join this way (it results in a cartesian product):

 select a,b from a,b where a.id=b.id

 Join this way:

 select a,b from a join b on (a.id=b.id)

 Also:
 set hive.mapred.mode=strict in your hive-site.xml to prevent yourself from
 mistakenly doing cartesian products and other bad ideas.



Re: Reducer hanging ( swapping? )

2011-09-22 Thread john smith
Hi,

I am CC'ing this to hive-user as well.

I tried a simple join between two tables of 2.2GB and 137MB.

select count(*) from A JOIN B ON (A.a = B.b);

The query ran for 7 hours. I am sure this is not normal. The reducer gets
stuck in the reduce > reduce phase. The map and copy phases complete in a
matter of minutes, and then it gets stuck in the reducer. Please see my
previous mail below for my config and vmstat output.

My job has 40 maps and 7 reduces.

My JT and TT logs don't show any warnings, except that one of my nodes got
blacklisted because of "Too many fetch failures".

Initially there was an error in that node's hosts file. I corrected it and
restarted the cluster. Even then that node gets blacklisted frequently.
Should I restart the node after changing the hosts file?

Any help? 7 hours is far too long for such a simple query.

On Thu, Sep 22, 2011 at 5:43 AM, Raj V rajv...@yahoo.com wrote:

 2GB for a task tracker? Here are some thoughts:
 Compress map output.
 Change mapred.reduce.slowstart.completed.maps.


 By the way, I see no swapping. Anything interesting in the task tracker
 log? The system log?

 Raj





 
 From: john smith js1987.sm...@gmail.com
 To: common-u...@hadoop.apache.org
 Sent: Wednesday, September 21, 2011 4:52 PM
 Subject: Reducer hanging ( swapping? )
 
 Hi Folks,
 
 I am running Hive on a 10 node cluster. Since my Hive queries have joins
 in them, their reduce phases are a bit heavy.
 
 I have 2GB RAM on each TT. The problem is that my reducer hangs at 76%
 for a long time. I guess this is due to excessive swapping between disk
 and memory. My vmstat shows (on one of the TTs):
 
 procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   so bi bo   in   cs us sy id wa
 1  0   1860  34884 189948 199764400 2 101  0  0 100 0
 
 My related config params are pasted below. (I turned off speculative
 execution for both maps and reduces.) Can anyone suggest some
 improvements to make my reduce phase a bit faster?
 (I've allotted 900MB to each task and reduced other params. Even then it
 is not showing any improvement.) Any suggestions?
 
 
 
 <property>
 <name>mapred.min.split.size</name>
 <value>65536</value>
 </property>
 
 <property>
 <name>mapred.reduce.copy.backoff</name>
 <value>5</value>
 </property>
 
 
 <property>
 <name>io.sort.factor</name>
 <value>60</value>
 </property>
 
 <property>
 <name>mapred.reduce.parallel.copies</name>
 <value>25</value>
 </property>
 
 <property>
 <name>io.sort.mb</name>
 <value>70</value>
 </property>
 
 <property>
 <name>io.file.buffer.size</name>
 <value>32768</value>
 </property>
 
 <property>
 <name>mapred.child.java.opts</name>
 <value>-Xmx900m</value>
 </property>
 
 ===
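
The 2GB of RAM per TaskTracker and the -Xmx900m child heap quoted above can be put side by side with a small budget check. The slot counts below are assumptions for illustration only (the thread does not state them), and daemon_mb is a hypothetical allowance for the DataNode and TaskTracker processes themselves.

```python
def memory_headroom_mb(ram_mb, child_heap_mb, map_slots, reduce_slots, daemon_mb=0):
    """RAM left over if every task slot runs a JVM at its full heap limit."""
    return ram_mb - daemon_mb - child_heap_mb * (map_slots + reduce_slots)


# 2048 MB and 900 MB come from the message; 2 map + 2 reduce slots are assumed.
print(memory_headroom_mb(ram_mb=2048, child_heap_mb=900, map_slots=2, reduce_slots=2))
```

A negative result means the node would have to swap if all slots were busy at their heap limit. A result near zero leaves little room for the OS page cache. The real footprint of a JVM is also somewhat larger than its -Xmx value.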
 
 
 



Running Hive from Eclipse

2011-08-11 Thread john smith
Hi folks,

I am trying to run Hive from Eclipse. I've set it up correctly and it
builds the jars and so on. However, I get exceptions when I try to run
Hive queries such as show tables. This was discussed on the mailing list
before, but no solution was provided. Hive runs perfectly from the command
line.

I am making a few changes to the Hive source, and every time I need to
build the jar from the command line and run it. Is there some way to run
it directly from Eclipse?

Please help,

Thanks,
JS


Re: Running Hive from Eclipse

2011-08-11 Thread john smith
Hi Carl,

This is the stack trace I get: http://pastebin.com/3pASqvDq

I configured MySQL as my metastore and it is updated correctly whenever I
add tables via the command line.

One more thing: I am not getting any log statements while using the
command line. I haven't touched the log4j properties, so I wonder why this
is happening.

Thanks

On Fri, Aug 12, 2011 at 2:12 AM, Carl Steinbach c...@cloudera.com wrote:

 Hi John,

 Can you please include the error messages/exceptions that you're
 encountering?

 Thanks.

 Carl


 On Thu, Aug 11, 2011 at 1:40 PM, john smith js1987.sm...@gmail.com wrote:

 Hi folks,

 I am trying to run Hive from Eclipse. I've set it up correctly and it
 builds the jars and so on. However, I get exceptions when I try to run
 Hive queries such as show tables. This was discussed on the mailing list
 before, but no solution was provided. Hive runs perfectly from the command
 line.

 I am making a few changes to the Hive source, and every time I need to
 build the jar from the command line and run it. Is there some way to run
 it directly from Eclipse?

 Please help,

 Thanks,
 JS





Re: Running Hive from Eclipse

2011-08-11 Thread john smith
Hi,

The trace has a line saying that the log4j properties were not found. I
added the Hive_conf dir to the classpath when running, and now I get this
trace:

http://pastebin.com/vXs98aZ5

I am completely clueless!

Thanks
JS



On Fri, Aug 12, 2011 at 9:54 AM, john smith js1987.sm...@gmail.com wrote:

 Hi Carl,

 This is the stack trace I get: http://pastebin.com/3pASqvDq

 I configured MySQL as my metastore and it is updated correctly whenever I
 add tables via the command line.

 One more thing: I am not getting any log statements while using the
 command line. I haven't touched the log4j properties, so I wonder why this
 is happening.

 Thanks


 On Fri, Aug 12, 2011 at 2:12 AM, Carl Steinbach c...@cloudera.com wrote:

 Hi John,

 Can you please include the error messages/exceptions that you're
 encountering?

 Thanks.

 Carl


 On Thu, Aug 11, 2011 at 1:40 PM, john smith js1987.sm...@gmail.com wrote:

 Hi folks,

 I am trying to run Hive from Eclipse. I've set it up correctly and it
 builds the jars and so on. However, I get exceptions when I try to run
 Hive queries such as show tables. This was discussed on the mailing list
 before, but no solution was provided. Hive runs perfectly from the command
 line.

 I am making a few changes to the Hive source, and every time I need to
 build the jar from the command line and run it. Is there some way to run
 it directly from Eclipse?

 Please help,

 Thanks,
 JS