from:"\"john smith\""

Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil

2013-03-10 Thread john smith

Hi,

It is clearly a classpath issue. When you run select * from tab, it
works because Hive just fetches the data from HDFS using a FetchTask and
doesn't start any MR job (it probably uses the JSON jar in your local Hive lib
directory to deserialize the rows and apply the limit of 5, and hence doesn't
raise any error).

However, the error you are facing in the second query, which involves an MR
job, occurs because the mapper (on some remote machine that doesn't have the
JSON jar on its classpath) is unable to locate your JSON jar. Did you try
doing what Dean suggested?

Thanks

On Sun, Mar 10, 2013 at 1:49 PM, Sai Sai  wrote:

> Just wondering if anyone has any suggestions:
>
> This executes successfully:
>
> hive> select * from twitter limit 5;
>
> This does not work:
>
> hive> select tweet_id from twitter limit 5; // I have given the exception
> info below:
>
> Here is the output of this:
>
> hive> select * from twitter limit 5;
> OK
>
> tweet_id | created_at | text | user_id | user_screen_name | user_lang
> 122106088022745088 | Fri Oct 07 00:28:54 + 2011 | wkwkw -_- ayo saja mba RT @yullyunet: Sepupuuu, kita lanjalan yok.. Kita karokoe-an.. Ajak mas galih jg kalo dia mau.. "@Dindnf: doremifas | 124735434 | Dindnf | en
> 122106088018558976 | Fri Oct 07 00:28:54 + 2011 | @egg486 특별히 준비했습니다! | 252828803 | CocaCola_Korea | ko
> 122106088026939392 | Fri Oct 07 00:28:54 + 2011 | My offer of free gobbies for all if @amityaffliction play Blair snitch project still stands. | 168590073 | SarahYoungBlood | en
> 122106088035328001 | Fri Oct 07 00:28:54 + 2011 | the girl nxt to me in the lib got her headphones in dancing and singing loud af like she the only one here haha | 267296295 | MONEYyDREAMS_ | en
> 122106088005971968 | Fri Oct 07 00:28:54 + 2011 | @KUnYoong_B2UTY Bị lsao đấy | 269182160 | b2st_b2utyhp | en
> Time taken: 0.154 seconds
>
> This does not work:
>
> hive> select tweet_id from twitter limit 5;
>
>
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201303050432_0094, Tracking URL =
> http://ubuntu:50030/jobdetails.jsp?jobid=job_201303050432_0094
> Kill Command = /home/satish/work/hadoop-1.0.4/libexec/../bin/hadoop job
> -kill job_201303050432_0094
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 0
> 2013-03-10 00:14:44,509 Stage-1 map = 0%,  reduce = 0%
> 2013-03-10 00:15:14,613 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201303050432_0094 with errors
> Error during job, obtaining debugging information...
> Job Tracking URL:
> http://ubuntu:50030/jobdetails.jsp?jobid=job_201303050432_0094
> Examining task ID: task_201303050432_0094_m_02 (and more) from job
> job_201303050432_0094
>
> Task with the most failures(4):
> -
> Task ID:
>   task_201303050432_0094_m_00
>
> URL:
>
> http://ubuntu:50030/taskdetails.jsp?jobid=job_201303050432_0094&tipid=task_201303050432_0094_m_00
> -
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: Error in configuring object
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.reflect.InvocationTargetException
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.Delegati

Re: Hive Reducers hanging - interesting problem - skew ?

2011-12-06 Thread john smith

Hi Mark,

Thanks for your response. I tried the skew optimization and I also watched the
video by Lin and Namit. From what I understand about skew join, instead of
doing the join in a single pass, they divide it into 2 stages.

Stage 1
Join the non-skewed pairs and write the skewed pairs into temporary files on
HDFS.

Stage 2
Do a map join of those files by copying the smaller file into the mappers of
the larger file.

I have a doubt here. How can they be so sure that the map join works in stage 2?
The files can be so large that they do not fit into memory, which would make
the join impossible. Am I wrong?
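
To make the memory concern concrete, here is a minimal Python sketch of the general map-side (hash) join idea. It is a toy model under stated assumptions, not Hive's actual implementation: the function name map_join and the sample rows are hypothetical. The point it illustrates is that the whole smaller side is held in an in-memory table while the larger side is streamed past it.

```python
from collections import defaultdict

def map_join(small_rows, large_rows):
    """Toy hash join: build an in-memory table from the small side,
    then stream the large side past it one row at a time."""
    lookup = defaultdict(list)
    for key, value in small_rows:
        # Every row of the small side is kept in memory here.
        lookup[key].append(value)
    for key, value in large_rows:
        # The large side is only iterated, never stored.
        for small_value in lookup.get(key, []):
            yield key, value, small_value

# Hypothetical sample data: (join_key, payload) pairs.
small = [(1, "s1"), (2, "s2")]
large = [(1, "l1"), (1, "l2"), (3, "l3")]
result = list(map_join(small, large))
print(result)
```

If the side that is loaded into the lookup table does not fit in the memory given to each task, this approach cannot work as written, which is the situation the question above is asking about.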

I also ran the query with the skew optimization enabled and, as expected, none
of the pairs got joined in stage 1; all of them got written to HDFS.
(They are huge.)

Now in stage 2, Hive is trying to perform a map join on these large
tables. My map phase in stage 2 was stuck at 0.13% after 6 hours and 2 of
my machines went down. I finally had to kill the job.

The size of each table is just 2GB, which is far smaller than what the Hadoop
ecosystem can handle.

So is there any way I can join these tables in Hive? Any thoughts?

Thanks,
jS

On Tue, Dec 6, 2011 at 3:39 AM, Mark Grover  wrote:

> jS,
> Check out if this helps:
>
> http://search-hadoop.com/m/l1usr1MAHX32&subj=Re+Severely+hit+by+curse+of+last+reducer+
>
>
>
> Mark Grover, Business Intelligence Analyst
> OANDA Corporation
>
> www: oanda.com www: fxtrade.com
> e: mgro...@oanda.com
>
> "Best Trading Platform" - World Finance's Forex Awards 2009.
> "The One to Watch" - Treasury Today's Adam Smith Awards 2009.
>
>
> - Original Message -
> From: "john smith" 
> To: user@hive.apache.org
> Sent: Monday, December 5, 2011 4:38:14 PM
> Subject: Hive Reducers hanging - interesting problem - skew ?
>
> Hi list,
>
> I am trying to run a Join query on my 10 node cluster. My query looks as
> follows
>
> select * from A JOIN B on (A.a = B.b)
>
> size of A = 15 million rows
> size of B = 1 million rows
>
> The problem is A.a and B.b has around 25-30 distinct values per column
> which implies that they have high selectivities and the reducers are bulky.
>
> However the performance hit is so horrible that , ALL my reducers hang @
> 75% for 6 hours and doesn't move further.
>
> The only thing that log shows up is "Join operator - forwarding rows
> ---" kinds of logs for all this long. What does
> this mean ?
> There is no swapping happening and the CPU % is constantly around 40% for
> all this time (observed through Ganglia) .
>
> Any way I can solve this problem? Can anyone help me with this?
>
> Thanks,
> jS
>
>
>

Hive Reducers hanging - interesting problem - skew ?

2011-12-05 Thread john smith

Hi list,

I am trying to run a join query on my 10-node cluster. My query is as
follows:

select * from A JOIN B on (A.a = B.b)

size of A = 15 million rows
size of B = 1 million rows

The problem is that A.a and B.b each have only around 25-30 distinct values,
which implies that they have high selectivities (each join key matches a very
large number of rows) and the reducers are bulky.
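
A rough back-of-the-envelope estimate shows how large the join output can get with so few distinct keys. The Python sketch below assumes the rows are spread evenly over the distinct keys, which is an assumption made only for illustration (the real distribution is not stated); the function name estimated_join_rows is hypothetical.

```python
def estimated_join_rows(rows_a: int, rows_b: int, distinct_keys: int) -> int:
    """Estimate equi-join output size assuming rows are evenly spread over keys.

    Each key matches (rows_a / k) * (rows_b / k) pairs; summing over the
    k keys gives rows_a * rows_b / k.
    """
    return rows_a * rows_b // distinct_keys

rows_a = 15_000_000  # size of A, from the message
rows_b = 1_000_000   # size of B, from the message

for k in (25, 30):
    total = estimated_join_rows(rows_a, rows_b, k)
    print(f"{k} distinct keys -> about {total:,} output rows")
```

Under that even-spread assumption the join produces on the order of 5 x 10^11 to 6 x 10^11 rows, so reducers that keep forwarding rows for hours may simply be working through a very large output rather than being stuck.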

However, the performance hit is so severe that all my reducers hang at
75% for 6 hours and don't progress any further.

The only thing the log shows is "Join operator - forwarding rows
---" kinds of messages for all this time. What does
this mean?
There is no swapping happening and the CPU usage is constantly around 40% for
all this time (observed through Ganglia).

Is there any way I can solve this problem? Can anyone help me with this?

Thanks,
jS

Attaching YourKit profiler with Hive

2011-12-04 Thread john smith

Hi folks,

Can we get a shared license key for YourKit and use it with the Hive project?
The wiki page has no information about this. Can any dev help me in this
regard?

Thanks,
jS

Profiling Hive / Metrics

2011-11-16 Thread john smith

Hey devs,

My Hive reducers are running for too long. I want to profile Hive and
collect metrics to find where most of the execution time is spent.
Can anyone tell me where to start? Are any profilers attached to Hive by
default?

Any help is appreciated.

Thanks,
jS

Re: hive runs slowly

2011-10-21 Thread john smith

Do you mean select a,b from a inner join b on (a.id=b.id)? Or do those
brackets make some difference? I ask because the inner keyword is nowhere
mentioned in the language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

Any hints?




On Fri, Oct 21, 2011 at 8:47 PM, Edward Capriolo wrote:

>
>
> On Fri, Oct 21, 2011 at 10:21 AM, john smith wrote:
>
>> Hi Edward,
>>
>> Thanks for replying. I have been using the query
>>
>> "select a,b from a,b where a.id=b.id ".  According to my knowledge of
>> Hive, it reads data of both A and B and emits <join_key, data> pairs as
>> map outputs and then performs cartesian joins on reduce side
>> for the same join_keys .
>>
>> Is this the cartesian join you are referring to? or Is it the cartesian
>> product of the total table (as in sql) ? or Am I missing something?
>>
>> Can you please throw some light on the functionality of mapred.mode=strict
>> ?
>>
>> Thanks,
>> jS
>>
>> On Fri, Oct 21, 2011 at 7:29 PM, Edward Capriolo 
>> wrote:
>>
>>>
>>>
>>> On Fri, Oct 21, 2011 at 9:22 AM, john smith wrote:
>>>
>>>> Hi list,
>>>>
>>>> I am also facing the same problem. My reducers hang at this position and
>>>> it takes hours to complete a single reduce task. Can any hive guru help us
>>>> out with this issue.
>>>>
>>>> Thanks,
>>>> jS
>>>>
>>>> 2011/10/21 bangbig 
>>>>
>>>>> HI all,
>>>>>
>>>>> HIVE runs too slowly when it is doing such things(see the log below), 
>>>>> what's the problem? because I'm joining two large table?
>>>>>
>>>>> it runs pretty fast at first. when the job finishes 95%, it begins to 
>>>>> slow down.
>>>>>
>>>>> --
>>>>>
>>>>> INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 104400 
>>>>> rows
>>>>> 2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 104500 rows
>>>>> 2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 104600 rows
>>>>> 2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 104700 rows
>>>>> 2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 104800 rows
>>>>> 2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 104900 rows
>>>>> 2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105000 rows
>>>>> 2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105100 rows
>>>>> 2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105200 rows
>>>>> 2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105300 rows
>>>>> 2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105400 rows
>>>>> 2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105500 rows
>>>>> 2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105600 rows
>>>>> 2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105700 rows
>>>>> 2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105800 rows
>>>>> 2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 105900 rows
>>>>> 2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 106000 rows
>>>>> 2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 106100 rows
>>>>> 2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 106200 rows
>>>>> 2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
>>>>> 4 forwarding 106300 rows
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>> It is hard to say without seeing the query, the table definition, and the
>>> explain. Please send the query. Although I have a theory:
>>>
>>> This query is not good:
>>> select a,b from a,b where a.id=b.id
>>> It does a Cart join.
>>>
>>> This query is better.
>>> select a,b from a inner join b on (a.id=b.id)
>>>
>>> Consider setting in your hive-site.xml
>>>
>>> hive.mapred.mode=strict
>>>
>>> It can prevent you from running dangerous queries.
>>>
>>>
>>
> To be clear:
>
> Do NOT join this way (it results in a cartesian product):
>
> select a,b from a,b where a.id=b.id
>
> Join this way:
>
> select a,b from a join b on (a.id=b.id)
>
> Also:
> set hive.mapred.mode=strict in your hive-site.xml to prevent yourself from
> mistakenly doing cartesian products and other bad ideas.
>

Re: hive runs slowly

2011-10-21 Thread john smith

Hi Edward,

Thanks for replying. I have been using the query

"select a,b from a,b where a.id=b.id ".  According to my knowledge of Hive,
it reads the data of both A and B, emits <join_key, data>
pairs as map outputs, and then performs cartesian joins on the reduce side for
the same join_keys.
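
The behaviour described above can be modelled with a small Python sketch. This is a toy model of a generic reduce-side join, written only to illustrate the description; it is not taken from Hive's source, and the function name reduce_side_join and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import product

def reduce_side_join(table_a, table_b):
    # "Map" phase: tag each row with its source table and emit
    # (join_key, tagged_row) pairs.
    emitted = [(key, ("A", row)) for key, row in table_a]
    emitted += [(key, ("B", row)) for key, row in table_b]

    # "Shuffle" phase: group the emitted values by join key.
    groups = defaultdict(lambda: {"A": [], "B": []})
    for key, (tag, row) in emitted:
        groups[key][tag].append(row)

    # "Reduce" phase: cross product of A rows and B rows within each key.
    joined = []
    for key in sorted(groups):
        for a_row, b_row in product(groups[key]["A"], groups[key]["B"]):
            joined.append((key, a_row, b_row))
    return joined

# Hypothetical sample data: (join_key, payload) pairs.
a = [(1, "a1"), (1, "a2"), (2, "a3")]
b = [(1, "b1"), (1, "b2"), (3, "b3")]
joined = reduce_side_join(a, b)
print(joined)
```

In this model the cross product is taken only within one join key, so the work per reducer depends on how many rows share each key, not on the product of the full table sizes.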

Is this the cartesian join you are referring to? Or is it the cartesian
product of the whole tables (as in SQL)? Or am I missing something?

Can you please shed some light on what hive.mapred.mode=strict does?

Thanks,
jS

On Fri, Oct 21, 2011 at 7:29 PM, Edward Capriolo wrote:

>
>
> On Fri, Oct 21, 2011 at 9:22 AM, john smith wrote:
>
>> Hi list,
>>
>> I am also facing the same problem. My reducers hang at this position and
>> it takes hours to complete a single reduce task. Can any hive guru help us
>> out with this issue.
>>
>> Thanks,
>> jS
>>
>> 2011/10/21 bangbig 
>>
>>> HI all,
>>>
>>> HIVE runs too slowly when it is doing such things(see the log below), 
>>> what's the problem? because I'm joining two large table?
>>>
>>> it runs pretty fast at first. when the job finishes 95%, it begins to slow 
>>> down.
>>>
>>> --
>>>
>>> INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 104400 
>>> rows
>>> 2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 104500 rows
>>> 2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 104600 rows
>>> 2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 104700 rows
>>> 2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 104800 rows
>>> 2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 104900 rows
>>> 2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105000 rows
>>> 2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105100 rows
>>> 2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105200 rows
>>> 2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105300 rows
>>> 2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105400 rows
>>> 2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105500 rows
>>> 2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105600 rows
>>> 2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105700 rows
>>> 2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105800 rows
>>> 2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 105900 rows
>>> 2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 106000 rows
>>> 2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 106100 rows
>>> 2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 106200 rows
>>> 2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
>>> forwarding 106300 rows
>>>
>>>
>>>
>>>
>>
> It is hard to say without seeing the query, the table definition, and the
> explain. Please send the query. Although I have a theory:
>
> This query is not good:
> select a,b from a,b where a.id=b.id
> It does a Cart join.
>
> This query is better.
> select a,b from a inner join b on (a.id=b.id)
>
> Consider setting in your hive-site.xml
>
> hive.mapred.mode=strict
>
> It can prevent you from running dangerous queries.
>
>

Re: hive runs slowly

2011-10-21 Thread john smith

Hi list,

I am also facing the same problem. My reducers hang at this position and it
takes hours to complete a single reduce task. Can any Hive guru help us out
with this issue?

Thanks,
jS

2011/10/21 bangbig 

> HI all,
>
> HIVE runs too slowly when it is doing such things(see the log below), what's 
> the problem? because I'm joining two large table?
>
> it runs pretty fast at first. when the job finishes 95%, it begins to slow 
> down.
>
> --
>
> INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 104400 rows
> 2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 104500 rows
> 2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 104600 rows
> 2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 104700 rows
> 2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 104800 rows
> 2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 104900 rows
> 2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105000 rows
> 2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105100 rows
> 2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105200 rows
> 2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105300 rows
> 2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105400 rows
> 2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105500 rows
> 2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105600 rows
> 2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105700 rows
> 2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105800 rows
> 2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 105900 rows
> 2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 106000 rows
> 2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 106100 rows
> 2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 106200 rows
> 2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 
> forwarding 106300 rows
>
>
>
>
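
The quoted log lines carry enough information to estimate how fast the join operator is forwarding rows. The Python sketch below takes the first and last timestamped lines from the log above and computes rows per second; the helper name rows_per_second is hypothetical, and the figure describes only this short log window.

```python
from datetime import datetime

def rows_per_second(first, last):
    """Each argument is (timestamp_string, cumulative_row_count) from a log line."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    t0 = datetime.strptime(first[0], fmt)
    t1 = datetime.strptime(last[0], fmt)
    return (last[1] - first[1]) / (t1 - t0).total_seconds()

# First and last timestamped "forwarding N rows" lines from the quoted log.
rate = rows_per_second(("2011-10-21 16:55:57,427", 104500),
                       ("2011-10-21 16:55:59,601", 106300))
print(f"about {rate:.0f} rows per second")
```

That works out to roughly 828 rows per second for this reducer in this window, so a join that has to forward a very large number of rows would take a long time at this rate.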

Re: Reducer hanging ( swapping? )

2011-09-22 Thread john smith

Hi,

I am CC'ing this to hive-user as well.

I tried to do a simple join between two tables of 2.2GB and 137MB.

select count(*) from A JOIN B ON (A.a = B.b);

The query ran for 7 hours. I am sure this is not normal. The reducer gets
stuck in the reduce > reduce phase. The map and copy phases complete in a
matter of minutes, and then it gets stuck in the reducer. Please see my
previous mail below for my config and vmstat output.

My job has 40 maps and 7 reduces.

My JT and TT logs don't show any warnings, except that one of my nodes got
blacklisted because of too many fetch failures.

Initially there was an error in that node's hosts file. I corrected it and
restarted the cluster. Even then, that node gets blacklisted frequently.
Should I restart the node after changing the hosts file?

Any help? 7 hours is far too long for such a simple query.

On Thu, Sep 22, 2011 at 5:43 AM, Raj V  wrote:

> 2GB for a task tracker? Here are some possible thoughts.
> Compress  map output.
> Change  mapred.reduce.slowstart.completed.maps
>
>
> By the way I see no swapping.  Anything interesting from the task tracker
> log? System log?
>
> Raj
>
>
>
>
>
> >
> >From: john smith 
> >To: common-u...@hadoop.apache.org
> >Sent: Wednesday, September 21, 2011 4:52 PM
> >Subject: Reducer hanging ( swapping? )
> >
> >Hi Folks,
> >
> >I am running hive on a 10 node cluster. Since my hive queries have joins
> in
> >them, their reduce phases are a bit heavy.
> >
> >I have 2GB RAM on each TT . The problem is that my reducer hangs at 76%
> for
> >a large amount of time.  I guess this is due to excessive swapping from
> disk
> >to memory. My vmstat shows  (on one of the TTs)
> >
> >procs ---memory-- ---swap-- -io -system-- cpu
> >r  b   swpd   free   buff  cache   si   so   bi   bo   in   cs us sy id wa
> >1  0   1860  34884 189948 199764400 2 101  0  0 100 0
> >
> >My related config parms are pasted below. (I turned off speculative
> >execution for both maps and reduces). Can anyone suggest me
> >some improvements so as to make my reduce a bit faster?
> >(I've allotted 900MB to task and reduced other params. Even then it is not
> >showing any improvments.) . Any suggestions?
> >
> >
> >
> >
> >mapred.min.split.size = 65536
> >mapred.reduce.copy.backoff = 5
> >io.sort.factor = 60
> >mapred.reduce.parallel.copies = 25
> >io.sort.mb = 70
> >io.file.buffer.size = 32768
> >mapred.child.java.opts = -Xmx900m
> >
> >===
> >
> >
> >
>

Re: Running Hive from Eclipse

2011-08-11 Thread john smith

Hi,

I noticed a line in the trace saying that the log4j properties file was not
found. I added the Hive_conf dir to the classpath while running, and now I get
this trace:

http://pastebin.com/vXs98aZ5

I am completely clueless!

Thanks
JS



On Fri, Aug 12, 2011 at 9:54 AM, john smith  wrote:

> Hi Carl,
>
> This is the stack trace I get .. http://pastebin.com/3pASqvDq
>
> I configured mysql as my metastore and its perfectly getting updated when
> ever I am adding tables via commandline.
>
> Also one more thing is ..I am not getting any log statements while using
> command line . I haven't messed up with log4j props but I wonder why this is
> happening.
>
> THanks
>
>
> On Fri, Aug 12, 2011 at 2:12 AM, Carl Steinbach  wrote:
>
>> Hi John,
>>
>> Can you please include the error messages/exceptions that you're
>> encountering?
>>
>> Thanks.
>>
>> Carl
>>
>>
>> On Thu, Aug 11, 2011 at 1:40 PM, john smith wrote:
>>
>>> Hi folks,
>>>
>>> I am trying to run Hive from eclipse. I've set it up correctly and it is
>>> building the jars and stuff. However I face execeptions when I try to run
>>> hive queries like "show tables" etc. There  has been a discussion on this
>>> in
>>> the mailing list previously but there was no solution provided. It runs
>>> perfectly from command line .
>>>
>>> I am making a few changes to the hive source and every time I need to jar
>>> it
>>> from the command line and run it .Is there some way to run it directly
>>> from
>>> eclipse?
>>>
>>> Please help,
>>>
>>> Thanks,
>>> JS
>>>
>>
>>
>

Re: Running Hive from Eclipse

2011-08-11 Thread john smith

Hi Carl,

This is the stack trace I get: http://pastebin.com/3pASqvDq

I configured MySQL as my metastore and it is updated correctly whenever I add
tables via the command line.

One more thing: I am not getting any log statements while using the
command line. I haven't touched the log4j properties, so I wonder why this is
happening.

Thanks

On Fri, Aug 12, 2011 at 2:12 AM, Carl Steinbach  wrote:

> Hi John,
>
> Can you please include the error messages/exceptions that you're
> encountering?
>
> Thanks.
>
> Carl
>
>
> On Thu, Aug 11, 2011 at 1:40 PM, john smith wrote:
>
>> Hi folks,
>>
>> I am trying to run Hive from eclipse. I've set it up correctly and it is
>> building the jars and stuff. However I face execeptions when I try to run
>> hive queries like "show tables" etc. There  has been a discussion on this
>> in
>> the mailing list previously but there was no solution provided. It runs
>> perfectly from command line .
>>
>> I am making a few changes to the hive source and every time I need to jar
>> it
>> from the command line and run it .Is there some way to run it directly
>> from
>> eclipse?
>>
>> Please help,
>>
>> Thanks,
>> JS
>>
>
>

Running Hive from Eclipse

2011-08-11 Thread john smith

Hi folks,

I am trying to run Hive from Eclipse. I've set it up correctly and it is
building the jars. However, I face exceptions when I try to run
Hive queries like "show tables". There has been a discussion on this in
the mailing list previously, but no solution was provided. It runs
perfectly from the command line.

I am making a few changes to the Hive source, and every time I need to jar it
from the command line and run it. Is there some way to run it directly from
Eclipse?

Please help,

Thanks,
JS
