Re: hadoop/hive data loading

amit jaiswal Tue, 10 May 2011 23:07:22 -0700

Hi,

What do you mean by 'union' here? Is there a Hadoop job with one (or a 
few) reducers that combines all the data together? Have you tried external (dynamic) 
partitions for combining the data?
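To illustrate the dynamic-partition idea, here is a minimal Python sketch that only builds the HiveQL text. It assumes a staging table that already carries a date column to partition on. The helper name and the table and column names (`access_logs`, `access_logs_staging`, `dt`) are hypothetical. The two `SET` properties and the `INSERT OVERWRITE TABLE ... PARTITION (col)` form are Hive's dynamic-partition syntax.

```python
def dynamic_partition_insert(target, source, columns, partition_col):
    """Build a HiveQL dynamic-partition insert (hypothetical helper).

    Hive routes each row to a partition based on the value of the
    trailing column(s) in the SELECT, so the partition column is appended last.
    """
    select_list = ", ".join(list(columns) + [partition_col])
    return "\n".join([
        # Enable dynamic partitioning; nonstrict mode is needed when
        # every partition column is dynamic (no static value given).
        "SET hive.exec.dynamic.partition=true;",
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
        f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_col})",
        f"SELECT {select_list} FROM {source};",
    ])


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(dynamic_partition_insert(
        "access_logs", "access_logs_staging", ["host", "request", "status"], "dt"))
```

With dynamic partitions, one statement writes each day's rows into its own partition, which may remove the need for a separate union step. Whether this also avoids the heap errors is not something the sketch can show.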

-amit

----- Original Message -----
From: hadoopman <hadoop...@gmail.com>
To: common-user@hadoop.apache.org
Cc: 
Sent: Tuesday, 10 May 2011 11:26 PM
Subject: hadoop/hive data loading

When we load data into Hive, we sometimes run into situations where the load 
fails and the logs show a heap out-of-memory error. If I load just a few days 
(or months) of data, there's no problem. But if I try to load, for example, two 
years of data, I've seen it fail. Not with every feed, but with certain ones.

Sometimes I've been able to split the data and get it to load. One example of 
a feed I'm working on is Apache web server access logs. 
Generally it works, but there are times when I need to load more than a few 
months of data and get heap memory errors in the task logs.
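The split-the-data workaround can be made systematic. Below is a minimal Python sketch, assuming the feed can be filtered by date. It breaks a long range, such as two years, into calendar-month batches so that each Hive load handles a bounded slice. The function name is hypothetical, and month-sized batches are an assumption; use whatever size loads reliably.

```python
from datetime import date, timedelta


def month_batches(start, end):
    """Split the inclusive date range [start, end] into calendar-month chunks.

    Returns a list of (first_day, last_day) tuples; the first and last
    chunks are clipped to start and end.
    """
    batches = []
    current = start
    while current <= end:
        # First day of the month following `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        batch_end = min(next_month - timedelta(days=1), end)
        batches.append((current, batch_end))
        current = next_month
    return batches


if __name__ == "__main__":
    # Two years of data becomes 25 month-sized loads instead of one large one.
    for first, last in month_batches(date(2009, 5, 11), date(2011, 5, 10)):
        print(first, last)
```

Each (first, last) pair would then drive one load job, for example by filtering the staging data on that date range.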

Generally, how do people load their data into Hive? Our process is to first 
copy it to HDFS and then run a staging process to get it into Hive. Once that 
completes, we perform a UNION ALL and then an INSERT OVERWRITE of the table 
partition. It's usually during the UNION ALL stage that we see these errors 
appear.

Also, is there a log that tells you which log file it fails on? I can see which 
task/job failed, but I can't find which file it's complaining about. I figure 
that might help a bit.

Thanks!
