hi hadoopman, you can put the large data into your hdfs using "hadoop fs -put src dest" and then use "alter table xxx add partition(xxxxx) location 'dest'" to point the partition at the files you just loaded.
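For example, a minimal sketch (the paths, the access_logs table and the dt partition column below are just placeholders, assuming the table was created as an EXTERNAL table partitioned by a date column):

    # copy one day of access logs straight into hdfs
    hadoop fs -mkdir /data/access_logs/dt=2011-05-10
    hadoop fs -put access_log.2011-05-10 /data/access_logs/dt=2011-05-10/

    -- then, in the hive cli, register that directory as a partition
    ALTER TABLE access_logs ADD PARTITION (dt='2011-05-10')
    LOCATION '/data/access_logs/dt=2011-05-10';

That way each day's files are attached directly as a partition, and you may be able to skip the staging table plus the "union all ... insert overwrite" step, which sounds like the job that is running out of heap.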
2011/5/11 amit jaiswal <amit_...@yahoo.com>

> Hi,
>
> What is the meaning of 'union' over here? Is there a hadoop job with 1
> (or a few) reducers that combines all the data together? Have you tried
> external (dynamic) partitions for combining data?
>
> -amit
>
>
> ----- Original Message -----
> From: hadoopman <hadoop...@gmail.com>
> To: common-user@hadoop.apache.org
> Cc:
> Sent: Tuesday, 10 May 2011 11:26 PM
> Subject: hadoop/hive data loading
>
> When we load data into hive, sometimes we've run into situations where the
> load fails and the logs show a heap out of memory error. If I load just a
> few days (or months) of data, there's no problem. But if I try to load two
> years (for example) of data, I've seen it fail. Not with every feed, but
> with certain ones.
>
> Sometimes I've been able to split the data and get it to load. An example
> of one type of feed I'm working on is the apache web server access logs.
> Generally it works, but there are times when I need to load more than a few
> months of data and get the memory heap errors in the task logs.
>
> Generally, how do people load their data into Hive? We have a process where
> we first copy it to hdfs, and from there we run a staging process to get it
> into hive. Once that completes, we perform a union all and then overwrite
> the table partition. Usually it's during the union all stage that we see
> these errors appear.
>
> Also, is there a log which tells you which file it fails on? I can see
> which task/job failed, but I'm not finding which file it's complaining
> about. I figure that might help a bit.
>
> Thanks!
>
> --
> Stay Hungry. Stay Foolish.