Arup Malakar created HCATALOG-539:
-------------------------------------
Summary: While using HCatStorer with dynamic partitioning, a Hadoop compute node can handle only ~2000 partitions
Key: HCATALOG-539
URL: https://issues.apache.org/jira/browse/HCATALOG-539
Project: HCatalog
Issue Type: Bug
Affects Versions: 0.4, 0.5
Environment: hadoop 0.23.4
hcatalog 0.4
Reporter: Arup Malakar
When HCatStorer is used to store data in an HCatalog table with dynamic
partitioning, the Hadoop job fails if the number of partitions is high. The
limit seems to be around 2000 on my setup. I have also observed that this limit
applies to the host as a whole, not to individual map tasks. For example, if
the host is running only one map task, that single map task can handle up to
2000 partitions. If the host is running two map tasks in parallel, the sum of
the partitions being written by both map tasks must not exceed 2000.
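A minimal sketch of the kind of job that hits this; the table and column names (`default.web_logs`, `dt`) are hypothetical and not taken from this report. HCatStorer invoked without an explicit partition-spec argument writes dynamic partitions, so one RCFile writer stays open per distinct partition value seen by the task:

```shell
# Write out a hypothetical Pig script; running it requires a Hadoop cluster
# with HCatalog, e.g.: pig -useHCatalog repro.pig
cat > repro.pig <<'EOF'
-- 'dt' is the (hypothetical) partition column; with no partition spec passed
-- to HCatStorer(), every distinct dt value becomes a dynamic partition and
-- holds an RCFile writer (and an HDFS output stream) open.
raw = LOAD 'input' USING PigStorage('\t') AS (msg:chararray, dt:chararray);
STORE raw INTO 'default.web_logs' USING org.apache.hcatalog.pig.HCatStorer();
EOF
```

With a few thousand distinct `dt` values landing on one host, the open writers accumulate until thread creation fails.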
I have also observed that heap size doesn't matter, except in dictating how
many map tasks a host can run in parallel. I was able to run a map task writing
2000 partitions with just 1536MB. But with the same heap size a job with 300
partitions fails; the difference is that in the second case multiple map tasks
were running in parallel on the host, whereas in the first case the host was
running no other tasks. Since the error is an OutOfMemoryError for "unable to
create new native thread", this looks like native-thread exhaustion rather than
heap exhaustion: dynamic partitioning keeps one open writer (and hence one HDFS
streamer thread) per partition, and native threads are capped per user across
the host, outside the JVM heap.
The exception thrown is:
{code}
2012-10-23 18:47:55,640 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:597)
    at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1258)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1015)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:972)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:227)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:216)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:838)
    at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:767)
    at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:723)
    at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:705)
    at org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getRecordWriter(RCFileOutputFormat.java:86)
    at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:221)
    at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
    at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:598)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:273)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroup
{code}
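If the "unable to create new native thread" error does mean thread exhaustion, the relevant cap on an affected worker host may be the per-user process/thread limit rather than the JVM heap. A Linux-only diagnostic sketch (the commands are standard; reading them this way is my assumption):

```shell
# RLIMIT_NPROC: max processes/threads per user. On Linux it is counted
# across the whole host, which would explain why map tasks running in
# parallel on one host share a single ~2000-writer budget.
ulimit -u

# Count live threads owned by the current user (each open HDFS output
# stream contributes one streamer thread).
ps -eLf | awk -v u="$(id -un)" '$1 == u' | wc -l
```

If `ulimit -u` is near 2000 on the compute nodes, raising it (or capping how many partitions are open simultaneously) should move the failure point.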
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira