[
https://issues.apache.org/jira/browse/HCATALOG-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arup Malakar updated HCATALOG-539:
----------------------------------
Priority: Minor (was: Major)
> While using HCatStorer with dynamic partitioning a hadoop compute node can
> handle ~2000 partitions only
> -------------------------------------------------------------------------------------------------------
>
> Key: HCATALOG-539
> URL: https://issues.apache.org/jira/browse/HCATALOG-539
> Project: HCatalog
> Issue Type: Bug
> Affects Versions: 0.4, 0.5
> Environment: hadoop 0.23.4
> hcatalog 0.4
> Reporter: Arup Malakar
> Priority: Minor
>
> When HCatStorer is used to store data in an HCatalog table with dynamic
> partitioning, the hadoop job fails if the number of partitions is high. The
> limit seems to be around 2000 on my setup. I have also observed that this
> limit applies to the host as a whole, not to individual map tasks. For
> example, if the host is running only one map task, that single map task can
> handle up to 2000 partitions. But if the host is running two map tasks in
> parallel, the partitions written by the two tasks together must not exceed
> 2000.
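> As a minimal illustration (the table and column names here are
> hypothetical, not taken from the failing job), a store of this shape
> exercises the dynamic partitioning path:
> {code}
> -- Hypothetical table web.events, partitioned by (dt, country).
> -- Omitting the partition spec in HCatStorer() makes partitioning
> -- dynamic: HCatalog opens one record writer, and hence one HDFS
> -- output stream, per distinct (dt, country) value the task sees.
> raw = LOAD 'input' USING PigStorage('\t')
>       AS (user:chararray, dt:chararray, country:chararray);
> STORE raw INTO 'web.events' USING org.apache.hcatalog.pig.HCatStorer();
> {code}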
> I have also observed that heap size doesn't matter, beyond dictating how
> many map tasks a host can run in parallel. I was able to run a map task
> with 2000 partitions with just 1536MB of heap. But with the same heap size
> a job with 300 partitions fails, the difference being that in the second
> case multiple map tasks were running in parallel on the host, while in the
> first case the host was running no other tasks.
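> A plausible explanation (the JVM defaults below are assumptions, not
> measurements from this cluster): the trace shows Thread.start() inside the
> DFSOutputStream constructor, i.e. every open partition writer starts a
> streamer thread, and native threads are allocated outside the Java heap.
> With a typical 1MB thread stack (-Xss1m), ~2000 writer threads across the
> tasks on a host would reserve about 2000 x 1MB = ~2GB of native stack,
> which would explain why the ceiling tracks host memory and OS thread
> limits (e.g. ulimit -u) rather than -Xmx.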
> The exception thrown is:
> {code}
> 2012-10-23 18:47:55,640 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:597)
> at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1258)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1015)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:972)
> at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:227)
> at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:216)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:838)
> at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:767)
> at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:723)
> at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:705)
> at org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getRecordWriter(RCFileOutputFormat.java:86)
> at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:221)
> at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
> at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:598)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:273)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroup
>
> {code}