[
https://issues.apache.org/jira/browse/HCATALOG-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arup Malakar updated HCATALOG-539:
----------------------------------
Priority: Minor (was: Major)
> While using HCatStorer with dynamic partitioning a hadoop compute node can
> handle ~2000 partitions only
> -------------------------------------------------------------------------------------------------------
>
> Key: HCATALOG-539
> URL: https://issues.apache.org/jira/browse/HCATALOG-539
> Project: HCatalog
> Issue Type: Bug
> Affects Versions: 0.4, 0.5
> Environment: hadoop 0.23.4
> hcatalog 0.4
> Reporter: Arup Malakar
> Priority: Minor
>
> When HCatStorer is used to store data in an HCatalog table with dynamic
> partitioning, the hadoop job fails if the number of partitions is high. The
> limit seems to be around 2000 on my setup. I have also observed that this
> limit applies to the host as a whole, not to individual map tasks. For
> example, if the host is running only one map task, that single map task can
> handle up to 2000 partitions. But if the host is running two map tasks in
> parallel, the partitions written by the two tasks together must not exceed
> 2000.
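> As a minimal illustration (the table and column names here are
> hypothetical, not taken from the failing job), a store of this shape
> exercises the dynamic partitioning path:
> {code}
> -- Hypothetical table web.events, partitioned by (dt, country).
> -- Omitting the partition spec in HCatStorer() makes partitioning
> -- dynamic: HCatalog opens one record writer, and hence one HDFS
> -- output stream, per distinct (dt, country) value the task sees.
> raw = LOAD 'input' USING PigStorage('\t')
>       AS (user:chararray, dt:chararray, country:chararray);
> STORE raw INTO 'web.events' USING org.apache.hcatalog.pig.HCatStorer();
> {code}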
> I have also observed that heap size doesn't matter, beyond dictating how
> many map tasks a host can run in parallel. I was able to run a map task
> with 2000 partitions with just 1536MB of heap. But with the same heap size
> a job with 300 partitions fails, the difference being that in the second
> case multiple map tasks were running in parallel on the host, while in the
> first case the host was running no other tasks.
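> A plausible explanation (the JVM defaults below are assumptions, not
> measurements from this cluster): the trace shows Thread.start() inside the
> DFSOutputStream constructor, i.e. every open partition writer starts a
> streamer thread, and native threads are allocated outside the Java heap.
> With a typical 1MB thread stack (-Xss1m), ~2000 writer threads across the
> tasks on a host would reserve about 2000 x 1MB = ~2GB of native stack,
> which would explain why the ceiling tracks host memory and OS thread
> limits (e.g. ulimit -u) rather than -Xmx.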
> The exception thrown is:
> {code}
> 2012-10-23 18:47:55,640 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:597)
> at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1258)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1015)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:972)
> at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:227)
> at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:216)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:838)
> at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:767)
> at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:723)
> at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:705)
> at org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getRecordWriter(RCFileOutputFormat.java:86)
> at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:221)
> at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
> at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:598)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:273)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroup
>
> {code}