[ 
https://issues.apache.org/jira/browse/NUTCH-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938922#comment-13938922
 ] 

Alparslan Avcı commented on NUTCH-1739:
---------------------------------------

It seems this is not a problem in 2.x, since all ParseUtil objects are 
initialized once per job. The thread pool is therefore shared by every use of 
ParseUtil within the same job.
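
For illustration, the once-per-job pattern can be sketched roughly as follows. This is a minimal standalone sketch, not the Nutch code: ParseHelper and SketchMapper are hypothetical stand-ins for ParseUtil and a mapper, and the setup/map/cleanup method names are assumptions chosen to mirror a typical mapper lifecycle. The counter only records how many pools get created.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class OncePerJobSketch {
    /** Counts how many thread pools have been created so far. */
    static final AtomicInteger POOLS_CREATED = new AtomicInteger();

    /** Hypothetical stand-in for ParseUtil: each instance owns one pool. */
    static final class ParseHelper {
        private final ExecutorService executor;

        ParseHelper() {
            POOLS_CREATED.incrementAndGet();
            executor = Executors.newCachedThreadPool(r -> {
                Thread t = new Thread(r, "parse-sketch");
                t.setDaemon(true);
                return t;
            });
        }

        /** Runs a trivial "parse" on the pool and waits for the result. */
        String parse(String content) throws Exception {
            Callable<String> task = content::trim;
            return executor.submit(task).get(5, TimeUnit.SECONDS);
        }

        void close() {
            executor.shutdownNow();
        }
    }

    /** Hypothetical mapper: the helper is built once, then reused per record. */
    static final class SketchMapper {
        private ParseHelper helper;

        void setup() { helper = new ParseHelper(); }

        String map(String record) throws Exception { return helper.parse(record); }

        void cleanup() { helper.close(); }
    }

    /** Processes the given number of records and returns the pools created. */
    static int run(int records) {
        POOLS_CREATED.set(0);
        SketchMapper mapper = new SketchMapper();
        mapper.setup();
        try {
            for (int i = 0; i < records; i++) {
                mapper.map(" page " + i + " ");
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            mapper.cleanup();
        }
        return POOLS_CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("pools created: " + run(100));
    }
}
```

Because the helper is constructed in setup() rather than inside map(), the number of pools does not grow with the number of records, and cleanup() gives a natural place to shut the pool down.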

> ExecutorService field in ParseUtil.java not be right use and cause memory leak
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-1739
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1739
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6, 2.1, 1.7, 2.2, 1.8, 2.2.1
>         Environment: JDK32, runtime/local
>            Reporter: ysc
>            Priority: Critical
>         Attachments: nutch1.7.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> ########################Problem########################
> java.lang.Exception: java.lang.OutOfMemoryError: unable to create new native thread
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:640)
>         at java.util.concurrent.ThreadPoolExecutor.addThread(ThreadPoolExecutor.java:681)
>         at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
>         at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
>         at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:662)
> ########################Analysis########################
> My server uses a 32-bit JDK. At first I thought the JVM had not been given 
> enough memory. {{java -Xmx2600m -version}} runs successfully, so I know the 
> server can use up to 2.6 GB of heap. I therefore added the line 
> {{NUTCH_HEAPSIZE=2000}} to the bin/nutch script, but that did not solve the 
> problem.
> Then I checked the source code to see where so many threads were being 
> created. I found the code
> {code:java}
>  parseResult = new ParseUtil(getConf()).parse(content); 
> {code}
> which is at line 97 of org.apache.nutch.parse.ParseSegment.java, in the map 
> method.
> Going further, the constructor of ParseUtil instantiates a CachedThreadPool, 
> which has no limit on the pool size; see the code:
> {code:java}
> executorService = Executors.newCachedThreadPool(new ThreadFactoryBuilder()
>       .setNameFormat("parse-%d").setDaemon(true).build());
> {code}
> From the analysis above, each call to the map method instantiates a new 
> CachedThreadPool and never shuts it down. So the ExecutorService field in 
> ParseUtil.java is used incorrectly and causes a memory leak.
> ########################Solution########################
> Every map call uses a shared FixedThreadPool whose size can be configured in 
> nutch-site.xml; for more detail, see the patch file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
