[ https://issues.apache.org/jira/browse/NUTCH-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802755#comment-13802755 ]
Hudson commented on NUTCH-1640:
-------------------------------

SUCCESS: Integrated in Nutch-trunk #2400 (See [https://builds.apache.org/job/Nutch-trunk/2400/])
Fix NUTCH-1640 (jnioche: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1534962)
* /nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java


> OOM in ParseSegment Phase
> -------------------------
>
>                 Key: NUTCH-1640
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1640
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.7
>         Environment: RHEL 6.2 x86_64
>            Reporter: Mitesh Singh Jat
>         Attachments: NUTCH-1640.patch
>
>
> The Nutch ParseSegment phase fails after two runs on the same TaskTracker with the following exception:
> {noformat}
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:640)
>         at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:553)
>         at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvmRunner(JvmManager.java:317)
>         at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvm(JvmManager.java:297)
>         at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.taskKilled(JvmManager.java:289)
>         at org.apache.hadoop.mapred.JvmManager.taskKilled(JvmManager.java:158)
>         at org.apache.hadoop.mapred.TaskRunner.kill(TaskRunner.java:802)
>         at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3315)
>         at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:3287)
>         at org.apache.hadoop.mapred.TaskTracker.purgeTask(TaskTracker.java:2316)
>         at org.apache.hadoop.mapred.TaskTracker.fatalError(TaskTracker.java:3710)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1118)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>         at $Proxy1.fatalError(Unknown Source)
>         at org.apache.hadoop.mapred.Child.main(Child.java:310)
> {noformat}
> By contrast, the same parsing done in the Nutch Fetcher phase (fetcher.parse=true, fetcher.store.content=false) does not trigger this issue.
> Comparing the code of Fetcher and ParseSegment, the issue appears to be related to ParseSegment.java creating a new ParseUtil for every URL in order to produce each parseResult (marked line):
> {code}
> ParseResult parseResult = null;
> try {
>   // <***** a new ParseUtil is constructed for every record (suspected cause)
>   parseResult = new ParseUtil(getConf()).parse(content);
> } catch (Exception e) {
>   // Log the failure and skip this record
>   LOG.warn("Error parsing: " + key + ": " + StringUtils.stringifyException(e));
>   return;
> }
> {code}


--
This message was sent by Atlassian JIRA
(v6.1#6144)
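The reporter's analysis points at constructing a new ParseUtil for every URL inside the map step. One plausible way that pattern exhausts native threads is if each helper instance owns its own thread pool (for example, to enforce a parse timeout) that is never shut down. That thread ownership is an assumption here, not something the report states. The Java sketch below contrasts per-record construction with a single helper created once per task. The names `TimedParser`, `PerRecordParseSketch` and `ParseSegmentSketch` are hypothetical. The `configure()`/`map()`/`close()` methods only mirror the shape of a Hadoop mapred task lifecycle and are not Nutch's API. The committed fix touched only ParseSegment.java, which is consistent with, but does not confirm, this kind of change.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical parser helper that owns a thread pool to enforce a parse timeout.
 * Assumption for illustration only: this is how a helper like ParseUtil *might* use threads.
 */
class TimedParser implements AutoCloseable {
    /** Number of thread pools created so far; used only to make the difference visible. */
    static int poolsCreated = 0;

    // Daemon threads, so idle pool threads never keep the JVM alive on their own.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-parser");
        t.setDaemon(true);
        return t;
    });

    TimedParser() {
        poolsCreated++;
    }

    /** "Parses" content on a pool thread, giving up after the timeout; returns null on timeout. */
    String parse(String content, long timeoutSeconds) {
        Callable<String> job = content::trim; // stand-in for real parsing work
        Future<String> task = executor.submit(job);
        try {
            return task.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            task.cancel(true);
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while parsing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("parse failed", e.getCause());
        }
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}

/** Pattern flagged in the report: a fresh helper (and thread pool) per record, never closed. */
class PerRecordParseSketch {
    String map(String content) {
        return new TimedParser().parse(content, 30);
    }
}

/** Alternative: one helper per task, created in configure() and released in close(). */
class ParseSegmentSketch {
    private TimedParser parser;

    void configure() {
        parser = new TimedParser();
    }

    String map(String content) {
        return parser.parse(content, 30);
    }

    void close() {
        parser.close();
    }
}
```

Under these assumptions, a task that maps N records with the first pattern creates N pools. A cached pool keeps an idle thread for up to sixty seconds, so at high record rates threads can pile up faster than they expire. The second pattern keeps the thread count bounded by a single pool per task and releases it explicitly when the task ends.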