[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002386#comment-14002386 ]
Julien Nioche commented on NUTCH-1746:
--------------------------------------

bq. mapred.child.java.opts=-Xmx1024m
bq. Does it matter when running in local mode? See [http://lucene.472066.n3.nabble.com/Out-of-heap-memory-on-175K-links-in-local-mode-td3760326.html]

mapred.child.java.opts is not used in local mode, so setting this value won't affect the amount of memory available.

The automaton URLFilter would indeed be more efficient, but your suggestion is definitely worth considering.

Before we go any further, I'm a bit puzzled by this: in theory it should not be an issue, as these things are loaded via the plugins mechanism, which holds a cache of instantiated objects. This means that regardless of how many times a URLFilter, for instance, is declared, there should be only one actual instance of said filter being used.

I see that you found this to be an issue on 1.7 and am wondering whether it has already been fixed in 1.8, e.g. in [https://issues.apache.org/jira/browse/NUTCH-356]. Greg - would you mind giving it a try on 1.8 to make sure that it wasn't caused by the cache leaking or something similar?

As for your patch, I am not sure how the changes to the Injector would help. There is exactly one JVM per mapper or reducer instance, so moving the fields to static won't change much: there will be only one instance used anyway.

> OutOfMemoryError in Mappers
> ---------------------------
>
>                 Key: NUTCH-1746
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1746
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator, injector
>    Affects Versions: 1.7
>         Environment: Nutch running in local mode with 4M+ domains in domain-urlfilter.txt
>            Reporter: Greg Padiasek
>         Attachments: Generator.patch, Injector.patch, domain-urlfilter-aa, domain-urlfilter-ab, domain-urlfilter-ac
>
>
> Initially I found that Generator was throwing an OutOfMemoryError no
> matter how much RAM I allocated to the JVM.
> I fixed the problem by moving URLFilters, URLNormalizers and
> ScoringFilters to the top-level class as singletons and re-using them in
> all Generator mapper instances.
> Then I found the same problem in Injector and applied an analogous fix.
> Now it seems that this issue may be common to all Nutch Mapper
> implementations.
> I was wondering if it would be possible to integrate this kind of change
> into the upstream code base and potentially update all vulnerable Mapper
> classes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
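
For illustration, the instance-cache behaviour described in the comment above can be sketched as follows. This is a minimal, self-contained sketch and not Nutch's actual plugin code: InstanceCache and DomainFilter are hypothetical names, and keying the cache by a plain string is an assumption. It only shows that repeated lookups through such a cache hand back the same object, so an expensive filter is constructed once no matter how many callers ask for it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class InstanceCacheSketch {

    /** Hypothetical stand-in for an expensive filter, e.g. one holding millions of domains. */
    static final class DomainFilter {
        static int constructed = 0;            // counts how often a filter was built

        DomainFilter() {
            constructed++;                     // a real filter would load its rules here
        }

        boolean accept(String url) {
            return url.startsWith("http");     // placeholder rule, for the sketch only
        }
    }

    /** Hypothetical cache: at most one instance per key, created lazily on first request. */
    static final class InstanceCache {
        private final Map<String, Object> cache = new ConcurrentHashMap<>();

        @SuppressWarnings("unchecked")
        <T> T get(String key, Supplier<T> factory) {
            // computeIfAbsent invokes the factory only when the key is not cached yet
            return (T) cache.computeIfAbsent(key, k -> factory.get());
        }
    }

    public static void main(String[] args) {
        InstanceCache cache = new InstanceCache();

        // Two independent callers (think: two mapper instances) request the same filter.
        DomainFilter first = cache.get("domain-filter", DomainFilter::new);
        DomainFilter second = cache.get("domain-filter", DomainFilter::new);

        System.out.println(first == second);           // prints "true"
        System.out.println(DomainFilter.constructed);  // prints "1"
    }
}
```

If a cache of this kind works as intended, promoting the filter fields to static singletons adds nothing within a single JVM, which is the point made in the comment; an OutOfMemoryError would then more likely come from the size of a single instance or from the cache not being hit.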