[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Padiasek updated NUTCH-1746:
---------------------------------
    Comment: was deleted

(was: I use domain-urlfilter.txt (the attached, split file). I also tried regex-urlfilter.txt, but with regex the memory usage was even higher because each line was compiled for expression evaluation.)

> OutOfMemoryError in Mappers
> ---------------------------
>
>                 Key: NUTCH-1746
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1746
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator, injector
>    Affects Versions: 1.7
>         Environment: Nutch running in local mode with 4M+ domains in domain-urlfilter.txt
>            Reporter: Greg Padiasek
>         Attachments: Generator.patch, Injector.patch, domain-urlfilter-aa, domain-urlfilter-ab, domain-urlfilter-ac
>
>
> Initially I found that Generator was throwing an OutOfMemoryError no
> matter how much RAM I allocated to the JVM. I fixed the problem by moving
> URLFilters, URLNormalizers and ScoringFilters into the top-level class as
> singletons and re-using them in all Generator mapper instances.
> Then I found the same problem in Injector and applied an analogous fix.
> Now it seems that this issue may be common to all Nutch Mapper
> implementations.
> I was wondering whether it would be possible to integrate this kind of change
> into the upstream code base and potentially update all vulnerable Mapper
> classes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
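
The fix described in the issue (one shared set of filters re-used by every mapper instance, instead of one set per mapper) can be illustrated with a minimal Java sketch. This is not the content of Generator.patch or Injector.patch, which are not reproduced here. The names SharedFilterSketch, DomainFilter, SharedFilters, SketchMapper and LOAD_COUNT are hypothetical stand-ins and are not Nutch classes, and the two example domains take the place of the 4M+ entries in domain-urlfilter.txt.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: none of these classes are Nutch APIs.
class SharedFilterSketch {

    /** Hypothetical stand-in for a filter backed by a large domain list. */
    static final class DomainFilter {
        private final Set<String> domains;

        DomainFilter(List<String> domains) {
            this.domains = new HashSet<>(domains);
        }

        boolean accepts(String host) {
            return domains.contains(host);
        }
    }

    /** Holds a single filter per JVM, built lazily on first use. */
    static final class SharedFilters {
        // Counts how many times the (expensive) filter was actually built.
        static final AtomicInteger LOAD_COUNT = new AtomicInteger();
        private static DomainFilter instance;

        static synchronized DomainFilter get() {
            if (instance == null) {
                LOAD_COUNT.incrementAndGet();
                // Assumption: a real job would parse the domain list file here.
                instance = new DomainFilter(
                        Arrays.asList("example.org", "example.com"));
            }
            return instance;
        }
    }

    /** Hypothetical mapper that borrows the shared filter instead of building its own. */
    static final class SketchMapper {
        final DomainFilter filter = SharedFilters.get();

        boolean keep(String host) {
            return filter.accepts(host);
        }
    }

    public static void main(String[] args) {
        SketchMapper a = new SketchMapper();
        SketchMapper b = new SketchMapper();
        System.out.println(a.filter == b.filter);           // prints "true"
        System.out.println(SharedFilters.LOAD_COUNT.get()); // prints "1"
    }
}
```

In this sketch the domain list is held in memory once, however many SketchMapper objects are created. The trade-off is that static state is shared by everything running in the same JVM, so the approach assumes all mappers in that JVM use the same filter configuration.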