[ https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340745#comment-16340745 ]
ASF GitHub Bot commented on NUTCH-2501:
---------------------------------------

sebastian-nagel commented on a change in pull request #279: NUTCH-2501: Take NUTCH_HEAPSIZE into account when crawling using crawl script
URL: https://github.com/apache/nutch/pull/279#discussion_r164053621

 ##########
 File path: src/bin/crawl
 ##########
 @@ -105,6 +105,10 @@ SIZE_FETCHLIST=50000 # 25K x NUM_TASKS
  TIME_LIMIT_FETCH=180
  NUM_THREADS=50
  SITEMAPS_FROM_HOSTDB_FREQUENCY=never
 +NUTCH_HEAP_MB=2000

 Review comment:
   bin/nutch already allows overriding the Java heap size via the environment variable [NUTCH_HEAPSIZE](https://github.com/apache/nutch/blob/e533ab21b18cf81a49e052185562a7e6489ec4d6/src/bin/nutch#L24). Wouldn't it be simpler to set the environment variable and let bin/nutch add the `-D...` option?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Take into account $NUTCH_HEAPSIZE when crawling using crawl script
> ------------------------------------------------------------------
>
>          Key: NUTCH-2501
>          URL: https://issues.apache.org/jira/browse/NUTCH-2501
>      Project: Nutch
>   Issue Type: Improvement
>     Reporter: Moreno Feltscher
>     Assignee: Lewis John McGibbney
>     Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
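
The suggestion in the review comment can be sketched roughly as follows. This is only an illustration and makes two assumptions: that NUTCH_HEAPSIZE is read as a size in megabytes, and that it ends up as a JVM `-Xmx` setting. The helper name `heap_flag` and the 1000 MB fallback are hypothetical and are not taken from the Nutch scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nutch): turn NUTCH_HEAPSIZE, assumed to be
# a number of megabytes, into a JVM heap option. Falls back to an assumed
# default when the variable is unset or empty.
heap_flag() {
  local default_mb=1000   # assumption, for illustration only
  echo "-Xmx${NUTCH_HEAPSIZE:-$default_mb}m"
}

# Intended usage, with the crawl script's own arguments omitted:
#   export NUTCH_HEAPSIZE=2000
#   bin/crawl ...
```

With this pattern the heap size stays a single environment variable that is exported once, so the crawl script would not need a separate setting such as NUTCH_HEAP_MB.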