[
https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145777#comment-15145777
]
ASF GitHub Bot commented on NUTCH-2218:
---
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/91#discussion_r52822613
--- Diff: src/java/org/apache/nutch/util/CrawlCompletionStats.java ---
@@ -61,27 +69,60 @@
private int mode = 0;
public int run(String[] args
[
https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145533#comment-15145533
]
ASF GitHub Bot commented on NUTCH-2218:
---
GitHub user MJJoyce opened a pull request:
https://github.com/apache/nutch/pull/91
NUTCH-2218 - Update CrawlComplete util with Commons CLI arg parsing
- Switch all argument parsing and checking to commons CLI.
- Update input directory processing such that the 'crawldb' folder
[
https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2218:
-
Issue Type: Improvement (was: Bug)
> Switch CrawlCompletion arg parsing to Commons CLI
>
Michael Joyce created NUTCH-2218:
Summary: Switch CrawlCompletion arg parsing to Commons CLI
Key: NUTCH-2218
URL: https://issues.apache.org/jira/browse/NUTCH-2218
Project: Nutch
Issue Type: B
[
https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Wolski updated NUTCH-2217:
Description: Plugin to filter out pages in languages other than those
specified. It is based on language
GitHub user merito opened a pull request:
https://github.com/apache/nutch/pull/90
fix for NUTCH-2217 contributed by dawid.wolski
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/merito/nutch NUTCH-2217
Alternatively you can revie
[
https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145037#comment-15145037
]
ASF GitHub Bot commented on NUTCH-2217:
---
GitHub user merito opened a pull request:
[
https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144677#comment-15144677
]
Dawid Wolski commented on NUTCH-2217:
-
The same plugin, but for 2.x version.
> Crawl
[
https://issues.apache.org/jira/browse/NUTCH-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Wolski updated NUTCH-2217:
External issue URL: (was:
https://issues.apache.org/jira/browse/NUTCH-1663)
External issue ID
Dawid Wolski created NUTCH-2217:
---
Summary: Crawl pages with specified language
Key: NUTCH-2217
URL: https://issues.apache.org/jira/browse/NUTCH-2217
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144518#comment-15144518
]
Markus Jelsma commented on NUTCH-2216:
--
An option is to change the default for db.ign
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144497#comment-15144497
]
Markus Jelsma commented on NUTCH-2216:
--
Additionally, it probably should not be imple
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144463#comment-15144463
]
Markus Jelsma commented on NUTCH-2216:
--
Apparently db.ignore.internal.links is not im
Markus Jelsma created NUTCH-2216:
Summary: ignore.internal.links to optionally follow internal
redirects
Key: NUTCH-2216
URL: https://issues.apache.org/jira/browse/NUTCH-2216
Project: Nutch