[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-09-26 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522329#comment-15522329 ] Sebastian Nagel commented on NUTCH-1314: Is there a reason why this issu

[jira] [Commented] (NUTCH-2315) UpdateDb jobs fails everytime (Nutch 2.3.1)

2016-09-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528667#comment-15528667 ] Sebastian Nagel commented on NUTCH-2315: Thanks, for reporting that

[jira] [Commented] (NUTCH-2315) UpdateDb jobs fails everytime (Nutch 2.3.1)

2016-09-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528786#comment-15528786 ] Sebastian Nagel commented on NUTCH-2315: Ev., take a lower value, accordin

[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549245#comment-15549245 ] Sebastian Nagel commented on NUTCH-2320: Right, change logs are generated

[jira] [Commented] (NUTCH-2319) Link with "rel=alternate" doesn't return in crawl

2016-10-07 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554993#comment-15554993 ] Sebastian Nagel commented on NUTCH-2319: See the ongoing discussion in user@n

[jira] [Updated] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2328: --- Affects Version/s: (was: 2.5) (was: 2.4) > GeneratorJob d

[jira] [Updated] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2328: --- Fix Version/s: 2.4 > GeneratorJob does not generate anything on second

[jira] [Commented] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585585#comment-15585585 ] Sebastian Nagel commented on NUTCH-2328: Thanks, [~arthur-evozon]. Good c

[jira] [Commented] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586031#comment-15586031 ] Sebastian Nagel commented on NUTCH-2328: I don't know what's spec

[jira] [Commented] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586690#comment-15586690 ] Sebastian Nagel commented on NUTCH-2328: > the only solution is to have a

[jira] [Commented] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588759#comment-15588759 ] Sebastian Nagel commented on NUTCH-2328: Hi [~arthur-evozon], > Btw., I

[jira] [Commented] (NUTCH-2334) Extension point for schedulers

2016-11-24 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693599#comment-15693599 ] Sebastian Nagel commented on NUTCH-2334: Hi [~roannel], what does "

[jira] [Created] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2016-11-28 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2335: -- Summary: Injector not to filter and normalize existing URLs in CrawlDb Key: NUTCH-2335 URL: https://issues.apache.org/jira/browse/NUTCH-2335 Project: Nutch

[jira] [Commented] (NUTCH-2336) SegmentReader to implement Tool

2016-11-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708949#comment-15708949 ] Sebastian Nagel commented on NUTCH-2336: Thanks, [~VSlot]! Looks good to me

[jira] [Updated] (NUTCH-2336) SegmentReader to implement Tool

2016-12-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2336: --- Assignee: (was: Sebastian Nagel) > SegmentReader to implement T

[jira] [Assigned] (NUTCH-2336) SegmentReader to implement Tool

2016-12-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2336: -- Assignee: Sebastian Nagel > SegmentReader to implement T

[jira] [Resolved] (NUTCH-2336) SegmentReader to implement Tool

2016-12-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2336. Resolution: Fixed Committed (6e051f2). Thanks! > SegmentReader to implement T

[jira] [Commented] (NUTCH-2336) SegmentReader to implement Tool

2016-12-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711816#comment-15711816 ] Sebastian Nagel commented on NUTCH-2336: The error happene

[jira] [Commented] (NUTCH-2336) SegmentReader to implement Tool

2016-12-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711845#comment-15711845 ] Sebastian Nagel commented on NUTCH-2336: Ok, this was a temporary failure.

[jira] [Created] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-09 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2337: -- Summary: urlnormalizer-basic to strip empty port Key: NUTCH-2337 URL: https://issues.apache.org/jira/browse/NUTCH-2337 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744691#comment-15744691 ] Sebastian Nagel commented on NUTCH-2320: Hi Markus, generally +1 - the te

[jira] [Commented] (NUTCH-2338) URLNormalizerChecker to run as TCP Telnet service

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744738#comment-15744738 ] Sebastian Nagel commented on NUTCH-2338: Hi Markus, thanks! See the comment

[jira] [Commented] (NUTCH-2046) The crawl script should be able to skip an initial injection.

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745073#comment-15745073 ] Sebastian Nagel commented on NUTCH-2046: A statement in change log and rel

[jira] [Resolved] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2337. Resolution: Fixed Fix Version/s: 2.4 Committed to trunk f351790 and 2.x 6e3c34d

[jira] [Updated] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2337: --- Affects Version/s: 2.3.1 > urlnormalizer-basic to strip empty p

[jira] [Commented] (NUTCH-2340) Can't install NUTCH from latest master branch. resolve-default: [ivy:resolve] :: Apache Ivy 2.4.0 - 20141213170938 :: http://ant.apache.org/ivy/ :: [ivy:resolve] :: lo

2016-12-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757702#comment-15757702 ] Sebastian Nagel commented on NUTCH-2340: Thanks [~rajanchandi]! However,

[jira] [Comment Edited] (NUTCH-2340) Can't install NUTCH from latest master branch. resolve-default: [ivy:resolve] :: Apache Ivy 2.4.0 - 20141213170938 :: http://ant.apache.org/ivy/ :: [ivy:resolve]

2016-12-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757702#comment-15757702 ] Sebastian Nagel edited comment on NUTCH-2340 at 12/17/16 10:1

[jira] [Commented] (NUTCH-2334) Extension point for schedulers

2017-01-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815806#comment-15815806 ] Sebastian Nagel commented on NUTCH-2334: If it's only about deciding

[jira] [Created] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-11 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2349: -- Summary: urlnormalizer-basic NPE for ill-formed URL "http:/" Key: NUTCH-2349 URL: https://issues.apache.org/jira/browse/NUTCH-2349 Project: Nutch

[jira] [Commented] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826084#comment-15826084 ] Sebastian Nagel commented on NUTCH-2345: Hi Lewis, yes, that looks easie

[jira] [Commented] (NUTCH-2350) Add Missing activeConfId Field to NutchStatus Object

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826198#comment-15826198 ] Sebastian Nagel commented on NUTCH-2350: +1 but isn't this just part

[jira] [Commented] (NUTCH-2351) Log with Generic Class Name at Nutch 2.x

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826777#comment-15826777 ] Sebastian Nagel commented on NUTCH-2351: +1 that would be great, thanks!

[jira] [Updated] (NUTCH-2351) Log with Generic Class Name at Nutch 2.x

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2351: --- Description: There are many mistakes when some reference code is copied and created a new

[jira] [Commented] (NUTCH-2351) Log with Generic Class Name at Nutch 2.x

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826779#comment-15826779 ] Sebastian Nagel commented on NUTCH-2351: Obsoletes NUTCH-2345 where

[jira] [Updated] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2349: --- Affects Version/s: 2.4 > urlnormalizer-basic NPE for ill-formed URL "

[jira] [Updated] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2349: --- Fix Version/s: 2.4 > urlnormalizer-basic NPE for ill-formed URL "

[jira] [Commented] (NUTCH-2315) UpdateDb jobs fails everytime (Nutch 2.3.1)

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826942#comment-15826942 ] Sebastian Nagel commented on NUTCH-2315: Hi [~shubham.gupta], is the last e

[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826958#comment-15826958 ] Sebastian Nagel commented on NUTCH-2349: See also [crawler-commons#136|h

[jira] [Commented] (NUTCH-2333) Indexer for RabbitMQ

2017-01-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828234#comment-15828234 ] Sebastian Nagel commented on NUTCH-2333: +1 looks good, although I haven'

[jira] [Commented] (NUTCH-2352) Log with Generic Class Name at Nutch 1.x

2017-01-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830548#comment-15830548 ] Sebastian Nagel commented on NUTCH-2352: +1 lgtm, going to commit... > L

[jira] [Resolved] (NUTCH-2352) Log with Generic Class Name at Nutch 1.x

2017-01-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2352. Resolution: Fixed Committed to 1.x. Thanks, [~kamaci]! > Log with Generic Class Name

[jira] [Resolved] (NUTCH-2351) Log with Generic Class Name at Nutch 2.x

2017-01-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2351. Resolution: Fixed Committed to 2.x, thanks [~kamaci]! > Log with Generic Class Name

[jira] [Reopened] (NUTCH-2346) Check Types at Object Equality

2017-01-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-2346: Hi, o.a.n.protocol.TestContent now fails in line 50 {code} WritableTestUtils.testWritable(r

[jira] [Comment Edited] (NUTCH-2346) Check Types at Object Equality

2017-01-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837625#comment-15837625 ] Sebastian Nagel edited comment on NUTCH-2346 at 1/25/17 12:1

[jira] [Commented] (NUTCH-2346) Check Types at Object Equality

2017-01-26 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840129#comment-15840129 ] Sebastian Nagel commented on NUTCH-2346: Hi Lewis, no problem. That's

[jira] [Resolved] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2349. Resolution: Fixed Assignee: Sebastian Nagel Committed to 1.x and 2.x

[jira] [Resolved] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2345. Resolution: Duplicate Thanks [~Mgupta]! The fix is included in NUTCH-2352

[jira] [Resolved] (NUTCH-2347) Use Logger Instead of Printing Throwable

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2347. Resolution: Fixed Merged into 2.x, thanks [~kamaci]! > Use Logger Instead of Print

[jira] [Commented] (NUTCH-2355) Protocol plugins to set cookie if Cookie metadata field is present

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848277#comment-15848277 ] Sebastian Nagel commented on NUTCH-2355: Hi Markus, useful for sure, e.g.,

[jira] [Commented] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-02-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859272#comment-15859272 ] Sebastian Nagel commented on NUTCH-2357: Thanks! See also [this discussion on

[jira] [Updated] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-02-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2357: --- Flags: Patch Patch Info: Patch Available > Index metadata throw Exception beca

[jira] [Commented] (NUTCH-2363) Fetcher support for reading and setting cookies

2017-03-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892562#comment-15892562 ] Sebastian Nagel commented on NUTCH-2363: Hi Markus, I'm a little bit

[jira] [Created] (NUTCH-2364) http.agent.rotate: IllegalArgumentException / last element of agent names ignored

2017-03-03 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2364: -- Summary: http.agent.rotate: IllegalArgumentException / last element of agent names ignored Key: NUTCH-2364 URL: https://issues.apache.org/jira/browse/NUTCH-2364

[jira] [Updated] (NUTCH-2364) http.agent.rotate: IllegalArgumentException / last element of agent names ignored

2017-03-03 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2364: --- Fix Version/s: 2.4 > http.agent.rotate: IllegalArgumentException / last element of ag

[jira] [Updated] (NUTCH-2364) http.agent.rotate: IllegalArgumentException / last element of agent names ignored

2017-03-03 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2364: --- Affects Version/s: 2.3.1 > http.agent.rotate: IllegalArgumentException / last element

[jira] [Commented] (NUTCH-2364) http.agent.rotate: IllegalArgumentException / last element of agent names ignored

2017-03-03 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894122#comment-15894122 ] Sebastian Nagel commented on NUTCH-2364: 2.x is also affected, same

[jira] [Resolved] (NUTCH-2364) http.agent.rotate: IllegalArgumentException / last element of agent names ignored

2017-03-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2364. Resolution: Fixed Thanks! Committed to master and 2.x. > http.agent.rot

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-03-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936522#comment-15936522 ] Sebastian Nagel commented on NUTCH-2335: Rebased pull-request, teste

[jira] [Commented] (NUTCH-2193) Upgrade feed parser plugin to use rome 1.5

2017-03-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936536#comment-15936536 ] Sebastian Nagel commented on NUTCH-2193: Any objections to include this in

[jira] [Commented] (NUTCH-2212) Decrease memory consumption by tuning stack size

2017-03-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936581#comment-15936581 ] Sebastian Nagel commented on NUTCH-2212: Hi Markus, is this really a proble

[jira] [Commented] (NUTCH-2247) Protocol resolver

2017-03-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936622#comment-15936622 ] Sebastian Nagel commented on NUTCH-2247: Hi Markus, interesting tool! - alth

[jira] [Commented] (NUTCH-2334) Extension point for schedulers

2017-03-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946865#comment-15946865 ] Sebastian Nagel commented on NUTCH-2334: Hi [~roannel], see [scoring-adap

[jira] [Updated] (NUTCH-2365) HTTP Redirects to SubDomains don't get crawled

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2365: --- Fix Version/s: 1.14 > HTTP Redirects to SubDomains don't get

[jira] [Commented] (NUTCH-2365) HTTP Redirects to SubDomains don't get crawled

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958608#comment-15958608 ] Sebastian Nagel commented on NUTCH-2365: See also [thread on user mailing

[jira] [Resolved] (NUTCH-2319) Link with "rel=alternate" doesn't return in crawl

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2319. Resolution: Not A Problem Hi [~zbhatuk], please reopen if the problem persists. It's

[jira] [Commented] (NUTCH-2071) A parser failure on a single document may fail crawling job

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958660#comment-15958660 ] Sebastian Nagel commented on NUTCH-2071: - caused by a library/depend

[jira] [Updated] (NUTCH-2071) A parser failure on a single document may fail crawling job

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2071: --- Affects Version/s: 1.11 > A parser failure on a single document may fail crawling

[jira] [Resolved] (NUTCH-2281) Support non-default FileSystem

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2281. Resolution: Fixed Merged into master (f046e63). Thanks! > Support non-default FileSys

[jira] [Resolved] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2335. Resolution: Fixed Assignee: Sebastian Nagel Merged into master, 37d8aea. Thanks

[jira] [Assigned] (NUTCH-2281) Support non-default FileSystem

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2281: -- Assignee: Sebastian Nagel > Support non-default FileSys

[jira] [Updated] (NUTCH-2269) Clean not working after crawl

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2269: --- Fix Version/s: 2.4 > Clean not working after cr

[jira] [Resolved] (NUTCH-2269) Clean not working after crawl

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2269. Resolution: Fixed > Clean not working after cr

[jira] [Commented] (NUTCH-2269) Clean not working after crawl

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958822#comment-15958822 ] Sebastian Nagel commented on NUTCH-2269: Committed to master/1.x ([d2e60ef|h

[jira] [Assigned] (NUTCH-2193) Upgrade feed parser plugin to use rome 1.5

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2193: -- Assignee: Sebastian Nagel > Upgrade feed parser plugin to use rome

[jira] [Resolved] (NUTCH-2193) Upgrade feed parser plugin to use rome 1.5

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2193. Resolution: Fixed Committed to master/1.x ([c181953|https://github.com/apache/nutch/commit

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958931#comment-15958931 ] Sebastian Nagel commented on NUTCH-2335: Or to include multiple commits of a

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-04-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962817#comment-15962817 ] Sebastian Nagel commented on NUTCH-2335: It's only disabled by de

[jira] [Updated] (NUTCH-2372) Javadocs build failing.

2017-04-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2372: --- Fix Version/s: 1.14 2.4 > Javadocs build fail

[jira] [Commented] (NUTCH-2372) Javadocs build failing.

2017-04-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963051#comment-15963051 ] Sebastian Nagel commented on NUTCH-2372: Hi [~omkar20895], great! Patch l

[jira] [Commented] (NUTCH-1932) Automatically remove orphaned pages

2017-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964369#comment-15964369 ] Sebastian Nagel commented on NUTCH-1932: Hi [~markus.jel...@openindex.io], a

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964389#comment-15964389 ] Sebastian Nagel commented on NUTCH-2335: Hi Markus, I cannot see what's

[jira] [Created] (NUTCH-2376) Improve configurability of HTTP Accept* header fields

2017-04-20 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2376: -- Summary: Improve configurability of HTTP Accept* header fields Key: NUTCH-2376 URL: https://issues.apache.org/jira/browse/NUTCH-2376 Project: Nutch

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2017-04-21 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979161#comment-15979161 ] Sebastian Nagel commented on NUTCH-1465: Hi Lewis, a couple of months ago

[jira] [Comment Edited] (NUTCH-1465) Support sitemaps in Nutch

2017-04-21 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979161#comment-15979161 ] Sebastian Nagel edited comment on NUTCH-1465 at 4/21/17 6:1

[jira] [Commented] (NUTCH-2377) Nutch can't parse relative links

2017-05-03 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994558#comment-15994558 ] Sebastian Nagel commented on NUTCH-2377: Hi [~abhakim1980], everything looks

[jira] [Commented] (NUTCH-2379) crawl script dedup's crawldb update is slow

2017-05-04 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996868#comment-15996868 ] Sebastian Nagel commented on NUTCH-2379: +1 to add $commonOptions where

[jira] [Commented] (NUTCH-2383) Wrong FS exception in Fetcher

2017-05-04 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996898#comment-15996898 ] Sebastian Nagel commented on NUTCH-2383: You mean set

[jira] [Updated] (NUTCH-2376) Improve configurability of HTTP Accept* header fields

2017-05-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2376: --- Fix Version/s: 1.14 2.4 > Improve configurability of HTTP Accept* hea

[jira] [Resolved] (NUTCH-2376) Improve configurability of HTTP Accept* header fields

2017-05-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2376. Resolution: Fixed Assignee: Sebastian Nagel Committed to master/1.x ([db77f19|https

[jira] [Updated] (NUTCH-2391) Spurious Duplications for MD5

2017-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2391: --- Fix Version/s: 1.14 > Spurious Duplications for

[jira] [Commented] (NUTCH-2391) Spurious Duplications for MD5

2017-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042770#comment-16042770 ] Sebastian Nagel commented on NUTCH-2391: Hi David, yes that's plausible

[jira] [Commented] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-06-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045415#comment-16045415 ] Sebastian Nagel commented on NUTCH-2393: Thanks [~kaidul], for taking care of

[jira] [Updated] (NUTCH-2391) Spurious Duplications for MD5

2017-06-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2391: --- Description: We're seeing some incidence of a large number of documents being mark

[jira] [Commented] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-06-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045417#comment-16045417 ] Sebastian Nagel commented on NUTCH-2393: Just to confirm: 2.x is affected. Wi

[jira] [Commented] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-06-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045461#comment-16045461 ] Sebastian Nagel commented on NUTCH-2393: I don't know what happens i

[jira] [Created] (NUTCH-2397) Parser to add paragraph line breaks

2017-07-04 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2397: -- Summary: Parser to add paragraph line breaks Key: NUTCH-2397 URL: https://issues.apache.org/jira/browse/NUTCH-2397 Project: Nutch Issue Type

[jira] [Commented] (NUTCH-2397) Parser to add paragraph line breaks

2017-07-04 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073613#comment-16073613 ] Sebastian Nagel commented on NUTCH-2397: A fix for 1.x is ready: h

[jira] [Resolved] (NUTCH-2393) 2.x patch for MD5 duplication issue addressed in NUTCH-2391

2017-07-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2393. Resolution: Fixed Committed to 2.x, [365077c|https://github.com/apache/nutch/commit

[jira] [Resolved] (NUTCH-2391) Spurious Duplications for MD5

2017-07-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2391. Resolution: Fixed Committed to 1.x, [d35b433|https://github.com/apache/nutch/commit

[jira] [Commented] (NUTCH-2397) Parser to add paragraph line breaks

2017-07-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074952#comment-16074952 ] Sebastian Nagel commented on NUTCH-2397: Patch/pull-request for 2.x... >
