[jira] Updated: (NUTCH-162) country code jp is used instead of language code ja for Japanese

2010-05-10 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated NUTCH-162: Attachment: anchors_ja.properties cached_ja.properties

[jira] Updated: (NUTCH-162) country code jp is used instead of language code ja for Japanese

2010-05-10 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated NUTCH-162: Attachment: search_ja.properties text_ja.properties Please put these property files

[jira] Work started: (NUTCH-816) Add zip target to build.xml

2010-05-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-816 started by Chris A. Mattmann. Add zip target to build.xml --- Key: NUTCH-816 URL: https

[jira] Resolved: (NUTCH-816) Add zip target to build.xml

2010-05-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-816. - Resolution: Fixed - fixed in r942427 Add zip target to build.xml

[jira] Commented: (NUTCH-811) Develop an ORM framework

2010-05-07 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865226#action_12865226 ] Enis Soztutar commented on NUTCH-811: - Hi Piet, The code for Gora will reside in GitHub

[jira] Commented: (NUTCH-811) Develop an ORM framework

2010-05-06 Thread Piet Schrijver (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864744#action_12864744 ] Piet Schrijver commented on NUTCH-811: -- Will development for gora be tracked under

[jira] Assigned: (NUTCH-817) parse-(html)does follow links of full html page, parse-(tika) does follow any links and stops at level 1

2010-05-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-817: --- Assignee: Julien Nioche parse-(html)does follow links of full html page, parse-(tika) does

[jira] Updated: (NUTCH-814) SegmentMerger bug

2010-04-27 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-814: Attachment: merger.patch Patch fixing the issue, and a unit test. I will commit

[jira] Work stopped: (NUTCH-466) Flexible segment format

2010-04-27 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-466 stopped by Andrzej Bialecki. Flexible segment format --- Key: NUTCH-466 URL: https

[jira] Created: (NUTCH-816) Add zip target to build.xml

2010-04-27 Thread Chris A. Mattmann (JIRA)
Add zip target to build.xml --- Key: NUTCH-816 URL: https://issues.apache.org/jira/browse/NUTCH-816 Project: Nutch Issue Type: Improvement Components: build Affects Versions: 1.0.0 Environment

[jira] Closed: (NUTCH-808) Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs

2010-04-26 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar closed NUTCH-808. --- Resolution: Fixed We have decided to go on with implementing an ORM layer as per the discussion

[jira] Commented: (NUTCH-710) Support for rel=canonical attribute

2010-04-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859286#action_12859286 ] Julien Nioche commented on NUTCH-710: - As suggested previously we could either treat

[jira] Commented: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implme

2010-04-20 Thread Ilguiz Latypov (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859116#action_12859116 ] Ilguiz Latypov commented on NUTCH-427: -- I hesitate adding the .zip file because

[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implment

2010-04-20 Thread Ilguiz Latypov (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilguiz Latypov updated NUTCH-427: - Attachment: (was: protocol-smb.zip) protocol-smb: plugin protocol implementing the CIFS/SMB

[jira] Updated: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implment

2010-04-20 Thread Ilguiz Latypov (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilguiz Latypov updated NUTCH-427: - Attachment: protocol-smb-dist.zip Applied my diff to simplify importing into the Subversion tree

[jira] Work started: (NUTCH-812) Crawl.java incorrectly uses the Generator API resulting in NPE

2010-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-812 started by Chris A. Mattmann. Crawl.java incorrectly uses the Generator API resulting in NPE

[jira] Assigned: (NUTCH-812) Crawl.java incorrectly uses the Generator API resulting in NPE

2010-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-812: --- Assignee: Chris A. Mattmann Crawl.java incorrectly uses the Generator API resulting

[jira] Resolved: (NUTCH-812) Crawl.java incorrectly uses the Generator API resulting in NPE

2010-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-812. - Fix Version/s: 1.1 Resolution: Fixed - fixed in r935453. Thanks, Phil and Andrzej

[jira] Updated: (NUTCH-813) Repetitive crawl 403 status page

2010-04-17 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-813: --- Attachment: Patch Repetitive crawl 403 status page

[jira] Created: (NUTCH-813) Repetitive crawl 403 status page

2010-04-17 Thread Nguyen Manh Tien (JIRA)
Repetitive crawl 403 status page Key: NUTCH-813 URL: https://issues.apache.org/jira/browse/NUTCH-813 Project: Nutch Issue Type: Bug Affects Versions: 1.1 Reporter: Nguyen Manh Tien

[jira] Updated: (NUTCH-813) Repetitive crawl 403 status page

2010-04-17 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-813: --- Priority: Minor (was: Major) Repetitive crawl 403 status page

[jira] Updated: (NUTCH-812) Crawl.java incorrectly uses the Generator API resulting in NPE

2010-04-16 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-812: Affects Version/s: 1.1 Priority: Critical (was: Major) Crawl.java

[jira] Commented: (NUTCH-808) Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs

2010-04-13 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856349#action_12856349 ] Julien Nioche commented on NUTCH-808: - Hi Enis, {quote} On the other hand, current

[jira] Commented: (NUTCH-808) Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs

2010-04-13 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856360#action_12856360 ] Enis Soztutar commented on NUTCH-808: - bq. What do you mean by current implementation

[jira] Resolved: (NUTCH-570) Improvement of URL Ordering in Generator.java

2010-04-12 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved NUTCH-570. Resolution: Won't Fix Improvement of URL Ordering in Generator.java

[jira] Commented: (NUTCH-808) Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs

2010-04-12 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856124#action_12856124 ] Enis Soztutar commented on NUTCH-808: - So, this is the results so far : DataNucleus

[jira] Updated: (NUTCH-808) Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs

2010-04-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-808: Fix Version/s: 2.0 Evaluate ORM Frameworks which support non-relational column-oriented

[jira] Commented: (NUTCH-570) Improvement of URL Ordering in Generator.java

2010-04-07 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854665#action_12854665 ] Otis Gospodnetic commented on NUTCH-570: I'm tempted to close this issue as Won't

[jira] Commented: (NUTCH-570) Improvement of URL Ordering in Generator.java

2010-04-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854767#action_12854767 ] Chris A. Mattmann commented on NUTCH-570: - Hi Otis: I think your logic perfectly

[jira] Created: (NUTCH-810) Upgrade to Tika 0.7

2010-04-06 Thread Julien Nioche (JIRA)
Upgrade to Tika 0.7 --- Key: NUTCH-810 URL: https://issues.apache.org/jira/browse/NUTCH-810 Project: Nutch Issue Type: Improvement Components: parser Affects Versions: 1.0.0 Reporter: Julien Nioche

[jira] Updated: (NUTCH-789) Improvements to Tika parser

2010-04-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-789: Component/s: (was: fetcher) parser Fix Version/s: (was: 1.1) Have

[jira] Closed: (NUTCH-810) Upgrade to Tika 0.7

2010-04-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-810. --- Resolution: Fixed Committed in rev 931098. http://issues.apache.org/jira/browse/TIKA-317 changed

[jira] Commented: (NUTCH-810) Upgrade to Tika 0.7

2010-04-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854332#action_12854332 ] Hudson commented on NUTCH-810: -- Integrated in Nutch-trunk #1116 (See [http

[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-04-04 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853251#action_12853251 ] Julien Nioche commented on NUTCH-789: - Will upgrade as soon as 0.7 is available from

[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-04-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853285#action_12853285 ] Chris A. Mattmann commented on NUTCH-789: - Hey Julien, Tika 0.7 is available from

[jira] Updated: (NUTCH-807) JSParseFilter produces malformed URL

2010-04-03 Thread Minyao Zhu (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minyao Zhu updated NUTCH-807: - Summary: JSParseFilter produces malformed URL (was: JSParseFilter produces weired URL) JSParseFilter

[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-04-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853212#action_12853212 ] Chris A. Mattmann commented on NUTCH-789: - Hey Julien -- okey dok, Tika 0.7 has been

[jira] Created: (NUTCH-807) JSParseFilter produces weired URL

2010-04-02 Thread Minyao Zhu (JIRA)
JSParseFilter produces weired URL - Key: NUTCH-807 URL: https://issues.apache.org/jira/browse/NUTCH-807 Project: Nutch Issue Type: Bug Components: parser Affects Versions: 1.0.0

[jira] Created: (NUTCH-808) Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs

2010-04-02 Thread Enis Soztutar (JIRA)
Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs -- Key: NUTCH-808 URL: https://issues.apache.org/jira/browse/NUTCH-808

[jira] Created: (NUTCH-809) Parse-metatags plugin

2010-04-02 Thread Julien Nioche (JIRA)
Parse-metatags plugin - Key: NUTCH-809 URL: https://issues.apache.org/jira/browse/NUTCH-809 Project: Nutch Issue Type: New Feature Components: parser Reporter: Julien Nioche Assignee

[jira] Updated: (NUTCH-809) Parse-metatags plugin

2010-04-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-809: Attachment: NUTCH-809.patch Parse-metatags plugin - Key

[jira] Updated: (NUTCH-809) Parse-metatags plugin

2010-04-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-809: Attachment: (was: NUTCH-809.patch) Parse-metatags plugin

[jira] Updated: (NUTCH-809) Parse-metatags plugin

2010-04-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-809: Attachment: NUTCH-809.patch Modified version of the plugin which is compatible with parse-tika

[jira] Updated: (NUTCH-809) Parse-metatags plugin

2010-04-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-809: Description: h2. Parse-metatags plugin The parse-metatags plugin consists of a HTMLParserFilter

[jira] Commented: (NUTCH-808) Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs

2010-04-02 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852840#action_12852840 ] Enis Soztutar commented on NUTCH-808: - A candidate framework is DataNucleus. It has

[jira] Updated: (NUTCH-706) Url regex normalizer

2010-03-31 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-706: Fix Version/s: (was: 1.1) Both variants of the substitution rule above break existing tests

[jira] Commented: (NUTCH-706) Url regex normalizer

2010-03-31 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851923#action_12851923 ] Ken Krugler commented on NUTCH-706: --- Two comments about this: 1. From my experiences

[jira] Updated: (NUTCH-249) black- white list url filtering

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-249: Fix Version/s: (was: 1.1) - push out per http://bit.ly/c7tBv9 black- white list url

[jira] Updated: (NUTCH-309) Uses commons logging Code Guards

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-309: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Uses commons

[jira] Updated: (NUTCH-763) Separate configuration files from resources to be included in the job file

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-763: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Separate

[jira] Updated: (NUTCH-577) Use explicit tika-config.xml file to enable mime magic detection to be turned on and off

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-577: Due Date: 30/Nov/07 (was: 30/Nov/07) Fix Version/s: (was: 1.1) - pushing

[jira] Updated: (NUTCH-310) Review Log Levels

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-310: Fix Version/s: (was: 1.1) Assignee: Chris A. Mattmann (was: Jerome Charron

[jira] Updated: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-673: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Upgrade

[jira] Updated: (NUTCH-664) Possibility to update already stored documents.

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-664: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Possibility

[jira] Updated: (NUTCH-750) HtmlParser plugin - page title extraction

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-750: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 HtmlParser

[jira] Updated: (NUTCH-564) External parser supports encoding attribute

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-564: Patch Info: [Patch Available] Fix Version/s: (was: 1.1) - pushing this out per

[jira] Updated: (NUTCH-477) Extend URLFilters to support different filtering chains

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-477: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Extend

[jira] Updated: (NUTCH-251) Administration GUI

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-251: Patch Info: [Patch Available] Fix Version/s: (was: 1.1) - pushing this out per

[jira] Updated: (NUTCH-609) Allow Plugins to be Loaded from Jar File(s)

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-609: Due Date: 13/Feb/08 (was: 13/Feb/08) Patch Info: [Patch Available] Fix

[jira] Resolved: (NUTCH-794) Language Identification must use check the parse metadata for language values

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-794. - Resolution: Fixed @julien -- I think this issue has been fixed in Tika right

[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-578: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 URL fetched

[jira] Updated: (NUTCH-540) some problem about the Nutch cache

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-540: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 some problem

[jira] Updated: (NUTCH-455) dedup on tokenized fields is faulty

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-455: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 dedup

[jira] Updated: (NUTCH-747) injectIndex metadatas and inherit these metadatas to all matching suburls

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-747: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 injectIndex

[jira] Updated: (NUTCH-479) Support for OR queries

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-479: Patch Info: [Patch Available] Fix Version/s: (was: 1.1) - pushing this out per

[jira] Updated: (NUTCH-677) Segment merge filering based on segment content

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-677: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Segment merge

[jira] Updated: (NUTCH-774) Retry interval in crawl date is set to 0

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-774: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Retry interval

[jira] Updated: (NUTCH-460) RDF parser plugin

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-460: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 RDF parser

[jira] Updated: (NUTCH-460) RDF parser plugin

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-460: Patch Info: [Patch Available] - pushing this out per http://bit.ly/c7tBv9 RDF parser

[jira] Updated: (NUTCH-729) NPE in FieldIndexer when BasicFields url doesn't exist

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-729: Due Date: 26/Mar/09 (was: 26/Mar/09) Patch Info: [Patch Available] Fix

[jira] Updated: (NUTCH-573) Multiple Domains - Query Search

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-573: - pushing this out per http://bit.ly/c7tBv9 Multiple Domains - Query Search

[jira] Updated: (NUTCH-717) Make Nutch Solr integration easier

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-717: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Make Nutch Solr

[jira] Updated: (NUTCH-541) Index url field untokenized

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-541: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Index url field

[jira] Updated: (NUTCH-628) Host database to keep track of host-level information

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-628: Patch Info: [Patch Available] Fix Version/s: (was: 1.1) - pushing this out per

[jira] Updated: (NUTCH-650) Hbase Integration

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-650: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Hbase

[jira] Updated: (NUTCH-583) FeedParser empty links for items

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-583: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 FeedParser

[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-666: Due Date: 27/Nov/08 (was: 27/Nov/08) Fix Version/s: (was: 1.1) - pushing

[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-666: Patch Info: [Patch Available] Analysis plugins for multiple language and new Language

[jira] Updated: (NUTCH-475) Adaptive crawl delay

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-475: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Adaptive crawl

[jira] Updated: (NUTCH-771) Add WebGraph classes to the bin/nutch script

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-771: Fix Version/s: (was: 1.1) - pushing this out per http://bit.ly/c7tBv9 Add WebGraph

[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852047#action_12852047 ] Chris A. Mattmann commented on NUTCH-673: - Folks: if you get time to put together

[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852048#action_12852048 ] Chris A. Mattmann commented on NUTCH-789: - Folks, I'm going to put together an RC

[jira] Commented: (NUTCH-794) Language Identification must use check the parse metadata for language values

2010-03-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852101#action_12852101 ] Chris A. Mattmann commented on NUTCH-794: - Hey Julien, yepper, I posted an RC

[jira] Updated: (NUTCH-570) Improvement of URL Ordering in Generator.java

2010-03-30 Thread Serykh Evgeniy (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serykh Evgeniy updated NUTCH-570: - Attachment: GeneratorDiff_v1.out Improvement of URL Ordering in Generator.java

[jira] Resolved: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-03-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-779. - Resolution: Fixed Fix Version/s: 1.1 Committed revision 929038. Thanks Andrzej for your

[jira] Closed: (NUTCH-785) Fetcher : copy metadata from origin URL when redirecting + call scfilters.initialScore on newly created URL

2010-03-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-785. --- Resolution: Fixed Committed revision 929039 Thanks Andrzej for reviewing it Fetcher : copy

[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-03-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851316#action_12851316 ] Julien Nioche commented on NUTCH-789: - Shall we postpone the work on this issue to after

[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-03-30 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851331#action_12851331 ] Andrzej Bialecki commented on NUTCH-789: - There are no diffs, so it's difficult

[jira] Commented: (NUTCH-570) Improvement of URL Ordering in Generator.java

2010-03-30 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851461#action_12851461 ] Otis Gospodnetic commented on NUTCH-570: Serykh, what does your version of the patch

[jira] Commented: (NUTCH-570) Improvement of URL Ordering in Generator.java

2010-03-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851545#action_12851545 ] Julien Nioche commented on NUTCH-570: - {quote}Julien, want to take this?{quote

[jira] Commented: (NUTCH-570) Improvement of URL Ordering in Generator.java

2010-03-30 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851710#action_12851710 ] Dmitry Lihachev commented on NUTCH-570: --- Yeah, Otis. It's just an update so it applies

[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851719#action_12851719 ] Hudson commented on NUTCH-779: -- Integrated in Nutch-trunk #1112 (See [http

[jira] Closed: (NUTCH-784) CrawlDBScanner

2010-03-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-784. --- Resolution: Fixed Committed revision 928746 CrawlDBScanner --- Key

[jira] Updated: (NUTCH-784) CrawlDBScanner

2010-03-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-784: Fix Version/s: 1.1 CrawlDBScanner --- Key: NUTCH-784

[jira] Commented: (NUTCH-784) CrawlDBScanner

2010-03-29 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850896#action_12850896 ] Andrzej Bialecki commented on NUTCH-784: - This should have been reviewed first - I

[jira] Created: (NUTCH-806) Merge CrawlDBScanner with CrawlDBReader

2010-03-29 Thread Julien Nioche (JIRA)
Merge CrawlDBScanner with CrawlDBReader --- Key: NUTCH-806 URL: https://issues.apache.org/jira/browse/NUTCH-806 Project: Nutch Issue Type: Improvement Reporter: Julien Nioche

[jira] Updated: (NUTCH-783) IndexerChecker Utilty

2010-03-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-783: Fix Version/s: (was: 1.1) Removed tag 1.1 Will rename to IndexingPluginsChecker later

[jira] Commented: (NUTCH-785) Fetcher : copy metadata from origin URL when redirecting + call scfilters.initialScore on newly created URL

2010-03-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850912#action_12850912 ] Julien Nioche commented on NUTCH-785: - Could anyone please review this issue? I would

[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-03-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850915#action_12850915 ] Julien Nioche commented on NUTCH-779: - Could anyone please review this issue? I would

[jira] Commented: (NUTCH-785) Fetcher : copy metadata from origin URL when redirecting + call scfilters.initialScore on newly created URL

2010-03-29 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850931#action_12850931 ] Andrzej Bialecki commented on NUTCH-785: - +1. The scoring api should allow us
