[jira] [Commented] (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2013-02-28 Thread kiran (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590246#comment-13590246
 ] 

kiran commented on NUTCH-874:
-

The following plugins need to be ported for compatibility in 2.x:

i)   feed
ii)  parse-swf
iii) parse-ext
iv)  parse-zip
v)   parse-metatags (I wrote a patch for this earlier, NUTCH-1478)

> Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora
> --
>
> Key: NUTCH-874
> URL: https://issues.apache.org/jira/browse/NUTCH-874
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: nutchgora
> Environment: Nutch 2.0
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Priority: Critical
> Fix For: 2.2
>
> Attachments: NUTCH-874.patch
>
>
> I just noticed while fixing NUTCH-564 that the ExtParser hasn't been brought 
> up to date with Nutch 2.0 trunk. We should review the plugins in src/plugin 
> to make sure they all work with Gora/Nutchbase now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2012-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473827#comment-13473827
 ] 

Hudson commented on NUTCH-874:
--

Integrated in Nutch-nutchgora #375 (See 
[https://builds.apache.org/job/Nutch-nutchgora/375/])
NUTCH-874 Make sure all plugins in src/plugin are compatible with Nutch 2.0 
and Gora (part 1) (Revision 1396850)

 Result = SUCCESS
lewismc : 
Files : 
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/plugin/feed/src/java/org/apache/nutch/indexer/feed/FeedIndexingFilter.java
* /nutch/branches/2.x/src/plugin/feed/src/java/org/apache/nutch/parse/feed/FeedParser.java
* /nutch/branches/2.x/src/plugin/feed/src/test/org/apache/nutch/parse/feed/TestFeedParser.java
* /nutch/branches/2.x/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
* /nutch/branches/2.x/src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
* /nutch/branches/2.x/src/plugin/parse-swf/src/test/org/apache/nutch/parse/swf/TestSWFParser.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/TikaParser.java
* /nutch/branches/2.x/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipParser.java
* /nutch/branches/2.x/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipTextExtractor.java
* /nutch/branches/2.x/src/plugin/parse-zip/src/test/org/apache/nutch/parse/zip/TestZipParser.java




[jira] [Commented] (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2012-10-10 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473654#comment-13473654
 ] 

Lewis John McGibbney commented on NUTCH-874:


Part 1 (e.g. removal of unused imports) committed at revision 1396850 in 2.x head.



[jira] [Commented] (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2012-01-05 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180562#comment-13180562
 ] 

Lewis John McGibbney commented on NUTCH-874:


I know the heat has kind of shifted away from Nutchgora, but it would be great 
to clarify what this issue actually encapsulates. Was/is it the case that 
some plugins in Nutchgora are not actually working with the Nutchgora API? I'm 
kind of confused by this one!

> Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora
> --
>
> Key: NUTCH-874
> URL: https://issues.apache.org/jira/browse/NUTCH-874
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Environment: Nutch 2.0
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Priority: Critical
> Fix For: nutchgora
>
>
> I just noticed while fixing NUTCH-564 that the ExtParser hasn't been brought 
> up to date with Nutch 2.0 trunk. We should review the plugins in src/plugin 
> to make sure they all work with Gora/Nutchbase now.





[jira] Commented: (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2010-08-11 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897190#action_12897190
 ] 

Julien Nioche commented on NUTCH-874:
-

{quote}
I think Jukka already worked on something really similar to the ExtParser in 
Tika. See: 
http://tika.apache.org/0.7/api/org/apache/tika/parser/ExternalParser.html
{quote}
Yes, that's the one I had in mind.

One of the plugins which hasn't been ported yet is the feed parser. We could 
rely on the one we recently added to Tika, knowing that there is a substantial 
difference in the sense that the Tika feed parser generates a simple XHTML 
representation of the document where the feeds are simply represented as 
anchors whereas the Nutch version created new documents for each feed.
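To make the difference concrete, here is a minimal sketch in plain Java. It uses neither the Tika nor the Nutch API; the class and method names (FeedRepresentationSketch, Entry, asSingleXhtml, asDocumentPerEntry) are hypothetical, and it assumes "new documents for each feed" means one document per feed entry. The first method mimics the single-XHTML-with-anchors shape, the second the one-document-per-entry shape that needs a multi-valued parse result.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not Nutch or Tika code. */
public class FeedRepresentationSketch {

    /** One feed entry: its link and its title. */
    public static final class Entry {
        final String url;
        final String title;

        public Entry(String url, String title) {
            this.url = url;
            this.title = title;
        }
    }

    /** Tika-like shape: a single XHTML document, one anchor per entry. */
    public static String asSingleXhtml(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (Entry e : entries) {
            // No HTML escaping here; a real parser would escape url and title.
            sb.append("<a href=\"").append(e.url).append("\">")
              .append(e.title).append("</a>");
        }
        return sb.append("</body></html>").toString();
    }

    /** Nutch-feed-like shape: one document per entry, keyed by the entry URL. */
    public static Map<String, String> asDocumentPerEntry(List<Entry> entries) {
        Map<String, String> docs = new LinkedHashMap<>();
        for (Entry e : entries) {
            docs.put(e.url, e.title);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("http://example.com/a", "First post"),
            new Entry("http://example.com/b", "Second post"));
        System.out.println(asSingleXhtml(entries));
        System.out.println(asDocumentPerEntry(entries).keySet());
    }
}
```

With the first shape the entry links only show up as outlinks of the feed page; with the second, each entry is a document in its own right, which is why the second shape depends on the parse API being able to return more than one result per fetched URL.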

There is also the parse-rss plugin in Nutch which is quite similar - what's the 
difference with the feed one again? Since the Tika parser would handle all 
sorts of feed formats why not simply rely on it? 

> Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora
> --
>
> Key: NUTCH-874
> URL: https://issues.apache.org/jira/browse/NUTCH-874
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Environment: Nutch 2.0
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Priority: Critical
> Fix For: 2.0
>
>
> I just noticed while fixing NUTCH-564 that the ExtParser hasn't been brought 
> up to date with Nutch 2.0 trunk. We should review the plugins in src/plugin 
> to make sure they all work with Gora/Nutchbase now.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2010-08-09 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896552#action_12896552
 ] 

Chris A. Mattmann commented on NUTCH-874:
-

Hey Julien,

I think Jukka already worked on something really similar to the ExtParser in 
Tika. See: 
http://tika.apache.org/0.7/api/org/apache/tika/parser/ExternalParser.html

If we go that route here in Nutch, then I think we should add an encoding 
attribute similar to NUTCH-564 and flow it through in parse-tika then. If we 
can do that, I think we're good!

Cheers,
Chris





[jira] Commented: (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2010-08-09 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896479#action_12896479
 ] 

Julien Nioche commented on NUTCH-874:
-

Some plugins have not been ported to the new API as it does not provide 
multi-valued parse results. See 
http://search.lucidimagination.com/search/document/844c48289f2d07db/nutchbase_multi_value_parseresult_missing#4ed6f352ebcce8ef

This is probably not the case for the ExtParser though. We could rely on Tika's 
mechanism for external parsing instead of maintaining ours. WDYT?
