[jira] Commented: (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2010-08-09 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896479#action_12896479
 ] 

Julien Nioche commented on NUTCH-874:
-

Some plugins have not been ported to the new API because it does not provide 
multi-valued parse results. See 
http://search.lucidimagination.com/search/document/844c48289f2d07db/nutchbase_multi_value_parseresult_missing#4ed6f352ebcce8ef

This is probably not the case for the ExtParser though. We could rely on Tika's 
mechanism for external parsing instead of maintaining ours. WDYT?

 Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora
 --

 Key: NUTCH-874
 URL: https://issues.apache.org/jira/browse/NUTCH-874
 Project: Nutch
  Issue Type: Bug
  Components: parser
 Environment: Nutch 2.0
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Priority: Critical
 Fix For: 2.0


 I just noticed while fixing NUTCH-564 that the ExtParser hasn't been brought 
 up to date with Nutch 2.0 trunk. We should review the plugins in src/plugin 
 to make sure they all work with Gora/Nutchbase now.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (NUTCH-864) Fetcher generates entries with status 0

2010-08-09 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche reassigned NUTCH-864:
---

Assignee: Doğacan Güney  (was: Julien Nioche)

 Fetcher generates entries with status 0
 ---

 Key: NUTCH-864
 URL: https://issues.apache.org/jira/browse/NUTCH-864
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
 Environment: Gora with SQLBackend
 URL: https://svn.apache.org/repos/asf/nutch/branches/nutchbase
 Last Changed Rev: 980748
 Last Changed Date: 2010-07-30 14:19:52 +0200 (Fri, 30 Jul 2010)
Reporter: Julien Nioche
Assignee: Doğacan Güney
 Fix For: 2.0


 After a round of fetching which got the following protocol status :
 10/07/30 15:11:39 INFO mapred.JobClient: ACCESS_DENIED=2
 10/07/30 15:11:39 INFO mapred.JobClient: SUCCESS=1177
 10/07/30 15:11:39 INFO mapred.JobClient: GONE=3
 10/07/30 15:11:39 INFO mapred.JobClient: TEMP_MOVED=138
 10/07/30 15:11:39 INFO mapred.JobClient: EXCEPTION=93
 10/07/30 15:11:39 INFO mapred.JobClient: MOVED=521
 10/07/30 15:11:39 INFO mapred.JobClient: NOTFOUND=62
 I ran : ./nutch org.apache.nutch.crawl.WebTableReader -stats
 10/07/30 15:12:37 INFO crawl.WebTableReader: Statistics for WebTable: 
 10/07/30 15:12:37 INFO crawl.WebTableReader: TOTAL urls:  2690
 10/07/30 15:12:37 INFO crawl.WebTableReader: retry 0: 2690
 10/07/30 15:12:37 INFO crawl.WebTableReader: min score:   0.0
 10/07/30 15:12:37 INFO crawl.WebTableReader: avg score:   0.7587361
 10/07/30 15:12:37 INFO crawl.WebTableReader: max score:   1.0
 10/07/30 15:12:37 INFO crawl.WebTableReader: status 0 (null): 649
 10/07/30 15:12:37 INFO crawl.WebTableReader: status 2 (status_fetched):   1177 (SUCCESS=1177)
 10/07/30 15:12:37 INFO crawl.WebTableReader: status 3 (status_gone):  112
 10/07/30 15:12:37 INFO crawl.WebTableReader: status 34 (status_retry):    93 (EXCEPTION=93)
 10/07/30 15:12:37 INFO crawl.WebTableReader: status 4 (status_redir_temp):    138 (TEMP_MOVED=138)
 10/07/30 15:12:37 INFO crawl.WebTableReader: status 5 (status_redir_perm):    521 (MOVED=521)
 10/07/30 15:12:37 INFO crawl.WebTableReader: WebTable statistics: done
 There should not be any entries with status 0 (null).
 I will investigate a bit more...
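
The counts in the WebTableReader output can be cross-checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and the numbers are copied by hand from the log above. It shows that the six per-status counts add up to the reported TOTAL of 2690, so the 649 status 0 (null) entries are counted within that total.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: re-adds the counts from the log.
class WebTableStatsCheck {

    // Per-status counts copied from the WebTableReader -stats output.
    static Map<String, Integer> statusCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("status 0 (null)", 649);
        counts.put("status 2 (status_fetched)", 1177);
        counts.put("status 3 (status_gone)", 112);
        counts.put("status 34 (status_retry)", 93);
        counts.put("status 4 (status_redir_temp)", 138);
        counts.put("status 5 (status_redir_perm)", 521);
        return counts;
    }

    // Sums all per-status counts.
    static int sum(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = statusCounts();
        int total = sum(counts);
        int statusZero = counts.get("status 0 (null)");
        System.out.println("sum of per-status counts: " + total); // prints 2690
        System.out.printf("status 0 share of total: %.1f%%%n", 100.0 * statusZero / total);
    }
}
```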




[jira] Resolved: (NUTCH-859) Diff trunk and NutchBase

2010-08-09 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche resolved NUTCH-859.
-

Resolution: Fixed

NutchBase has become 2.0 and lives in the trunk. I had another look at its 
differences with 1.2 and could not find any improvement or recent change to the 
1.x branch that was missing from NutchBase. However, since the move to the GORA 
API changed the code drastically it is possible that I missed something but 
hopefully this won't be the case.

 Diff trunk and NutchBase 
 -

 Key: NUTCH-859
 URL: https://issues.apache.org/jira/browse/NUTCH-859
 Project: Nutch
  Issue Type: Task
Reporter: Julien Nioche
Priority: Blocker
 Fix For: 2.0


 Before we turn NutchBase into trunk we need to make sure that all (more or 
 less) recent changes in the trunk have been ported to NutchBase. I have done 
 that recently but given that there is a very large number of changes I might 
 have missed a few things here and there.  




[jira] Closed: (NUTCH-859) Diff trunk and NutchBase

2010-08-09 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche closed NUTCH-859.
---





[jira] Created: (NUTCH-875) Port Webgraph to Nutch 2.0

2010-08-09 Thread Julien Nioche (JIRA)
Port Webgraph to Nutch 2.0
--

 Key: NUTCH-875
 URL: https://issues.apache.org/jira/browse/NUTCH-875
 Project: Nutch
  Issue Type: New Feature
  Components: linkdb
Affects Versions: 2.1 
Reporter: Julien Nioche
 Fix For: 2.1 


The webgraph has not yet been ported to the GORA-based API.





[jira] Updated: (NUTCH-851) Port logging to slf4j

2010-08-09 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-851:


Attachment: NUTCH-851-v2.patch

Updated the patch to the 2.0 code. 

Will commit tomorrow if there aren't any objections

 Port logging to slf4j
 -

 Key: NUTCH-851
 URL: https://issues.apache.org/jira/browse/NUTCH-851
 Project: Nutch
  Issue Type: New Feature
Reporter: Julien Nioche
 Fix For: 2.0

 Attachments: NUTCH-851-v2.patch


 We are already inheriting a dependency on slf4j from Solr so we might as well 
 use it :-)
 Any thoughts on this?




[jira] Updated: (NUTCH-851) Port logging to slf4j

2010-08-09 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-851:


Attachment: (was: NUTCH-851.patch)




[jira] Commented: (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2010-08-09 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896552#action_12896552
 ] 

Chris A. Mattmann commented on NUTCH-874:
-

Hey Julien,

I think Jukka already worked on something really similar to the ExtParser in 
Tika. See: 
http://tika.apache.org/0.7/api/org/apache/tika/parser/ExternalParser.html

If we go that route here in Nutch, then I think we should add an encoding 
attribute similar to NUTCH-564 and flow it through in parse-tika then. If we 
can do that, I think we're good!
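
To make the encoding point concrete, here is a minimal Java sketch of what honouring an encoding attribute could look like when decoding the output of an external parser command. It does not use Tika's ExternalParser or Nutch's ExtParser; the class name, the method and the UTF-8 fallback are all assumptions chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch or Tika.
class ExternalOutputDecoder {

    // Decodes the raw bytes produced by an external command.
    // 'encoding' stands in for a per-parser encoding attribute; when it is
    // missing or not a usable charset name, this sketch assumes UTF-8.
    static String decode(byte[] rawOutput, String encoding) {
        Charset charset = StandardCharsets.UTF_8;
        if (encoding != null) {
            try {
                charset = Charset.forName(encoding);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: keep the fallback.
            }
        }
        return new String(rawOutput, charset);
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1 (a single byte for the accented e).
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        System.out.println(decode(latin1, "ISO-8859-1"));
    }
}
```

Decoding the same bytes without the attribute would fall back to UTF-8 and garble the last character, which is the kind of problem an explicit encoding setting avoids.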

Cheers,
Chris





[jira] Updated: (NUTCH-876) Remove remaining robots/IP blocking code in lib-http

2010-08-09 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated NUTCH-876:


Attachment: NUTCH-876.patch

Patch to fix the issue. If there are no objections I'll commit this shortly.

 Remove remaining robots/IP blocking code in lib-http
 

 Key: NUTCH-876
 URL: https://issues.apache.org/jira/browse/NUTCH-876
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
Affects Versions: 2.0
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Attachments: NUTCH-876.patch


 There are remains of the (very old) blocking code in 
 lib-http/.../HttpBase.java. This code was used with the OldFetcher to manage 
 politeness limits. New trunk doesn't have OldFetcher anymore, so this code is 
 useless. Furthermore, there is an actual bug here - FetcherJob forgets to set 
 Protocol.CHECK_BLOCKING and Protocol.CHECK_ROBOTS to false, and the defaults 
 in lib-http are set to true.
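
The failure mode described above can be sketched in plain Java. This is not the lib-http code: java.util.Properties stands in for the job configuration, and the two key strings are assumed values for Protocol.CHECK_BLOCKING and Protocol.CHECK_ROBOTS. The point is only that a flag whose default is true stays on unless the caller explicitly sets it to false.

```java
import java.util.Properties;

// Hypothetical illustration, not taken from HttpBase or FetcherJob.
class CheckFlagDefaults {

    // Assumed key names; the issue refers to these settings only as
    // Protocol.CHECK_BLOCKING and Protocol.CHECK_ROBOTS.
    static final String CHECK_BLOCKING = "protocol.plugin.check.blocking";
    static final String CHECK_ROBOTS = "protocol.plugin.check.robots";

    // Reads a boolean setting, returning defaultValue when the key is absent.
    static boolean getBoolean(Properties conf, String key, boolean defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    // A job configuration in which the caller forgot to set the flags.
    static Properties confWithoutOverrides() {
        return new Properties();
    }

    // A job configuration in which the caller disables both checks.
    static Properties confWithOverrides() {
        Properties conf = new Properties();
        conf.setProperty(CHECK_BLOCKING, "false");
        conf.setProperty(CHECK_ROBOTS, "false");
        return conf;
    }

    public static void main(String[] args) {
        // With a default of true, the old checks stay enabled when nothing is set.
        System.out.println(getBoolean(confWithoutOverrides(), CHECK_BLOCKING, true)); // prints true
        System.out.println(getBoolean(confWithOverrides(), CHECK_BLOCKING, true));    // prints false
    }
}
```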




Re: [VOTE] Apache Nutch 1.2 Release Candidate #1

2010-08-09 Thread Julien Nioche
  +1 to fixing it in 1.2 and rolling another RC, but –1 to reopening
 issues. I’m not a big fan of that, especially since we record issue fixes in
 CHANGES.txt and reopening them only leads to confusion and out of sync text
 files and JIRA.

 In the future it would be nice to just create a new issue in JIRA and then
 link your issue to the issue that you wanted to reopen. It’s just as easy
 and doesn’t cause the out of sync problem.


OK, makes sense




 Cheers,
 Chris



 On 8/9/10 7:45 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote:

 I reopened https://issues.apache.org/jira/browse/NUTCH-870. It would be
 good to fix it before releasing 1.2

 On 9 August 2010 14:44, Andrzej Bialecki a...@getopt.org wrote:

 On 2010-08-08 03:04, Mattmann, Chris A (388J) wrote:

 Hi Folks,

 I have posted a release candidate for the Apache Nutch 1.2 release. The
 source code is at:

 http://people.apache.org/~mattmann/apache-nutch-1.2/rc1/

 For more detailed information, see the included CHANGES.txt file for details
 on release contents and latest changes. The release was made using the Nutch
 release process, documented on the Wiki here:

 http://bit.ly/d5ugid

 A Nutch 1.2 tag is at:

 http://svn.apache.org/repos/asf/nutch/tags/release-1.2/

 Sami Siren previously suggested integrating RAT into the build, but I
 haven't had a chance to do it yet. If someone else has time, or wants to,
 please go ahead and I'd be happy to roll another RC.

 Please vote on releasing these packages as Apache Nutch 1.2. The vote is
 open for the next 72 hours.

 Only votes from Nutch PMC are binding, but folks are welcome to check the
 release candidate and voice their approval or disapproval. The vote passes
 if at least three binding +1 votes are cast.

 [ ] +1 Release the packages as Apache Nutch 1.2.

 [ ] -1 Do not release the packages because...


 +1 - all tests pass, a sample crawl works without problems, both in local
 and in distributed mode.




 ++
 Chris Mattmann, Ph.D.
 Senior Computer Scientist
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 171-266B, Mailstop: 171-246
 Email: chris.mattm...@jpl.nasa.gov
 WWW:   http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Assistant Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++




-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com


[Nutch Wiki] Update of TikaPlugin by AndreRicardo

2010-08-09 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change 
notification.

The TikaPlugin page has been changed by AndreRicardo.
http://wiki.apache.org/nutch/TikaPlugin?action=diff&rev1=5&rev2=6

--

  '''js''': ?
  
  '''mp3''': Nutch identifies several fields (Title, Album, Artist) whereas 
Tika knows only about Titles, the rest is stored as paragraphs. 
+ Tika-app can also identify id3v1 and id3v2 tags in an mp3, such as: album, 
artist, audioSampleRate, composer, genre, logcomment, releaseDate, trackNumber.
  
  '''msexcel''': comparable (+ Tika able to represent content in structured way 
as XHTML tables which can be useful for HTML parser plugins)
  


Re: [VOTE] Apache Nutch 1.2 Release Candidate #1

2010-08-09 Thread Scott Gonyea
I got yelled at, too.  :-(

I'll pull down 1.2 and do a big-stupid-crawl after that metadata issue is
fixed.  I'm not sure if it affects what I'm doing, but I see the word
metadata and it gives me pause.

Scott


[jira] Created: (NUTCH-877) Allow setting of slop values for non-quote phrase queries on query-basic plugin

2010-08-09 Thread Dennis Kubes (JIRA)
Allow setting of slop values for non-quote phrase queries on query-basic plugin
---

 Key: NUTCH-877
 URL: https://issues.apache.org/jira/browse/NUTCH-877
 Project: Nutch
  Issue Type: Improvement
  Components: searcher
Affects Versions: 1.2
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.2


Patch adds a configuration variable for setting slop values on phrase queries.  
The default slop value, which currently can't be changed through configuration, 
is Integer.MAX_VALUE.  It produces something like this, which doesn't seem 
right to me.  If you are searching for a phrase you usually want it within a 
certain distance:

2.9141337E-4 = weight(content:my phrase~2147483647 in 1029), product of:

* 0.07163286 = queryWeight(content:my phrase~2147483647), product of:
  o 9.657982 = idf(content: my=13470 phrase=534)
  o 0.0074169594 = queryNorm

This patch adds the query.phrase.slop configuration value to the 
nutch-default.xml file.  It has a default setting of 5.
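
The effect of the slop value can be illustrated with a small Java sketch. This is a deliberately simplified model, not Lucene's sloppy phrase scorer: it handles only a two-term phrase whose terms appear in query order and treats the number of intervening positions as the distance. The class and method names are invented for the example.

```java
// Hypothetical illustration of phrase slop; not Lucene or Nutch code.
class PhraseSlopSketch {

    // Returns true if some occurrence of the second term follows some
    // occurrence of the first term with at most 'slop' positions in between.
    static boolean matches(int[] firstTermPositions, int[] secondTermPositions, int slop) {
        for (int p1 : firstTermPositions) {
            for (int p2 : secondTermPositions) {
                // long arithmetic avoids overflow for extreme positions
                if (p2 > p1 && (long) p2 - p1 - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] my = {0};        // "my" at position 0
        int[] phrase = {1000}; // "phrase" 999 positions later

        // With Integer.MAX_VALUE as slop, any distance counts as a phrase match.
        System.out.println(matches(my, phrase, Integer.MAX_VALUE)); // prints true
        // With a slop of 5, the two terms are too far apart.
        System.out.println(matches(my, phrase, 5));                 // prints false
    }
}
```

Under this model a slop of Integer.MAX_VALUE makes the phrase query behave like a plain conjunction of its terms, which is why a small configurable value is usually closer to what a phrase search intends.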




Alternative search box for Nutch site

2010-08-09 Thread Otis Gospodnetic
Hello,
(sending this to d...@nutch instead of old nutch-...@lucene)

Over at http://search-lucene.com we index Nutch's mailing lists, wiki, web site, 
source code, javadoc, jira...

Would the community be interested in a patch that adds another search option to 
the search box on nutch.apache.org?

I happened to try a few searches from nutch.a.o just now (now: yesterday) and I 
got stuff like this:

  Found 189 results in 6.211 seconds. Displaying page 1 of 19, sorted by
  Found 12,808 results in 64.342 seconds. Displaying page 1 of 1,281, sorted by

Note the times.  Ouch!
This makes me think an alternative option would be a good thing to have.

Assuming people are for this, any suggestions for how the search should function 
by default or any specific instructions for how the search box should be 
modified would be great!

Thanks,
Otis


Build failed in Hudson: Nutch-trunk #1215

2010-08-09 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1215/

--
[...truncated 986 lines...]
A src/plugin/subcollection/src/test/org/apache
A src/plugin/subcollection/src/test/org/apache/nutch
A src/plugin/subcollection/src/test/org/apache/nutch/collection
A src/plugin/subcollection/src/test/org/apache/nutch/collection/TestSubcollection.java
A src/plugin/subcollection/src/java
A src/plugin/subcollection/src/java/org
A src/plugin/subcollection/src/java/org/apache
A src/plugin/subcollection/src/java/org/apache/nutch
A src/plugin/subcollection/src/java/org/apache/nutch/collection
A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/package.html
A src/plugin/subcollection/src/java/org/apache/nutch/indexer
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
A src/plugin/subcollection/README.txt
A src/plugin/subcollection/plugin.xml
A src/plugin/subcollection/build.xml
A src/plugin/index-more
A src/plugin/index-more/ivy.xml
A src/plugin/index-more/src
A src/plugin/index-more/src/test
A src/plugin/index-more/src/test/org
A src/plugin/index-more/src/test/org/apache
A src/plugin/index-more/src/test/org/apache/nutch
A src/plugin/index-more/src/test/org/apache/nutch/indexer
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
A src/plugin/index-more/src/java
A src/plugin/index-more/src/java/org
A src/plugin/index-more/src/java/org/apache
A src/plugin/index-more/src/java/org/apache/nutch
A src/plugin/index-more/src/java/org/apache/nutch/indexer
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html
A src/plugin/index-more/plugin.xml
A src/plugin/index-more/build.xml
AU src/plugin/plugin.dtd
A src/plugin/parse-ext
A src/plugin/parse-ext/ivy.xml
A src/plugin/parse-ext/src
A src/plugin/parse-ext/src/test
A src/plugin/parse-ext/src/test/org
A src/plugin/parse-ext/src/test/org/apache
A src/plugin/parse-ext/src/test/org/apache/nutch
A src/plugin/parse-ext/src/test/org/apache/nutch/parse
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
A src/plugin/parse-ext/src/java
A src/plugin/parse-ext/src/java/org
A src/plugin/parse-ext/src/java/org/apache
A src/plugin/parse-ext/src/java/org/apache/nutch
A src/plugin/parse-ext/src/java/org/apache/nutch/parse
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
A src/plugin/parse-ext/plugin.xml
A src/plugin/parse-ext/build.xml
A src/plugin/parse-ext/command
A src/plugin/urlnormalizer-pass
A src/plugin/urlnormalizer-pass/ivy.xml
A src/plugin/urlnormalizer-pass/src
A src/plugin/urlnormalizer-pass/src/test
A src/plugin/urlnormalizer-pass/src/test/org
A src/plugin/urlnormalizer-pass/src/test/org/apache
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
A src/plugin/urlnormalizer-pass/src/java
A src/plugin/urlnormalizer-pass/src/java/org
A src/plugin/urlnormalizer-pass/src/java/org/apache
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java
AU src/plugin/urlnormalizer-pass/plugin.xml
AU