I tried configuring my instance to fetch and parse your page with the
following result:
lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local/bin$ ./nutch parsechecker http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
fetching: http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
One final point there which I forgot.
The point of the parse-js plugin is to extract outlinks from JS pages.
The page you supplied contained only one outlink to a page which no
longer exists, so depending on your purposes you may not find
the parse-js plugin of much help.
Lewis
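
For readers unfamiliar with the idea of pulling outlinks from a JS file, here is a rough Python sketch of the general technique. It is not the parse-js plugin's actual code: the regex, the function name extract_outlinks and the sample string are all assumptions made up for illustration, and it only looks for absolute http(s) URLs inside quoted string literals.

```python
import re

# Assumed pattern (not taken from Nutch): an absolute http(s) URL
# enclosed in single or double quotes, as in a JS string literal.
URL_IN_STRING = re.compile(r"""["'](https?://[^"'\s]+)["']""")


def extract_outlinks(js_source):
    """Return the unique URLs found in quoted strings, in first-seen order."""
    seen = []
    for match in URL_IN_STRING.finditer(js_source):
        url = match.group(1)
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical JS snippet with one repeated link.
sample = (
    'var a = "http://example.com/a.html"; '
    "load('https://example.org/b'); "
    'var c = "http://example.com/a.html";'
)
print(extract_outlinks(sample))
```

A file whose only quoted URL points at a dead page would yield a single, unusable outlink, which is the situation described above.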
On Fri,
You need to add the site field to the schema.xml in your Solr setup.
Jim
On Fri, May 18, 2012 at 12:58 AM, cameron tran cameront...@gmail.com wrote:
Hello
I am trying to get Nutch 1.4 (downloaded binary) to do solrindex to
http://127.0.0.1:8983/solr/ but am getting the following error. Using
How can I exclude certain mime-types from crawling, for example Word documents?
If I have parse-tika in plugin.includes it will parse them. Do I have
to change parse-plugins.xml?
I can't exclude them in regex-urlfilter as the .doc extension is not
present in the urls.
Thanks
Matthias
When will Nutch 1.5 be released?
Matthias
On Wed, Apr 18, 2012 at 1:46 PM, Bharat Goyal bharat.go...@shiksha.com wrote:
+1
On Monday 16 April 2012 12:34 PM, Markus Jelsma wrote:
+1
On Mon, 16 Apr 2012 05:43:22 +, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
Hi
-Original message-
From:Matthias Paul magethle.nu...@gmail.com
Sent: Fri 18-May-2012 14:57
To: user@nutch.apache.org
Subject: Exclude certain mime-types
How can I exclude certain mime-types from crawling, for example Word documents?
If I have parse-tika in plugin.includes it
As soon as the release manager finds some spare time to manage the release
process.
Please be patient, or build from trunk, which will become 1.5.
-Original message-
From:Matthias Paul magethle.nu...@gmail.com
Sent: Fri 18-May-2012 15:09
To: user@nutch.apache.org
Subject: Re: [VOTE]
When the community is satisfied that we have a good release candidate
and when the VOTE meets the required conditions.
Ultimately the timing for a release is down to the release manager but
I think it is fair to say that we are on our way to getting 1.5
released soon as the (trunk) codebase
Hey Guys,
Sorry I've been on hiatus enjoying a trip with my family :)
I was hoping to respin rc #2 before I left, but I didn't find the
spare cycles. Lewis, basically if you look through the rc #1 thread there are
about 3-4 comments from Julien, you, and I think from Sami.
I have them written
Yes. Also take a look at this page [1] for script examples.
[1] http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script
On Thu, May 17, 2012 at 6:07 AM, Tolga to...@ozses.net wrote:
I'm still confused. You