Ah, that means don't use the crawl command; instead do a little shell
scripting to execute the separate crawl cycle commands. See the Nutch
wiki for examples. And don't do solrdedup. Search the Solr wiki for
deduplication.
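A minimal sketch of such a script, assuming the Nutch 1.x command names from the wiki tutorial (inject, generate, fetch, parse, updatedb, invertlinks). The `crawl` and `urls` directories, the `ROUNDS` variable and the `crawl_cycle` function name are placeholders for illustration, not anything from the wiki. The Solr indexing step is left out on purpose because its arguments differ between Nutch versions.

```shell
#!/bin/sh
# Sketch only: one crawl cycle per round, run as separate commands.
# All paths and variable names below are assumptions for illustration.
NUTCH="${NUTCH:-bin/nutch}"       # Nutch launcher script
CRAWL_DIR="${CRAWL_DIR:-crawl}"   # hypothetical crawl directory
SEED_DIR="${SEED_DIR:-urls}"      # hypothetical seed list directory
ROUNDS="${ROUNDS:-2}"             # number of generate/fetch rounds

crawl_cycle() {
  # Seed the crawldb once.
  "$NUTCH" inject "$CRAWL_DIR/crawldb" "$SEED_DIR" || return 1

  i=1
  while [ "$i" -le "$ROUNDS" ]; do
    # Create a new segment with the next batch of URLs to fetch.
    "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" || return 1

    # Segment names sort by creation time, so the last one is the newest.
    segment=$(ls -d "$CRAWL_DIR"/segments/* | tail -n 1)

    "$NUTCH" fetch "$segment" || return 1
    "$NUTCH" parse "$segment" || return 1

    # Feed newly discovered links back into the crawldb.
    "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$segment" || return 1
    i=$((i + 1))
  done

  # Build the link database from all segments.
  "$NUTCH" invertlinks "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments"
}
```

Calling `crawl_cycle` from the Nutch runtime directory runs the whole sequence and stops at the first step that fails.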
cheers
On Fri, 11 May 2012 07:39:36 +0300, Tolga to...@ozses.net wrote:
Ok I got it. It's there in the userlog folder on each of the slaves.
On Fri, May 11, 2012 at 12:28 AM, Vijith vijithkv...@gmail.com wrote:
Regarding the Nutch log, I think I missed something while running the
job. In Eclipse I have given the following as VM arguments:
- -Dhadoop.log.dir=logs
Hello,
I am using Nutch 1.4 with Solr 3.6.0 and would like to get the HTML keywords
and description metatags indexed into Solr. On the Nutch side I have followed
http://wiki.apache.org/nutch/IndexMetatags to get Nutch parsing and
extracting the metatags (using index-metatags and
I have tried to do this with a separate logger and with a PrintWriter object.
It works in local mode but not in deploy mode.
I am running the Nutch job file. It runs and generates the Hadoop log
without any errors, but the files are not created on any of the nodes.
On Fri, May 11, 2012 at 3:07
When running Hadoop in deploy mode the actual tasks are run by the
MapReduce framework, so you have to check the MapReduce user logs. Either
use the jobtracker interface or check them directly on the nodes in
HADOOP_HOME/logs/userlogs or something like that.
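As a rough sketch of checking the logs directly on a node: the function below lists the task log files under the userlogs directory that contain a given string. The function name `find_in_userlogs` is made up for this example, and the default directory is only the guess given above, so pass the real location as the second argument if it differs on your cluster.

```shell
# Sketch only: list task log files that mention a pattern.
# Usage: find_in_userlogs PATTERN [USERLOGS_DIR]
find_in_userlogs() {
  pattern="$1"
  # Assumed default location; override with the second argument.
  logs_dir="${2:-${HADOOP_HOME:-.}/logs/userlogs}"
  # -r: descend into the per-task directories, -l: print file names only.
  grep -rl -- "$pattern" "$logs_dir"
}
```

For example, `find_in_userlogs "MyParser"` (with a class name of your own) would be run on each slave, since every node only holds the logs of the tasks it executed.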
On Fri, May 11, 2012 at 1:11 PM, Vijith
There is; every task gets run in a temporary working directory. But in general
the output is cleaned up after the task completes. If you want to save side
data you have to figure a workaround. This page should give you a few
pointers:
I was confused by this tutorial: http://wiki.apache.org/nutch/NutchTutorial
Reading this page one might come to the conclusion that the crawl tool
can't do iterative crawling, because under "3.2 Using Individual
Commands for Whole-Web Crawling" there's the sentence "This also
permits ... incremental
If you would like I could add you to the moderators group and you can
word it how you wish.
Please sign up to Jira, give me your Jira username on this page, and I
will happily add you to the group.
On the other hand, if you don't wish to do this, then please reply
here with your suggestion and
Hi
Nutch uses Log4j, and with it you can write log output from different
classes or different log levels to different output files. I'm sure this
will work with Nutch in local mode, so I believe you can make it happen
with Hadoop too, but it may be tricky, or not possible.
Cheers
On Fri, 11 May
Hello, I am using the index-metatags plugin (I suppose that you have the index-metatags
plugin in Nutch's plugins folder).
First you need to include in nutch-site something like this:
|index-(basic|anchor|metatags|more)|
You also need to include the metadata names that you want to index (in this file
as well):
I keep forgetting about the parsechecker. I'll have to take a look and see
what it kicks out.
And I've already changed Solr; I was just looking at what I could do with
Nutch as well.
Thanks.
On Tue, May 8, 2012 at 8:44 AM, Markus Jelsma markus.jel...@openindex.io wrote:
Hi
Nutch should
Hi,
Actually I have already done all that, as I followed the Nutch Wiki for this
purpose: http://wiki.apache.org/nutch/IndexMetatags
Your suggestion to clean my segments and the Solr index and then
re-index is a good idea. Could you help me with the commands to achieve
these 3
Hi.
I only have the index-metatags plugin in my nutch-site.xml and it works
successfully. I also tried parse-metatags without a positive result and
in the end don't use it.
Also make sure that your schema in Nutch is the same as in Solr.
If your index is not big you can erase the folder of your
I just checked out nutchgora on Wednesday, and I'm getting exceptions
trying to run an initial crawl. I found some threads regarding this issue
but I'm not sure if it was ever solved:
Hi Ramsel,
It would be great if you could share the configuration you have
included, and also say whether or not you are keeping up to date with HEAD.
This is most likely something to do with your HSQLDB configuration not
matching between the server and the gora.properties configuration.
Lewis
On Sat, May