Re: Stalling during fetch (0.7)
Further details: if I run strace on the process, it shows this, over and over and over:

gettimeofday({1155249187, 52}, NULL) = 0
gettimeofday({1155249188, 389}, NULL) = 0
gettimeofday({1155249188, 679}, NULL) = 0
gettimeofday({1155249188, 955}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1155249188, 1235000}) = 0
futex(0xb1f0185c, FUTEX_WAIT, 7163, {0, 99972}) = -1 ETIMEDOUT (Connection timed out)
futex(0x805d250, FUTEX_WAKE, 1) = 0
futex(0x805c378, FUTEX_WAIT, 2, NULL) = 0
futex(0x805c378, FUTEX_WAKE, 1) = 0

I'm afraid I don't know how to go about finding what part of the code might be causing this... Any ideas?

Ben

On 8/10/06, Benjamin Higgins <[EMAIL PROTECTED]> wrote: Hello, Nutch is stalling in the fetch process. I've run it twice now, and it stops on the *same* URL both times. I don't get what's going on! The last status report was:

060810 145315 status: segment 20060810142649, 7900 pages, 14 errors, 98421231 bytes, 1571224 ms
060810 145315 status: 5.0279274 pages/s, 489.3738 kb/s, 12458.384 bytes/page

Then, exactly 94 documents later, with no errors in between, it just stops, on what appears to be a perfectly normal URL and a perfectly normal page. I don't get it. How can I debug this further to see what's going on? I'm really frustrated since I don't know where to start looking. Nutch is still running and taking up a lot of CPU. I don't want to kill it unless it's really stuck. How can I tell? Ben
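One way to see what the JVM is actually doing behind that futex loop is to ask it for a thread dump. A sketch, assuming you substitute the fetcher process's real PID (e.g. from ps -ef | grep java); the placeholder value below is intentionally invalid:

```shell
# Placeholder PID; replace with the stalled fetcher JVM's actual pid.
PID=${PID:-999999999}
# SIGQUIT makes a JVM print a full thread dump to its stdout/log without
# killing it; the dump shows each thread's stack, i.e. where it is stuck.
kill -QUIT "$PID" 2>/dev/null || echo "no process with pid $PID"
```

Taking two or three dumps a minute apart and comparing them usually shows whether a fetcher thread is genuinely stuck (identical stack every time) or just slow; on JDK 5+, jstack <pid> prints the same information.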
Re: More Fetcher NullPointerException
I had the same problem before. Just read http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg04303.html Make that tiny change on line 385 of HttpBase.java and it will work fine.

Raphael

Sellek, Greg wrote: I am experiencing the same issue as a similar post from 8/6. Whenever I try to fetch pages, I see a lot of "fetch of xxx failed with: java.lang.NullPointerException". I have put the appropriate agent info in both the nutch-default and nutch-site config files. I tried using DEBUG logging to get more info, but this error is the extent of what I see. It seems to happen on about 95% of the URLs I am trying to crawl. BTW, this happens with both the 0.8 build and the latest nightly build. TIA for any advice as to what I am doing wrong. Greg
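For context, a sketch of the kind of null guard involved. This is only an illustration; the actual one-line HttpBase.java change is in the linked message, and the method name and fallback value here are hypothetical:

```java
// Illustrative only: a null guard of the kind the linked patch applies
// around line 385 of HttpBase.java. Without one, a missing or unresolved
// http.agent.name setting surfaces on every fetch as
// "fetch of xxx failed with: java.lang.NullPointerException".
public class AgentGuard {
    static String resolveAgent(String configuredAgent) {
        if (configuredAgent == null || configuredAgent.trim().length() == 0) {
            return "NutchCVS"; // hypothetical fallback value
        }
        return configuredAgent.trim();
    }

    public static void main(String[] args) {
        System.out.println(resolveAgent(null));
        System.out.println(resolveAgent(" MyBot "));
    }
}
```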
Stalling during fetch (0.7)
Hello, Nutch is stalling in the fetch process. I've run it twice now, and it stops on the *same* URL both times. I don't get what's going on! The last status report was:

060810 145315 status: segment 20060810142649, 7900 pages, 14 errors, 98421231 bytes, 1571224 ms
060810 145315 status: 5.0279274 pages/s, 489.3738 kb/s, 12458.384 bytes/page

Then, exactly 94 documents later, with no errors in between, it just stops, on what appears to be a perfectly normal URL and a perfectly normal page. I don't get it. How can I debug this further to see what's going on? I'm really frustrated since I don't know where to start looking. Nutch is still running and taking up a lot of CPU. I don't want to kill it unless it's really stuck. How can I tell?

Ben
crawl-urlfilter subpages of domains
Hello, is it possible to crawl e.g. http://www.domain.com but skip all URLs matching http://www.domain.com/subpage/? I tried to achieve this with crawl-urlfilter.txt/regex-urlfilter.txt, but it doesn't work:

-ftp.tu-clausthal.de
-^http://([a-z0-9]*\.)asta.tu-clausthal.de/de/mobil/
+^http://([a-z0-9]*\.)asta.tu-clausthal.de
+^http://([a-z0-9]*\.)*tu-clausthal.de/
# skip everything else
-.

Skipping ftp.tu-clausthal.de works perfectly, but http://www.asta.tu-clausthal.de/de/mobil/ is still indexed, which takes a long time to crawl.

regards, Jens Martin Schubert
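For reference, the regex url-filter rules are applied top-down and the first matching pattern wins, so the exclusion must precede the broader includes. A sketch against the hostnames in question, assuming the subtree should be skipped for hosts with and without a www. prefix (note the `*` after the group and the escaped dots):

```
# First match wins: exclude the mobil subtree before allowing the domain.
-^http://([a-z0-9]*\.)*asta\.tu-clausthal\.de/de/mobil/
# Then allow the rest of the domain:
+^http://([a-z0-9]*\.)*tu-clausthal\.de/
# skip everything else
-.
```

Also worth checking which file the running command actually reads: the one-shot crawl command uses crawl-urlfilter.txt, while the step-by-step generate/fetch tools use regex-urlfilter.txt, so editing the wrong one silently has no effect.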
Nutch vs. Google Appliance
Hello all - I have been taking a look at Nutch for purposes of indexing a large pile of internal LAN files at our company, and so far it looks quite impressive. I believe it could substitute for the Google Mini appliance. However, the bigger Google boxes add more features that I am not sure can be duplicated in Nutch. Specifically I am interested in the indexing and searching for secured files. Apparently Google will index all files, including those that are secure (given appropriate authority) - but will only show search results based on the security and credentials of the searcher. In other words, if you don't have access to a document, Google won't show you that it even exists. Can something like that be done in Nutch? Are there other differences between Nutch and Google?
common-terms.utf8
Hi, Could anyone explain to me what exactly the common-terms.utf8 file does? I don't understand the real functionality of this file... Regards,

-- Lourival Junior Universidade Federal do Pará Curso de Bacharelado em Sistemas de Informação http://www.ufpa.br/cbsi Msn: [EMAIL PROTECTED]
file access rights/permissions considerations - the least painful way
I'm interested in crawling multiple shared folders (among other things) on a corporate LAN. It is a LAN of MS clients with Active Directory managed accounts. The users routinely access the files based on NTFS-level (and sharing?) permissions.

Ideally, I'd like to set up a central server (probably Linux, but any *n*x would do) where I'd mount all the shared folders. I'd then set up Apache so that the files are accessible via HTTP and, more importantly, WebDAV. I imagine Apache could use mod_dav, mod_auth and possibly one or two other modules to regulate access privileges - I could very well be completely wrong here. Finally, I'd like to set up Nutch to crawl the shared documents through the web server, so that the stored links are valid in the whole LAN. Nutch would therefore require absolute access to all documents, but the documents would be served via a web server that checks user identities and access rights.

Nutch users who've tackled the access rights problem themselves would save me a world of time, effort and trouble with a couple of pointers on how to go about the whole security issue. If the setup I described is the worst possible way to go about it, I'd appreciate a notice saying so and elaborating why. :) TIA, t.n.a.
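A minimal sketch of the Apache side described above, assuming mod_dav and file-based Basic auth (the paths, realm name, and htpasswd file are illustrative; a real Active Directory setup would swap in an LDAP auth module):

```apache
# Hypothetical Apache 2.x fragment: expose the mounted shares over
# HTTP/WebDAV, with every request authenticated.
Alias /shares /mnt/lan-shares
<Location /shares>
    Dav On
    AuthType Basic
    AuthName "LAN documents"
    AuthUserFile /etc/apache2/htpasswd.shares
    Require valid-user
</Location>
```

One design caveat: this handles serving, and the crawler would need its own account with blanket read access (check whether your Nutch version's HTTP plugin can send Basic credentials), but per-user filtering of the *search results* is a separate problem that Nutch does not solve out of the box.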
More Fetcher NullPointerException
I am experiencing the same issue as a similar post from 8/6. Whenever I try to fetch pages, I see a lot of "fetch of xxx failed with: java.lang.NullPointerException". I have put the appropriate agent info in both the nutch-default and nutch-site config files. I tried using DEBUG logging to get more info, but this error is the extent of what I see. It seems to happen on about 95% of the URLs I am trying to crawl. BTW, this happens with both the 0.8 build and the latest nightly build. TIA for any advice as to what I am doing wrong. Greg
Re: number of mapper
Take a look at this: http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces It will explain why you get a few more map tasks than are set in the configuration.

Dennis

Murat Ali Bayir wrote: My configs are given below: in hadoop-site, number of mappers = 130; in my code I use job.setNumMapTasks = 130; in hadoop-default, number of mappers = 2. With this configuration I get 135 mappers in my job. However, there is no problem with the number of reducers.

Andrzej Bialecki wrote: Murat Ali Bayir wrote: Hi everybody, although I change the number of mappers in hadoop-site.xml and use the job.setNumMapTasks method, the system gives another number as the number of mappers. The problem only occurs for the number of mappers; the number of reducers works correctly. What do I have to do to set the number of mappers in the system?

Any value that you put in hadoop-site.xml will always override any other config settings, even those set programmatically in job.setNumMapTasks. You should remove these settings from hadoop-site, and put them into mapred-default.xml.
Re: number of mapper
My configs are given below: in hadoop-site, number of mappers = 130; in my code I use job.setNumMapTasks = 130; in hadoop-default, number of mappers = 2. With this configuration I get 135 mappers in my job. However, there is no problem with the number of reducers.

Andrzej Bialecki wrote: Murat Ali Bayir wrote: Hi everybody, although I change the number of mappers in hadoop-site.xml and use the job.setNumMapTasks method, the system gives another number as the number of mappers. The problem only occurs for the number of mappers; the number of reducers works correctly. What do I have to do to set the number of mappers in the system?

Any value that you put in hadoop-site.xml will always override any other config settings, even those set programmatically in job.setNumMapTasks. You should remove these settings from hadoop-site, and put them into mapred-default.xml.
Re: number of mapper
Murat Ali Bayir wrote: Hi everybody, although I change the number of mappers in hadoop-site.xml and use the job.setNumMapTasks method, the system gives another number as the number of mappers. The problem only occurs for the number of mappers; the number of reducers works correctly. What do I have to do to set the number of mappers in the system?

Any value that you put in hadoop-site.xml will always override any other config settings, even those set programmatically in job.setNumMapTasks. You should remove these settings from hadoop-site, and put them into mapred-default.xml.

-- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
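Andrzej's advice in config form: a sketch of the mapred-default.xml entry (the property name is the map-count key of Hadoop of that era), where it serves as a default that job.setNumMapTasks can still override, instead of sitting in hadoop-site.xml where it clobbers everything:

```xml
<!-- Sketch: put the desired default map count in mapred-default.xml,
     not hadoop-site.xml, so programmatic settings are not overridden. -->
<property>
  <name>mapred.map.tasks</name>
  <value>130</value>
</property>
```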
Re: number of mapper
It cannot be the problem; it only restricts the number of tasks running simultaneously, and there can be pending tasks as well. I checked that this is not the problem. I am not sure, but I notice that the number of map tasks equals k * the number of different parts in the input path. To illustrate: I have 15 parts in my input path and set the number of mappers to 130 in hadoop-site.xml, yet when I run the job I get 135 mappers, which is 9 times the number of input parts.

Dennis Kubes wrote: There is also a mapred.tasktracker.tasks.maximum variable which may be causing the task number to be different. Dennis

Murat Ali Bayir wrote: Hi everybody, although I change the number of mappers in hadoop-site.xml and use the job.setNumMapTasks method, the system gives another number as the number of mappers. The problem only occurs for the number of mappers; the number of reducers works correctly. What do I have to do to set the number of mappers in the system?
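The k * parts observation matches how splits are computed: each input file is divided into roughly the same number of splits, so the requested map count is effectively rounded up to a multiple of the file count. A toy model of that rounding (it assumes every file is large enough to split; the real logic also considers block sizes):

```python
import math

def actual_map_tasks(requested_maps, num_input_files):
    # Each file gets ceil(requested / files) splits, so the total is
    # rounded up to a multiple of the number of input files.
    splits_per_file = math.ceil(requested_maps / num_input_files)
    return num_input_files * splits_per_file

# 15 input parts, 130 requested mappers -> 9 splits per part, 135 maps
print(actual_map_tasks(130, 15))  # → 135
```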
Re: problems with start-all command
The name node is running. Run the bin/stop-all.sh script first and then do a ps -ef | grep NameNode to see if the process is still running. If it is, it may need to be killed by hand with kill -9 <processid>. The second problem is the setup of ssh keys, as described in a previous email. Also, I would recommend NOT running the namenode as root, but having a specific user set up to run the various servers, as described in the tutorial.

Dennis

kawther khazri wrote: Hello, we are trying to install Nutch on a single machine using this guide: http://wiki.apache.org/nutch/NutchHadoopTutorial?highlight=%28nutch%29 We are stuck at this step.

First we execute this command as root:

[EMAIL PROTECTED] search]# bin/start-all.sh
namenode running as process 16323. Stop it first.
[EMAIL PROTECTED]'s password:
localhost: starting datanode, logging to /nutch/search/logs/hadoop-root-datanode-localhost.localdomain.out
starting jobtracker, logging to /nutch/search/logs/hadoop-root-jobtracker-localhost.localdomain.out
[EMAIL PROTECTED]'s password:
localhost: tasktracker running as process 16448. Stop it first.

Second, we execute it in a normal user's session (nutch):

[EMAIL PROTECTED] search]$ bin/start-all.sh
starting namenode, logging to /nutch/search/logs/hadoop-nutch-namenode-localhost.localdomain.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 9e:56:da:f3:72:dc:1a:91:5d:78:89:ce:89:04:3d:d3.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Failed to add the host to the list of known hosts (/nutch/home/.ssh/known_hosts).
Enter passphrase for key '/nutch/home/.ssh/id_rsa':
[EMAIL PROTECTED]'s password:
localhost: starting datanode, logging to /nutch/search/logs/hadoop-nutch-datanode-localhost.localdomain.out
starting jobtracker, logging to /nutch/search/logs/hadoop-nutch-jobtracker-localhost.localdomain.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 9e:56:da:f3:72:dc:1a:91:5d:78:89:ce:89:04:3d:d3.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Failed to add the host to the list of known hosts (/nutch/home/.ssh/known_hosts).
Enter passphrase for key '/nutch/home/.ssh/id_rsa':
[EMAIL PROTECTED]'s password:
localhost: starting tasktracker, logging to /nutch/search/logs/hadoop-nutch-tasktracker-localhost.localdomain.out

What is the difference between the two? What is the meaning of this message: "namenode running as process 16323. Stop it first."? Is it normal to get this? I don't know the cause of this error. Please, if you have any idea, help me. Best regards,
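On the ssh key point: the passphrase and password prompts appear because start-all.sh opens an ssh connection to localhost for every daemon it starts, and the nutch user has no authorized passphrase-less key. A sketch of the usual fix, using the default key paths (run as the nutch user):

```shell
mkdir -p "$HOME/.ssh"
# Create a key only if none exists yet; -N "" means no passphrase,
# so the start scripts can log in without prompting.
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"
# Authorize the key for logins to this same account (localhost ssh):
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```

The "Failed to add the host to the list of known hosts (/nutch/home/.ssh/known_hosts)" lines also suggest the nutch user cannot write to its own .ssh directory at all (likely because the first run happened as root), so check the ownership of /nutch/home/.ssh first.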
Re: number of mapper
There is also a mapred.tasktracker.tasks.maximum variable which may be causing the task number to be different. Dennis

Murat Ali Bayir wrote: Hi everybody, although I change the number of mappers in hadoop-site.xml and use the job.setNumMapTasks method, the system gives another number as the number of mappers. The problem only occurs for the number of mappers; the number of reducers works correctly. What do I have to do to set the number of mappers in the system?
Index with synonyms
Hey list, I would like to ask if it is possible to start a search query with a simple word (e.g. "Home"). Nutch would then look up the word "Home" in a list of synonyms and recognize that "House" is a synonym for "Home". Nutch could then run the query with both "House" and "Home" and show both sets of results. Is that possible? Regards
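Nutch has no built-in synonym support, so this means either a custom analyzer or expanding the query before it reaches Nutch. A minimal sketch of the query-side variant, with a hand-made synonym table (the table and function name are illustrative):

```python
# Tiny hand-made synonym table; a real deployment would load a curated
# list or a thesaurus like WordNet instead.
SYNONYMS = {
    "home": ["house"],
    "house": ["home"],
}

def expand_query(term):
    # OR the term with its synonyms so pages matching either are returned.
    terms = [term] + SYNONYMS.get(term.lower(), [])
    return " OR ".join(terms)

print(expand_query("Home"))  # → Home OR house
```

The alternative, index-time expansion, injects synonyms into the index itself via a custom analyzer plugin; it makes queries simpler but inflates the index and requires re-crawling when the synonym list changes.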
number of mapper
Hi everybody, although I change the number of mappers in hadoop-site.xml and use the job.setNumMapTasks method, the system gives another number as the number of mappers. The problem only occurs for the number of mappers; the number of reducers works correctly. What do I have to do to set the number of mappers in the system?
Extended crawling configuration with "mapred.input.value.class"?
Hi, I am interested in a more comprehensive configuration of the crawl targets. The current version only supports lists (files) containing URLs. One thing that would be desirable is the injection of URLs with metadata attached. This metadata (inserted into the CrawlDatum object) could be read by plugins in later steps of the indexing process and used as hints for processing decisions. This would be similar to the use of metadata for carrying the score from one stage to the next, or even to the outlinks in the next cycle.

Now my question: can I use an XMLWritable (this is my new configuration class) instead of UTF8 by setting the Hadoop config entry mapred.input.value.class to XMLWritable? Is this Hadoop setting only used for URL injection, or would my change of the setting harm other components that also use the class configured at this point? Cheers, Timo.
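Whether other jobs read that key is exactly the open question, so one way to limit the blast radius while experimenting is to scope the override to the inject job's own JobConf rather than a site-wide file. A sketch of the property (the XMLWritable class name is Timo's own, shown with a placeholder package):

```xml
<!-- Hypothetical job-local override: set this on the inject job's
     JobConf rather than in hadoop-site.xml, so other jobs that rely
     on the default UTF8 value class are left untouched. -->
<property>
  <name>mapred.input.value.class</name>
  <value>org.example.XMLWritable</value>
</property>
```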
problem with the DFS commande
Hello, when I execute the DFS command, I get this:

[EMAIL PROTECTED] search]$ bin/start-all.sh
starting namenode, logging to /nutch/search/logs/hadoop-nutch-namenode-localhost.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 81:0e:49:ce:61:8c:7b:09:1f:dc:5d:2c:64:f1:68:d6.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Failed to add the host to the list of known hosts (/nutch/home/.ssh/known_hosts).
Enter passphrase for key '/nutch/home/.ssh/id_rsa':
[EMAIL PROTECTED]'s password:
localhost: starting datanode, logging to /nutch/search/logs/hadoop-nutch-datanode-localhost.out
starting jobtracker, logging to /nutch/search/logs/hadoop-nutch-jobtracker-localhost.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 81:0e:49:ce:61:8c:7b:09:1f:dc:5d:2c:64:f1:68:d6.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Failed to add the host to the list of known hosts (/nutch/home/.ssh/known_hosts).
Enter passphrase for key '/nutch/home/.ssh/id_rsa':
[EMAIL PROTECTED]'s password:
localhost: starting tasktracker, logging to /nutch/search/logs/hadoop-nutch-tasktracker-localhost.out
[EMAIL PROTECTED] search]$ bin/hadoop dfs -ls
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /nutch/search/logs/hadoop.log (Permission denied)
    at java.io.FileOutputStream.openAppend(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
    at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
    at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
    at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
    at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
    at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
    at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
    at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
    at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
    at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
    at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
    at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
    at org.apache.log4j.Logger.getLogger(Logger.java:104)
    at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
    at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
    at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
    at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
    at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
    at org.apache.hadoop.util.ToolBase.<clinit>(ToolBase.java:71)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
java.lang.NullPointerException
    at java.net.Socket.<init>(Socket.java:358)
    at java.net.Socket.<init>(Socket.java:208)
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:113)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:359)
    at org.apache.hadoop.ipc.Client.call(Client.java:297)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:150)
    at org.apache.hadoop.dfs.$Proxy0.getListing(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient.listPaths(DFSClient.java:332)
    at org.apache.hadoop.dfs.DistributedFileSystem.listPathsRaw(DistributedFileSystem.java:157)
    at org.apache.hadoop.fs.FileSystem.listPaths(FileSystem.java:509)
    at org.apache.hadoop.fs.FileSystem.listPaths(FileSystem.java:479)
    at org.apache.hadoop.dfs.DFSShell.ls(DFSShell.java:165)
    at org.apache.hadoop.dfs.DFSShell.run(DFSShell.java:329)
    at org.apache.hadoop.util.ToolBase.executeCommand(ToolBase.java:173)
    at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:182)
    at org.apache.hadoop.dfs.DFSShell.main(DFSShell.java:360)
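The FileNotFoundException (Permission denied) on hadoop.log suggests the logs directory was first created by another user (e.g. root), so the current user cannot append to it. A sketch of the check-and-fix, with the path and username taken from the tutorial setup (the helper function is illustrative, and the chown needs root):

```shell
# Hypothetical helper: re-own a log directory to the hadoop user if it
# exists, otherwise report what is missing.
fix_log_perms() {
    dir="$1"; user="$2"
    if [ -d "$dir" ]; then
        chown -R "$user:$user" "$dir" 2>/dev/null \
            || echo "need root for: chown -R $user:$user $dir"
    else
        echo "missing: $dir"
    fi
}
fix_log_perms /nutch/search/logs nutch
```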
Crawling flash
I want to include embedded Flash in my crawls. Despite (apparently successfully) including the parse-swf plugin, embedded Flash does not seem to be retrieved. I'm assuming that the object tags are not being parsed to find the .swf files. Can anyone comment? Thanks Iain
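If the problem is indeed that <object>/<embed> tags are not treated as outlinks, a custom parse filter would have to extract the .swf URLs itself so they enter the fetch list. A rough sketch of just the extraction step (the regex and names are illustrative, and no substitute for a real HTML parser):

```python
import re

# Match src=/value=/data= attributes pointing at .swf files, the usual
# attributes used by <embed>, <param>, and <object> respectively.
SWF_RE = re.compile(r'(?:src|value|data)\s*=\s*["\']([^"\']+\.swf)["\']', re.I)

def swf_links(html):
    """Return all .swf URLs referenced by object/embed-style attributes."""
    return SWF_RE.findall(html)

page = '<object data="/media/intro.swf"><embed src="/media/intro.swf"></object>'
print(swf_links(page))  # → ['/media/intro.swf', '/media/intro.swf']
```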
problems with start-all command
Hello, we are trying to install Nutch on a single machine using this guide: http://wiki.apache.org/nutch/NutchHadoopTutorial?highlight=%28nutch%29 We are stuck at this step.

First we execute this command as root:

[EMAIL PROTECTED] search]# bin/start-all.sh
namenode running as process 16323. Stop it first.
[EMAIL PROTECTED]'s password:
localhost: starting datanode, logging to /nutch/search/logs/hadoop-root-datanode-localhost.localdomain.out
starting jobtracker, logging to /nutch/search/logs/hadoop-root-jobtracker-localhost.localdomain.out
[EMAIL PROTECTED]'s password:
localhost: tasktracker running as process 16448. Stop it first.

Second, we execute it in a normal user's session (nutch):

[EMAIL PROTECTED] search]$ bin/start-all.sh
starting namenode, logging to /nutch/search/logs/hadoop-nutch-namenode-localhost.localdomain.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 9e:56:da:f3:72:dc:1a:91:5d:78:89:ce:89:04:3d:d3.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Failed to add the host to the list of known hosts (/nutch/home/.ssh/known_hosts).
Enter passphrase for key '/nutch/home/.ssh/id_rsa':
[EMAIL PROTECTED]'s password:
localhost: starting datanode, logging to /nutch/search/logs/hadoop-nutch-datanode-localhost.localdomain.out
starting jobtracker, logging to /nutch/search/logs/hadoop-nutch-jobtracker-localhost.localdomain.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 9e:56:da:f3:72:dc:1a:91:5d:78:89:ce:89:04:3d:d3.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Failed to add the host to the list of known hosts (/nutch/home/.ssh/known_hosts).
Enter passphrase for key '/nutch/home/.ssh/id_rsa':
[EMAIL PROTECTED]'s password:
localhost: starting tasktracker, logging to /nutch/search/logs/hadoop-nutch-tasktracker-localhost.localdomain.out

What is the difference between the two? What is the meaning of this message: "namenode running as process 16323. Stop it first."? Is it normal to get this? I don't know the cause of this error. Please, if you have any idea, help me. Best regards,