Hi,

I was trying to fetch the DMOZ open directory using the exact
example from the Nutch tutorial website. I ran the following steps:

mkdir db
mkdir segments
bin/nutch admin db -create
bin/nutch inject db -dmozfile ../nutch-0.7.1/content.rdf.u8 -subset 3000
bin/nutch generate db segments
s1=`ls -d segments/2* | tail -1`
echo $s1
bin/nutch fetch -showThreadID -noParsing -threads 50 $s1
bin/nutch updatedb db $s1

It starts fetching pages, but after a couple hundred pages it starts
throwing this exception:

"java.net.SocketException: No buffer space available"

Do you have any idea why this might happen? I understand that it is running
out of available buffer space for new sockets, but why are the old sockets
not closed? Even if a fetch fails, its socket should be closed and its
buffer should be freed!
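
One way to check whether sockets really are piling up would be to count
connections by state while the fetch is running. Below is a minimal sketch,
assuming `netstat -an` is available and prints the connection state as the
last field of each TCP line (this holds for common netstat output, but it is
worth checking on your platform). The function name `count_tcp_states` is
only illustrative and is not part of Nutch.

```shell
# Count TCP connections per state from `netstat -an`-style input on stdin.
# Assumption: the state (e.g. ESTABLISHED, TIME_WAIT) is the last field
# of every line whose first field starts with "tcp" (case-insensitive).
count_tcp_states() {
  awk 'toupper($1) ~ /^TCP/ { counts[$NF]++ }
       END { for (s in counts) print counts[s], s }' | sort -rn
}

# Usage, in a second terminal while the fetch runs:
#   netstat -an | count_tcp_states
```

If one state keeps growing as more pages are fetched, that would suggest
the connections are not being released as quickly as they are opened.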

I tried both 0.7 and 0.7.1.
One example of the exception looks like this:

051018 153727 28 fetching http://perso.wanadoo.es/largo/
java.net.SocketException: No buffer space available
       at java.net.PlainSocketImpl.socketConnect(Native Method)
       at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
       at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
       at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
       at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:364)
       at java.net.Socket.connect(Socket.java:507)
       at java.net.Socket.connect(Socket.java:457)
       at java.net.Socket.<init>(Socket.java:365)
       at java.net.Socket.<init>(Socket.java:238)
       at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
       at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$1.doit(ControllerThreadSocketFactory.java:90)
       at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$SocketTask.run(ControllerThreadSocketFactory.java:157)
       at java.lang.Thread.run(Thread.java:595)

Nima

