Hi,
I was trying to fetch the DMOZ open directory using the exact example in
the tutorial on the Nutch website, so I ran the following steps:
mkdir db
mkdir segments
bin/nutch admin db -create
bin/nutch inject db -dmozfile ../nutch-0.7.1/content.rdf.u8 -subset 3000
bin/nutch generate db segments
s1=`ls -d segments/2* | tail -1`
echo $s1
bin/nutch fetch -showThreadID -noParsing -threads 50 $s1
bin/nutch updatedb db $s1
It starts fetching the pages, but after a couple hundred pages it starts
giving me this exception:
"java.net.SocketException: No buffer space available"
Do you have any idea why this might happen? I know it is running out of
available buffer space for new sockets, but why are the old sockets not
closed? Even if a fetch fails, its socket should be closed and its buffer
should get freed!
I tried both 0.7 and 0.7.1.
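To check whether sockets really are piling up, something like the following could be run in a second shell while the fetch is going. This is only a rough sketch: `count_tcp_states` is a made-up helper name, and it assumes that `netstat -an` prints the TCP state as the last column of each TCP line, which may not hold on every platform.

```shell
#!/bin/sh
# Hypothetical helper: tally TCP connections by state.
# Reads `netstat -an`-style output on stdin and assumes the connection
# state (ESTABLISHED, TIME_WAIT, ...) is the last field of each TCP line.
count_tcp_states() {
    awk 'tolower($1) ~ /^tcp/ { counts[$NF]++ }
         END { for (s in counts) print s, counts[s] }' | sort
}

# Usage, in a second shell while bin/nutch fetch is running:
#   netstat -an | count_tcp_states
```

If one state (for example TIME_WAIT) keeps growing as the fetch proceeds, that would suggest connections are not being released as fast as the 50 threads open them.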
Thanks. Nima