Hi all, I'm trying to fetch a few million pages, but I'm running into
performance problems.
I'm using a P4 1700 with 768MB of RAM and a 10Mb connection.
I've changed these configuration values in nutch-site.xml:

<property>
  <name>fetcher.threads.fetch</name>
  <value>25</value>
</property>

<property>
  <name>http.max.delays</name>
  <value>1</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>1</value>
</property>

<property>
  <name>io.sort.factor</name>
  <value>10</value>
</property>

<property>
  <name>io.sort.mb</name>
  <value>1</value>
</property>

<property>
  <name>indexer.maxMergeDocs</name>
  <value>20</value>
</property>

<property>
  <name>indexer.termIndexInterval</name>
  <value>64</value>
</property>

and I've also added the following line into bin/nutch:
JAVA_HEAP_MAX=-Xmx750M

It seems like a good configuration. When I run the fetch command, I get these log
messages:

050926 181531 status: segment 20050924151836, 100 pages, 11 errors, 1277608 
bytes, 11755 ms
050926 181531 status: 8.507018 pages/s, 849.11206 kb/s, 12776.08 bytes/page
050926 181537 status: segment 20050924151836, 200 pages, 17 errors, 2620277 
bytes, 18157 ms
050926 181537 status: 11.015036 pages/s, 1127.4392 kb/s, 13101.385 bytes/page
050926 181548 status: segment 20050924151836, 300 pages, 26 errors, 4243689 
bytes, 28657 ms
050926 181548 status: 10.468647 pages/s, 1156.9187 kb/s, 14145.63 bytes/page
050926 181557 status: segment 20050924151836, 400 pages, 32 errors, 5515098 
bytes, 38102 ms
050926 181557 status: 10.4981365 pages/s, 1130.8252 kb/s, 13787.745 bytes/page
050926 181607 status: segment 20050924151836, 500 pages, 44 errors, 6678319 
bytes, 48464 ms
050926 181607 status: 10.3169365 pages/s, 1076.5592 kb/s, 13356.638 bytes/page

but after a few thousand pages, the rate decreases steadily:

050926 180746 status: segment 20050924151836, 6400 pages, 566 errors,85809551 
bytes, 853401 ms
050926 180746 status: 7.4994054 pages/s, 785.5476 kb/s, 13407.742 bytes/page
050926 180807 status: segment 20050924151836, 6500 pages, 581 errors,87133135 
bytes, 874799 ms
050926 180807 status: 7.4302783 pages/s, 778.1532 kb/s, 13405.098 bytes/page
050926 180823 status: segment 20050924151836, 6600 pages, 589 errors, 88789053 
bytes, 890686 ms
050926 180823 status: 7.410019 pages/s, 778.79803 kb/s, 13452.888 bytes/page
050926 180841 status: segment 20050924151836, 6700 pages, 594 errors, 90286731 
bytes, 908720 ms
050926 180841 status: 7.3730083 pages/s, 776.21826 kb/s, 13475.631 bytes/page
050926 180901 status: segment 20050924151836, 6800 pages, 601 errors, 91663461 
bytes, 928498 ms
050926 180901 status: 7.323656 pages/s, 771.268 kb/s, 13479.921 bytes/page
050926 181014 status: segment 20050924151836, 7200 pages, 627 errors,96922711 
bytes, 1001732 ms
050926 181014 status: 7.187551 pages/s, 755.8995 kb/s, 13461.487 bytes/page
050926 181037 status: segment 20050924151836, 7300 pages, 637 errors, 98478215 
bytes, 1024844 ms
050926 181037 status: 7.1230354 pages/s, 750.7104 kb/s, 13490.167 bytes/page
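
A note on reading these figures: the pages/s value in each status line matches total pages divided by total elapsed time (for example, 100 pages / 11.755 s = 8.507), so it appears to be a running average since the start of the segment, not the current speed. The minimal Python sketch below recomputes that average and also the rate between consecutive status lines. The (pages, ms) pairs are copied from the log above; the function names are made up for illustration and are not part of Nutch.

```python
# (pages, elapsed_ms) pairs copied from the status lines in the log above.
samples = [
    (100, 11755), (200, 18157), (300, 28657), (400, 38102), (500, 48464),
    (6400, 853401), (6500, 874799), (6600, 890686), (6700, 908720),
    (6800, 928498), (7200, 1001732), (7300, 1024844),
]


def cumulative_rate(pages, elapsed_ms):
    """Average pages/s since the start of the segment (what the log seems to print)."""
    return pages / (elapsed_ms / 1000.0)


def interval_rate(prev, cur):
    """Pages/s between two status samples, each given as (pages, elapsed_ms)."""
    delta_pages = cur[0] - prev[0]
    delta_ms = cur[1] - prev[1]
    return delta_pages / (delta_ms / 1000.0)


if __name__ == "__main__":
    for prev, cur in zip(samples, samples[1:]):
        print(
            f"{cur[0]:5d} pages: "
            f"cumulative {cumulative_rate(*cur):6.3f} pages/s, "
            f"since previous sample {interval_rate(prev, cur):6.3f} pages/s"
        )
```

Because the logged value is an average over the whole run, the speed in the most recent interval can be noticeably lower than the number printed in the status line.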


and I cannot understand how to get a steady 10 pages/s rate (or even a higher
one!). I've read this page
http://wiki.apache.org/nutch/HardwareRequirements
and it states that it is possible, with 25 fetchers, to download at (more or less)
4Mbit per second
with hardware similar to mine.
So, how can I set up Nutch to fetch at a higher rate?
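
To put the 4Mbit figure next to the rates in the log, here is a back-of-the-envelope calculation in Python. It assumes decimal megabits (1 Mbit = 1,000,000 bits) and an average page size of about 13,400 bytes, which is roughly what the status lines report; both constants are assumptions for illustration only.

```python
# Assumed values: average page size taken from the "bytes/page" column of the
# log (about 13,400 bytes), and decimal megabits (1 Mbit = 1,000,000 bits).
BYTES_PER_PAGE = 13_400
BITS_PER_MBIT = 1_000_000


def mbit_per_s(pages_per_s, bytes_per_page=BYTES_PER_PAGE):
    """Bandwidth in Mbit/s used when fetching at the given page rate."""
    return pages_per_s * bytes_per_page * 8 / BITS_PER_MBIT


def pages_per_s_for(mbit, bytes_per_page=BYTES_PER_PAGE):
    """Page rate needed to sustain the given bandwidth in Mbit/s."""
    return mbit * BITS_PER_MBIT / 8 / bytes_per_page


if __name__ == "__main__":
    print(f"10 pages/s uses about {mbit_per_s(10):.2f} Mbit/s")
    print(f"4 Mbit/s needs about {pages_per_s_for(4):.1f} pages/s")
```

Under these assumptions, 10 pages/s is only a little over 1 Mbit/s, so reaching 4 Mbit/s would take several times the page rate I am seeing.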


Thank you so much!!!!!
        Menoz


-- 
                      Free Software Enthusiast
                 Debian Powered Linux User #332564 
                     http://menoz.homelinux.org
