See: http://frutch.free.fr/wikini/wakka.php?wiki=DimensionnementMoteur
It means that you can crawl up to 4 billion pages. But look at the configuration...

Regards,
Marc

From: cao yuzhong [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 2 June 2005 10:12
To: [email protected]
Subject: Can Nutch index over 90G html pages ?

Has anyone used Nutch to index over 90G of HTML pages (about 6 million pages)?
Is it possible? How much RAM does it require?

I tried to use Nutch to index 90G of HTML pages.
My PC has 1G of RAM and the JVM parameter is set to -Xmx1000m.

Here is my problem:

Exception in thread "main" java.lang.OutOfMemoryError
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:194)
        at net.nutch.fs.LocalFileSystem$LocalNFSFileInputStream.read(LocalFileSystem.java:68)
        at net.nutch.fs.NFSDataInputStream$PositionCache.read(NFSDataInputStream.java:24)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at net.nutch.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:42)
        at net.nutch.io.DataOutputBuffer.write(DataOutputBuffer.java:76)
        at net.nutch.io.SequenceFile$Reader.next(SequenceFile.java:241)
        at net.nutch.io.MapFile$Reader.seek(MapFile.java:263)
        at net.nutch.io.MapFile$Reader.get(MapFile.java:306)
        at net.nutch.io.ArrayFile$Reader.get(ArrayFile.java:62)
        at net.nutch.segment.SegmentReader.get(SegmentReader.java:284)
        at net.nutch.indexer.IndexSegment.indexPages(IndexSegment.java:110)
        at net.nutch.indexer.IndexSegment.main(IndexSegment.java:241)

Any suggestions?

Best regards!
cyz
