I just want to use one PC to index 90 GB of pages.
Can anyone estimate how much RAM it will need?
Are there any tricks to reduce the RAM requirement?

From: "Marc DELERUE" <[EMAIL PROTECTED]>
Reply-To: [email protected]
To: <[email protected]>
Subject: RE: Can Nutch index over 90G html pages ?
Date: Thu, 2 Jun 2005 10:24:26 +0200


See : http://frutch.free.fr/wikini/wakka.php?wiki=DimensionnementMoteur

According to that page, you can crawl up to 4 billion pages.
But look at the configuration...

Regards.
Marc


From: cao yuzhong [mailto:[EMAIL PROTECTED]
Sent: Thursday, 2 June 2005 10:12
To: [email protected]
Subject: Can Nutch index over 90G html pages ?

Has anyone used Nutch to index more than 90 GB of HTML pages (about 6 million
pages)?
Is it possible? How much RAM does it require?
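For scale, the figures in this question work out to roughly 15 KB of HTML per page on average. A quick check of that arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes); the class and method names are illustrative only:

```java
class AveragePageSize {
    // Integer average size of one page, in bytes.
    static long averagePageBytes(long totalBytes, long pageCount) {
        return totalBytes / pageCount;
    }

    public static void main(String[] args) {
        long totalBytes = 90L * 1000L * 1000L * 1000L; // "90G", decimal units assumed
        long pageCount = 6000000L;                     // "about 6 million pages"
        System.out.println("average page size: "
                + averagePageBytes(totalBytes, pageCount) + " bytes"); // 15000 bytes
    }
}
```

With binary gigabytes (2^30 bytes) the average comes out at about 16 KB instead.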

I tried to use Nutch to index 90 GB of HTML pages.
My PC has 1 GB of RAM, and the JVM heap is set with -Xmx1000m.
Here is the error I get:

Exception in thread "main" java.lang.OutOfMemoryError
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:194)
        at net.nutch.fs.LocalFileSystem$LocalNFSFileInputStream.read(LocalFileSystem.java:68)
        at net.nutch.fs.NFSDataInputStream$PositionCache.read(NFSDataInputStream.java:24)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at net.nutch.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:42)
        at net.nutch.io.DataOutputBuffer.write(DataOutputBuffer.java:76)
        at net.nutch.io.SequenceFile$Reader.next(SequenceFile.java:241)
        at net.nutch.io.MapFile$Reader.seek(MapFile.java:263)
        at net.nutch.io.MapFile$Reader.get(MapFile.java:306)
        at net.nutch.io.ArrayFile$Reader.get(ArrayFile.java:62)
        at net.nutch.segment.SegmentReader.get(SegmentReader.java:284)
        at net.nutch.indexer.IndexSegment.indexPages(IndexSegment.java:110)
        at net.nutch.indexer.IndexSegment.main(IndexSegment.java:241)
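Reading the trace from the bottom up: IndexSegment asks SegmentReader for a page, MapFile.Reader.seek steps through records with SequenceFile.Reader.next, and each record appears to be copied into a DataOutputBuffer via DataInputStream.readFully. That means a whole record is held in memory at once, so a single very large (or corrupt) record length could plausibly exhaust the heap. The sketch below models that pattern with a size guard. The record layout (a 4-byte length followed by the record bytes), the GuardedRecordReader class, and the maxRecordBytes limit are all hypothetical, chosen for illustration; this is not Nutch's actual SequenceFile format or API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical reader for length-prefixed records (NOT Nutch's real format).
class GuardedRecordReader {
    private final DataInputStream in;
    private final int maxRecordBytes; // hypothetical per-record size limit

    GuardedRecordReader(DataInputStream in, int maxRecordBytes) {
        this.in = in;
        this.maxRecordBytes = maxRecordBytes;
    }

    // Returns the next record, or null if the stream ends before a length prefix.
    byte[] next() throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException e) {
            return null;
        }
        // Reject implausible lengths BEFORE allocating, so a corrupt or huge
        // length produces an IOException instead of an OutOfMemoryError.
        if (length < 0 || length > maxRecordBytes) {
            throw new IOException("invalid record length " + length
                    + " (limit " + maxRecordBytes + ")");
        }
        byte[] record = new byte[length];
        in.readFully(record); // the same call that appears in the trace
        return record;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] page = "<html></html>".getBytes(StandardCharsets.UTF_8);
        out.writeInt(page.length);
        out.write(page);
        out.writeInt(Integer.MAX_VALUE); // simulated corrupt length prefix
        out.flush();

        GuardedRecordReader reader = new GuardedRecordReader(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())),
                1 << 20); // 1 MiB limit, an arbitrary choice for this demo
        System.out.println(new String(reader.next(), StandardCharsets.UTF_8)); // <html></html>
        try {
            reader.next();
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The guard trades a crash for a recoverable error on one bad record; whether skipping such a record is acceptable depends on the data.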

Any suggestions?
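One quick diagnostic for an OutOfMemoryError is to confirm the heap ceiling the JVM actually received from -Xmx1000m. A minimal check using the standard Runtime API (the HeapCheck class name is illustrative):

```java
class HeapCheck {
    static final long MIB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap   : " + maxHeapMiB() + " MiB");
        System.out.println("total heap : " + rt.totalMemory() / MIB + " MiB"); // currently reserved
        System.out.println("free heap  : " + rt.freeMemory() / MIB + " MiB");  // free within total
    }
}
```

Run it with the same flag, e.g. `java -Xmx1000m HeapCheck`; some JVMs report a maximum slightly below the requested 1000 MiB. On a machine with 1 GB of physical RAM, a 1000 MB heap also leaves little room for the operating system and the JVM's own non-heap memory.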

Best regards!
cyz
