Re: Outofmemory error for large files

2009-02-17 Thread Shalin Shekhar Mangar
On Tue, Feb 17, 2009 at 1:10 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Right.  But I was trying to point out that a single 150MB Document is not
 in fact what the o.p. wants to do.  For example, if your 150MB represents,
 say, a whole book, should that really be a single document?  Or should
 individual chapters be separate documents, for example?


Yes, a 150MB document is probably not a good idea. I am only trying to point
out that even if he writes multiple documents in a 150MB batch, he may still
hit the OOME because all the XML is written to memory first and then out to
the server.

-- 
Regards,
Shalin Shekhar Mangar.
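
To make the batching point concrete, here is a minimal SolrJ sketch (not from the thread; the URL, field names, and batch size are placeholder assumptions, and it targets a 1.3-era CommonsHttpSolrServer) that flushes small batches so no single update request serializes hundreds of megabytes of XML:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchedIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; point this at your Solr instance.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 10000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("text", "... one unit of the split file ...");
                batch.add(doc);
                // Flush every 100 docs so SolrJ never builds one huge XML string.
                if (batch.size() == 100) {
                    server.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                server.add(batch);
            }
            server.commit();
        }
    }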


Re: Outofmemory error for large files

2009-02-16 Thread Otis Gospodnetic
Siddharth,

At the end of your email you said:
One option I see is to break the file in chunks, but with this, I won't be 
able to search with multiple words if they are distributed in different 
documents.

Unless I'm missing something unusual about your application, I don't think the 
above is technically correct.  Have you tried doing this and have you 
then tried your searches?  Everything should still work, even if you index one 
document at a time.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Gargate, Siddharth sgarg...@ptc.com
To: solr-user@lucene.apache.org
Sent: Monday, February 16, 2009 2:00:58 PM
Subject: Outofmemory error for large files


I am trying to index an approximately 150 MB text file with a 1024 MB max
heap, but I get an OutOfMemory error in the SolrJ code:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:572)
    at java.lang.StringBuffer.append(StringBuffer.java:320)
    at java.io.StringWriter.write(StringWriter.java:60)
    at org.apache.solr.common.util.XML.escape(XML.java:206)
    at org.apache.solr.common.util.XML.escapeCharData(XML.java:79)
    at org.apache.solr.common.util.XML.writeXML(XML.java:149)
    at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:115)
    at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:200)
    at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:178)
    at org.apache.solr.client.solrj.request.UpdateRequest.getContentStreams(UpdateRequest.java:173)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:136)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)


I modified the UpdateRequest class to initialize the StringWriter object
in UpdateRequest.getXML with an initial size, and cleared the
SolrInputDocument that holds the reference to the file text. Then I get
the OOM below:


Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.lang.StringCoding.safeTrim(StringCoding.java:64)
    at java.lang.StringCoding.access$300(StringCoding.java:34)
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:251)
    at java.lang.StringCoding.encode(StringCoding.java:272)
    at java.lang.String.getBytes(String.java:947)
    at org.apache.solr.common.util.ContentStreamBase$StringStream.getStream(ContentStreamBase.java:142)
    at org.apache.solr.common.util.ContentStreamBase$StringStream.getReader(ContentStreamBase.java:154)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:61)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)


After I increase the heap size to 1250 MB, I get an OOM as:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at java.lang.StringBuffer.toString(StringBuffer.java:585)
    at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:403)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:276)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)


So it looks like I won't be able to get out of these OOMs.
Is there any way to avoid them? One option I see is to break the file
into chunks, but then I won't be able to search for multiple words if
they are distributed across different documents.
Also, can somebody tell me the

RE: Outofmemory error for large files

2009-02-16 Thread Gargate, Siddharth

 Otis,

I haven't tried it yet, but what I meant is: if we divide the content into
multiple parts, then words will be split across different Solr documents. If
the main document contains 'Hello World', these two words might get indexed in
two different documents. Searching for 'Hello World' won't give me the
required search result unless I use OR in the query.

Thanks,
Siddharth
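
An illustrative sketch of the chunking idea (not proposed verbatim in the thread): if consecutive chunks overlap by more than the longest phrase you expect to query, a phrase such as 'Hello World' that straddles a boundary still appears whole in at least one chunk, so phrase searches keep working. The chunk size, overlap, and the file_id grouping field below are all assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.common.SolrInputDocument;

    public class OverlappingChunker {
        // Split text into fixed-size windows that overlap, so a phrase
        // straddling a chunk boundary still appears whole in one chunk.
        // Assumes chunkSize > overlap.
        static List<SolrInputDocument> chunk(String fileId, String text,
                                             int chunkSize, int overlap) {
            List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
            int part = 0;
            for (int start = 0; start < text.length(); start += chunkSize - overlap) {
                int end = Math.min(start + chunkSize, text.length());
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", fileId + "-part" + part++);
                doc.addField("file_id", fileId); // assumed field for regrouping hits per file
                doc.addField("text", text.substring(start, end));
                docs.add(doc);
                if (end == text.length()) {
                    break;
                }
            }
            return docs;
        }
    }

Search results then come back as chunk documents rather than whole files; deduplicating on the assumed file_id field gets back to per-file results.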


Re: Outofmemory error for large files

2009-02-16 Thread Otis Gospodnetic
Siddharth,

But does your 150MB file represent a single Document?  That doesn't sound right.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: Outofmemory error for large files

2009-02-16 Thread Shalin Shekhar Mangar
On Tue, Feb 17, 2009 at 10:26 AM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Siddharth,

 But does your 150MB file represent a single Document?  That doesn't sound
 right.


Otis, SolrJ writes the whole XML in memory before writing it to the server.
That may be one reason behind Siddharth's OOME. See
https://issues.apache.org/jira/browse/SOLR-973

-- 
Regards,
Shalin Shekhar Mangar.
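
For the non-embedded case, a hedged sketch of one way around the client-side buffering: StreamingUpdateSolrServer (added to SolrJ around this time under SOLR-906) queues documents and writes them to an open HTTP connection instead of materializing one giant XML string first. The URL and tuning values are illustrative, and each individual document must still fit in memory, so this complements splitting the file rather than replacing it:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class StreamingIndexer {
        public static void main(String[] args) throws Exception {
            // Queue up to 20 requests, drained by 2 background threads -- illustrative values.
            SolrServer server = new StreamingUpdateSolrServer(
                    "http://localhost:8983/solr", 20, 2);
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "book-1-ch1");
            doc.addField("text", "chapter text ...");
            server.add(doc); // queued and streamed, not buffered as one big string
            server.commit();
        }
    }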


Re: Outofmemory error for large files

2009-02-16 Thread Otis Gospodnetic
Right.  But I was trying to point out that a single 150MB Document is not in 
fact what the o.p. wants to do.  For example, if your 150MB represents, say, a 
whole book, should that really be a single document?  Or should individual 
chapters be separate documents, for example?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




