Re: Incrementally updating a VERY LARGE field - Is this possible?
Yonik Seeley-2-2 wrote:
> On Wed, Apr 4, 2012 at 3:14 PM, vybe3142 wrote:
>>> Updating a single field is not possible in solr. The whole record has to
>>> be rewritten.
>>
>> Unfortunate. Lucene allows it.
>
> I think you're mistaken - the same limitations apply to Lucene.
>
> -Yonik

You're correct (and I stand corrected). I looked at our older codebase that used Lucene. I need to dig deeper to understand why it doesn't crash when addField() is invoked multiple times, once for each portion of the large text data, whereas SOLR does. According to the developer who wrote that code, we resorted to multiple addField() invocations to address the heap space issue.

I'll post back.

--
View this message in context: http://lucene.472066.n3.nabble.com/Incrementally-updating-a-VERY-LARGE-field-Is-this-possibe-tp3881945p3885711.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Incrementally updating a VERY LARGE field - Is this possible?
Depending on your JVM version, -XX:+UseCompressedStrings may help alleviate the problem. It helped me before.

xab
Re: Incrementally updating a VERY LARGE field - Is this possible?
I believe we are talking about two different things. The original question was about incrementally building up a field during indexing, right?

After a document is committed, a field cannot be separately updated; that is true in both Lucene and Solr.

wunder

On Apr 4, 2012, at 12:20 PM, Yonik Seeley wrote:
> On Wed, Apr 4, 2012 at 3:14 PM, vybe3142 wrote:
>>> Updating a single field is not possible in solr. The whole record has to
>>> be rewritten.
>>
>> Unfortunate. Lucene allows it.
>
> I think you're mistaken - the same limitations apply to Lucene.
>
> -Yonik
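To make the "whole record has to be rewritten" point concrete, here is a minimal sketch using a toy in-memory map as a stand-in for an index. ToyIndex and its methods are hypothetical names made up for illustration; they are not Lucene or Solr APIs. The sketch only models the idea that changing one field means rebuilding the full document and replacing the old one.

```java
import java.util.HashMap;
import java.util.Map;

class ToyIndex {
    // Hypothetical stand-in for an index: document id -> (field name -> value).
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    /** Adds or replaces the whole document stored under the given id. */
    void addDocument(String id, Map<String, String> fields) {
        docs.put(id, new HashMap<>(fields));
    }

    /** Returns a copy of the stored document, or null if there is none. */
    Map<String, String> get(String id) {
        Map<String, String> doc = docs.get(id);
        return doc == null ? null : new HashMap<>(doc);
    }

    /** "Updates" one field by rebuilding the full document and replacing the old one. */
    void updateField(String id, String field, String value) {
        Map<String, String> doc = get(id);
        if (doc == null) {
            throw new IllegalArgumentException("no such document: " + id);
        }
        doc.put(field, value);   // change the one field in the copy
        addDocument(id, doc);    // write the entire document back
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "report");
        doc.put("body", "first version");
        index.addDocument("1", doc);

        index.updateField("1", "body", "second version");
        System.out.println(index.get("1"));
    }
}
```

In this model the untouched fields survive only because the caller still has them available to write back, which is why a very large field has to be supplied again in full on every change.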
Re: Incrementally updating a VERY LARGE field - Is this possible?
On Wed, Apr 4, 2012 at 3:14 PM, vybe3142 wrote:
>> Updating a single field is not possible in solr. The whole record has to
>> be rewritten.
>
> Unfortunate. Lucene allows it.

I think you're mistaken - the same limitations apply to Lucene.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10
Re: Incrementally updating a VERY LARGE field - Is this possible?
> Updating a single field is not possible in solr. The whole record has to
> be rewritten.

Unfortunate. Lucene allows it.
Re: Incrementally updating a VERY LARGE field - Is this possible?
Thanks. Increasing the maximum heap space is not a scalable option, as it reduces the system's ability to handle multiple concurrent index requests. The use case is indexing a set of text files over which we have no control, i.e. they could be small or large.
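A rough back-of-the-envelope sketch of why raising the heap does not scale with concurrent requests. It assumes each character of a Java String occupies 2 bytes (true on JVMs without compressed or compact strings) and that the 300 MB file uses a single-byte encoding; it ignores any intermediate copies, so it is a lower bound rather than a measurement. The class and method names are made up for illustration.

```java
class HeapEstimate {
    // Assumption: each char of a String occupies 2 bytes (UTF-16),
    // as on JVMs without compressed/compact strings.
    static final int BYTES_PER_CHAR = 2;

    /** Approximate heap bytes needed to hold the given number of chars as one String. */
    static long stringBytes(long chars) {
        return chars * BYTES_PER_CHAR;
    }

    /** Approximate heap bytes when that many files of this size are indexed at once. */
    static long concurrentBytes(long charsPerFile, int requests) {
        return stringBytes(charsPerFile) * requests;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long fileChars = 300 * mib; // a 300 MB file of single-byte characters

        System.out.println(stringBytes(fileChars) / mib + " MiB for one file");
        System.out.println(concurrentBytes(fileChars, 4) / mib + " MiB for four concurrent requests");
    }
}
```

Under these assumptions one 300 MB file needs about 600 MiB just for the text, and four concurrent requests need about 2400 MiB, which is why a fixed -Xmx value only moves the limit.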
Re: Incrementally updating a VERY LARGE field - Is this possible?
Yes, I think there are good reasons why it works like that. The focus of a search system is to be efficient on the query side, at the cost of being less efficient on storage.

You must also note that by default a field's length is limited to 10,000 tokens (maxFieldLength in solrconfig.xml), which you may also need to modify. But I guess if it's running out of memory you might have already done this?

Ravish

On Wed, Apr 4, 2012 at 1:34 PM, Mikhail Khludnev wrote:
> There is https://issues.apache.org/jira/browse/LUCENE-3837 but I suppose
> it's too far from completion.
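A small sketch of what a per-field length limit means in practice: tokens past the cap are simply left out. This is a toy model that splits on whitespace; it is not Solr's analyzer or its actual implementation, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class FieldLengthLimit {
    /** Returns at most maxTokens whitespace-separated tokens of text; the rest are dropped. */
    static List<String> indexedTokens(String text, int maxTokens) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (kept.size() >= maxTokens) {
                break; // everything after the cap never reaches the index
            }
            if (!token.isEmpty()) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // With a cap of 3, only the first three tokens would be searchable.
        System.out.println(indexedTokens("alpha beta gamma delta epsilon", 3));
    }
}
```

For a 300 MB text file, a cap of this kind would leave most of the content unsearchable even if the document were indexed without a heap error.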
Re: Incrementally updating a VERY LARGE field - Is this possible?
There is https://issues.apache.org/jira/browse/LUCENE-3837 but I suppose it's too far from completion.

On Wed, Apr 4, 2012 at 2:48 PM, Ravish Bhagdev wrote:
> Updating a single field is not possible in solr. The whole record has to
> be rewritten.

--
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru

<http://www.griddynamics.com>
Re: Incrementally updating a VERY LARGE field - Is this possible?
Updating a single field is not possible in solr. The whole record has to be rewritten.

300 MB is still not that big a file. Have you tried doing the indexing (if it's only a one-time thing) by giving it ~2 GB of heap (-Xmx)?

A single file of that size is strange! May I ask what it is?

Rav
Incrementally updating a VERY LARGE field - Is this possible?
Some days ago, I posted about an issue with SOLR running out of memory when attempting to index large text files (say 300 MB). Details at
http://lucene.472066.n3.nabble.com/Solr-Tika-crashing-when-attempting-to-index-large-files-td3846939.html

Two things I need to point out:

1. I don't need Tika for content extraction as the files are already in plain text format.
2. The heap space error was caused by a futile Tika/SOLR attempt at creating the corresponding huge XML document in memory.

I've decided to develop a custom handler that
1. reads the file text directly
2. attempts to create a SOLR document and directly add the text data to the corresponding field.

One approach I've taken is to read manageable chunks of text data sequentially from the file and process them. We've used this approach successfully with Lucene in the past and I'm attempting to make it work with SOLR too. I got most of the work done yesterday, but need a bit of guidance w.r.t. point 2.

How can I update the same field multiple times? Looking at the SOLR source, processor.addField() merely
a. adds to the in-memory field map and
b. attempts to write EVERYTHING to the index later on.

In my situation, (a) eventually causes a heap space error.

Here's part of the handler code.

Thanks much
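The chunked-reading idea described above can be sketched roughly as follows. This is not the handler code from the post (which is not shown); ChunkedReader, readInChunks and the sink callback are hypothetical names, and the sketch uses only the standard Java I/O classes. It shows that only one chunk needs to be in memory at a time, provided the consumer does not keep the chunks.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

class ChunkedReader {
    /**
     * Reads from the reader in chunks of at most chunkSize chars and passes
     * each chunk to the sink. Returns the total number of chars read.
     */
    static long readInChunks(Reader in, int chunkSize, Consumer<String> sink) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        char[] buf = new char[chunkSize]; // the only buffer held by this method
        long total = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            sink.accept(new String(buf, 0, n)); // hand over one chunk
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a reader over the large text file.
        try (Reader in = new StringReader("abcdefghij")) {
            long total = readInChunks(in, 4, chunk -> System.out.println("chunk: " + chunk));
            System.out.println("total chars: " + total);
        }
    }
}
```

Note that this bounds memory only while reading. If the sink appends every chunk to an in-memory document, as described for addField() in point (a), the full text still accumulates on the heap before anything is written to the index.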