Re: Out of memory errors with Spatial indexing

2020-07-06 Thread David Smiley
I believe you are experiencing this bug: LUCENE-5056

The fix would probably involve adjusting the code here:
org.apache.lucene.spatial.query.SpatialArgs#calcDistanceFromErrPct

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jul 6, 2020 at 5:18 AM Sunil Varma  wrote:

> Hi David
> Thanks for your response. Yes, I noticed that all the data causing the issue
> were at the poles. I tried the "RptWithGeometrySpatialField" field type
> definition but get a "Spatial context does not support S2 spatial
> index" error. Setting spatialContextFactory="Geo3D", I still see the
> original OOM error.
>
> On Sat, 4 Jul 2020 at 05:49, David Smiley  wrote:
>
> > Hi Sunil,
> >
> > Your shape is at a pole, and I'm aware of a bug causing an exponential
> > explosion of needed grid squares when you have polygons super-close to
> the
> > pole.  Might you try S2PrefixTree instead?  I forget if this would fix it
> > or not by itself.  For indexing non-point data, I recommend
> > class="solr.RptWithGeometrySpatialField" which internally is based off a
> > combination of a course grid and storing the original vector geometry for
> > accurate verification:
> >  > class="solr.RptWithGeometrySpatialField"
> >   prefixTree="s2" />
> > The internally coarser grid will lessen the impact of that pole bug.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Fri, Jul 3, 2020 at 7:48 AM Sunil Varma 
> > wrote:
> >
> > > We are seeing OOM errors  when trying to index some spatial data. I
> > believe
> > > the data itself might not be valid but it shouldn't cause the Server to
> > > crash. We see this on both Solr 7.6 and Solr 8. Below is the input that
> > is
> > > causing the error.
> > >
> > > {
> > > "id": "bad_data_1",
> > > "spatialwkt_srpt": "LINESTRING (-126.86037681029909 -90.0
> > > 1.000150474662E30, 73.58164711175415 -90.0 1.000150474662E30,
> > > 74.52836551959528 -90.0 1.000150474662E30, 74.97006811540834 -90.0
> > > 1.000150474662E30)"
> > > }
> > >
> > > Above dynamic field is mapped to field type "location_rpt" (
> > > solr.SpatialRecursivePrefixTreeFieldType).
> > >
> > >   Any pointers to get around this issue would be highly appreciated.
> > >
> > > Thanks!
> > >
> >
>


Re: Out of memory errors with Spatial indexing

2020-07-06 Thread Sunil Varma
Hi David
Thanks for your response. Yes, I noticed that all the data causing the issue
were at the poles. I tried the "RptWithGeometrySpatialField" field type
definition but get a "Spatial context does not support S2 spatial
index" error. Setting spatialContextFactory="Geo3D", I still see the
original OOM error.

On Sat, 4 Jul 2020 at 05:49, David Smiley  wrote:

> Hi Sunil,
>
> Your shape is at a pole, and I'm aware of a bug causing an exponential
> explosion of needed grid squares when you have polygons super-close to the
> pole.  Might you try S2PrefixTree instead?  I forget if this would fix it
> or not by itself.  For indexing non-point data, I recommend
> class="solr.RptWithGeometrySpatialField" which internally is based off a
> combination of a course grid and storing the original vector geometry for
> accurate verification:
>  class="solr.RptWithGeometrySpatialField"
>   prefixTree="s2" />
> The internally coarser grid will lessen the impact of that pole bug.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Jul 3, 2020 at 7:48 AM Sunil Varma 
> wrote:
>
> > We are seeing OOM errors  when trying to index some spatial data. I
> believe
> > the data itself might not be valid but it shouldn't cause the Server to
> > crash. We see this on both Solr 7.6 and Solr 8. Below is the input that
> is
> > causing the error.
> >
> > {
> > "id": "bad_data_1",
> > "spatialwkt_srpt": "LINESTRING (-126.86037681029909 -90.0
> > 1.000150474662E30, 73.58164711175415 -90.0 1.000150474662E30,
> > 74.52836551959528 -90.0 1.000150474662E30, 74.97006811540834 -90.0
> > 1.000150474662E30)"
> > }
> >
> > Above dynamic field is mapped to field type "location_rpt" (
> > solr.SpatialRecursivePrefixTreeFieldType).
> >
> >   Any pointers to get around this issue would be highly appreciated.
> >
> > Thanks!
> >
>


Re: Out of memory errors with Spatial indexing

2020-07-03 Thread David Smiley
Hi Sunil,

Your shape is at a pole, and I'm aware of a bug causing an exponential
explosion of needed grid squares when you have polygons super-close to the
pole.  Might you try S2PrefixTree instead?  I forget if this would fix it
or not by itself.  For indexing non-point data, I recommend
class="solr.RptWithGeometrySpatialField" which internally is based off a
combination of a course grid and storing the original vector geometry for
accurate verification:

The internally coarser grid will lessen the impact of that pole bug.
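Expanding that snippet into a minimal schema.xml sketch -- the field type name,
the distErrPct value, and the dynamicField pattern below are illustrative
assumptions on my part, not a drop-in fix:

  <fieldType name="location_rpt_geom" class="solr.RptWithGeometrySpatialField"
             prefixTree="s2"
             distErrPct="0.15" />
  <!-- non-point WKT shapes would be indexed into a field matching this pattern -->
  <dynamicField name="*_srpt_geom" type="location_rpt_geom" indexed="true" stored="true" />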

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jul 3, 2020 at 7:48 AM Sunil Varma  wrote:

> We are seeing OOM errors when trying to index some spatial data. I believe
> the data itself might not be valid, but it shouldn't cause the server to
> crash. We see this on both Solr 7.6 and Solr 8. Below is the input that is
> causing the error.
>
> {
> "id": "bad_data_1",
> "spatialwkt_srpt": "LINESTRING (-126.86037681029909 -90.0
> 1.000150474662E30, 73.58164711175415 -90.0 1.000150474662E30,
> 74.52836551959528 -90.0 1.000150474662E30, 74.97006811540834 -90.0
> 1.000150474662E30)"
> }
>
> The above dynamic field is mapped to the field type "location_rpt"
> (solr.SpatialRecursivePrefixTreeFieldType).
>
>   Any pointers to get around this issue would be highly appreciated.
>
> Thanks!
>


Re: Out of Memory Errors

2017-06-14 Thread Susheel Kumar
The attachment will not come through. Can you upload it via Dropbox or another
file-sharing site?

On Wed, Jun 14, 2017 at 12:41 PM, Satya Marivada 
wrote:

> Susheel, please see attached. The heap towards the end of the graph has
> spiked.
>
>
>
> On Wed, Jun 14, 2017 at 11:46 AM Susheel Kumar 
> wrote:
>
>> You may have GC logs saved from when the OOM happened. Can you plot them in GC Viewer
>> or a similar tool and share the result?
>>
>> Thnx
>>
>> On Wed, Jun 14, 2017 at 11:26 AM, Satya Marivada <
>> satya.chaita...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I am getting Out of Memory Errors after a while on solr-6.3.0.
>> > The -XX:OnOutOfMemoryError=/sanfs/mnt/vol01/solr/solr-6.3.0/bin/
>> oom_solr.sh
>> > just kills the jvm right after.
>> > Using Jconsole, I see the nice triangle pattern, where it uses the heap
>> > and being reclaimed back.
>> >
>> > The heap size is set at 3g. The index size hosted on that particular
>> node
>> > is 17G.
>> >
>> > java -server -Xms3g -Xmx3g -XX:NewRatio=3 -XX:SurvivorRatio=4
>> > -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
>> > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4
>> > -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
>> > -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:
>> > CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>> >
>> > Looking at solr_gc.log.0, the eden space is being used 100% all the
>> > while and is being successfully reclaimed, so I don't think that has
>> > anything to do with it.
>> >
>> > Apart from that in the solr.log, I see exceptions that are aftermath of
>> > killing the jvm
>> >
>> > org.eclipse.jetty.io.EofException: Closed
>> > at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.
>> java:383)
>> > at org.apache.commons.io.output.ProxyOutputStream.write(
>> > ProxyOutputStream.java:90)
>> > at org.apache.solr.common.util.FastOutputStream.flush(
>> > FastOutputStream.java:213)
>> > at org.apache.solr.common.util.FastOutputStream.flushBuffer(
>> > FastOutputStream.java:206)
>> > at org.apache.solr.common.util.JavaBinCodec.marshal(
>> > JavaBinCodec.java:136)
>> >
>> > Any suggestions on how to go about it.
>> >
>> > Thanks,
>> > Satya
>> >
>>
>


Re: Out of Memory Errors

2017-06-14 Thread Satya Marivada
Susheel, please see attached. The heap towards the end of the graph has
spiked.



On Wed, Jun 14, 2017 at 11:46 AM Susheel Kumar 
wrote:

> You may have GC logs saved from when the OOM happened. Can you plot them in GC Viewer
> or a similar tool and share the result?
>
> Thnx
>
> On Wed, Jun 14, 2017 at 11:26 AM, Satya Marivada <
> satya.chaita...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am getting Out of Memory Errors after a while on solr-6.3.0.
> > The
> -XX:OnOutOfMemoryError=/sanfs/mnt/vol01/solr/solr-6.3.0/bin/oom_solr.sh
> > just kills the jvm right after.
> > Using Jconsole, I see the nice triangle pattern, where it uses the heap
> > and being reclaimed back.
> >
> > The heap size is set at 3g. The index size hosted on that particular node
> > is 17G.
> >
> > java -server -Xms3g -Xmx3g -XX:NewRatio=3 -XX:SurvivorRatio=4
> > -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
> > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4
> > -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
> > -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:
> > CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> >
> > Looking at solr_gc.log.0, the eden space is being used 100% all the
> > while and is being successfully reclaimed, so I don't think that has
> > anything to do with it.
> >
> > Apart from that in the solr.log, I see exceptions that are aftermath of
> > killing the jvm
> >
> > org.eclipse.jetty.io.EofException: Closed
> > at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:383)
> > at org.apache.commons.io.output.ProxyOutputStream.write(
> > ProxyOutputStream.java:90)
> > at org.apache.solr.common.util.FastOutputStream.flush(
> > FastOutputStream.java:213)
> > at org.apache.solr.common.util.FastOutputStream.flushBuffer(
> > FastOutputStream.java:206)
> > at org.apache.solr.common.util.JavaBinCodec.marshal(
> > JavaBinCodec.java:136)
> >
> > Any suggestions on how to go about it.
> >
> > Thanks,
> > Satya
> >
>


Re: Out of Memory Errors

2017-06-14 Thread Susheel Kumar
You may have GC logs saved from when the OOM happened. Can you plot them in GC Viewer
or a similar tool and share the result?

Thnx

On Wed, Jun 14, 2017 at 11:26 AM, Satya Marivada 
wrote:

> Hi,
>
> I am getting Out of Memory Errors after a while on solr-6.3.0.
> The -XX:OnOutOfMemoryError=/sanfs/mnt/vol01/solr/solr-6.3.0/bin/oom_solr.sh
> just kills the jvm right after.
> Using JConsole, I see the nice triangle pattern, where the heap is used
> and then reclaimed.
>
> The heap size is set at 3g. The index size hosted on that particular node
> is 17G.
>
> java -server -Xms3g -Xmx3g -XX:NewRatio=3 -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4
> -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>
> Looking at solr_gc.log.0, the eden space is being used 100% all the
> while and is being successfully reclaimed, so I don't think that has
> anything to do with it.
>
> Apart from that, in solr.log I see exceptions that are the aftermath of
> killing the JVM:
>
> org.eclipse.jetty.io.EofException: Closed
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:383)
> at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:90)
> at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:213)
> at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:206)
> at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:136)
>
> Any suggestions on how to go about it.
>
> Thanks,
> Satya
>


Re: Out of memory error during full import

2016-02-04 Thread Shawn Heisey
On 2/4/2016 12:18 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the 
> child entities in data-config.xml. When i try to do full import, i'm getting 
> OutOfMemory error(Java Heap Space). I increased the HEAP allocation to the 
> maximum extent possible. Is there a workaround to do initial data load 
> without running into this error?
>
> I found that the 'batchSize=-1' parameter needs to be specified in the datasource 
> for MySQL; is there a way to specify this for other databases as well?

Setting batchSize to -1 in the DIH config translates to a 'setFetchSize'
on the JDBC object of Integer.MIN_VALUE.  This is how to turn on result
streaming in MySQL.
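A minimal data-config.xml sketch of that setting, assuming a MySQL JdbcDataSource
(the driver class, URL and credentials here are placeholders):

<dataConfig>
  <!-- batchSize="-1" makes DIH call setFetchSize(Integer.MIN_VALUE) on the
       MySQL driver, enabling row-by-row streaming of the result set -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr" password="secret"
              batchSize="-1"/>
  ...
</dataConfig>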

The method for doing this with other JDBC implementations is likely to
be different.  The Microsoft driver for SQL Server uses a URL parameter,
and newer versions of that particular driver have the streaming behavior
as default.  I have no idea how to do it for any other driver, you would
need to ask the author of the driver.

When you turn on caching (SortedMapBackedCache), you are asking Solr to
put all of the data received into memory -- very similar to what happens
if result streaming is not turned on.  When the SQL result is very
large, this can require a LOT of memory.  In situations like that,
you'll just have to remove the caching.  One alternative to child
entities is to do a query using JOIN in a single entity, so that all the
data you need is returned by a single SQL query, where the heavy lifting
is done by the database server instead of Solr.
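As a sketch of that JOIN alternative -- a single flat entity whose SQL does the
join, so no cached child entity is needed; the table and column names are
invented for illustration:

<document>
  <entity name="item"
          query="SELECT i.id, i.name, d.detail_text
                 FROM item i LEFT JOIN item_detail d ON d.item_id = i.id">
    <field column="id" name="id"/>
    <field column="name" name="name"/>
    <field column="detail_text" name="detail_text"/>
  </entity>
</document>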

The MySQL database that serves as the information source for *my* Solr
index is hundreds of gigabytes in size, so caching it is not possible
for me.  The batchSize=-1 option is the only way to get the import to work.

Thanks,
Shawn



Re: out of memory when trying to sort by id in a 1.5 billion index

2014-11-07 Thread Yago Riveiro
For sorting DocValues are the best option I think.

—
/Yago Riveiro

On Fri, Nov 7, 2014 at 12:45 PM, adfel70 adfe...@gmail.com wrote:

 hi
 I have 11 machines in my cluster.
 each machine has 128GB memory and 2 Solr JVMs with a 12GB heap each.
 the cluster has 7 shards, 3 replicas.
 1.5 billion docs total.
 most user queries are pretty simple for now, sorting by date fields and
 another field that has around 1000 unique values.
 I have a use case for cursor paging, and when I tried to check this, I got
 an OutOfMemoryError just for sorting by id.
 I read in old posts that I should add heap memory, and I can do that, but I
 would rather not.
 All other use cases I have are using a stable 8GB heap.
 Any other way to handle this in Solr 4.8?
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/out-of-memory-when-trying-to-sort-by-id-in-a-1-5-billion-index-tp4168156.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: out of memory when trying to sort by id in a 1.5 billion index

2014-11-07 Thread Chris Hostetter

: For sorting DocValues are the best option I think.

yep, definitely a good idea.

:  I have a usecase for using cursorpage and when tried to check this, I got
:  outOfMemory just for sorting by id.

what does the field/fieldType for your uniqueKey field look like?

If you aren't using DocValues, then the amount of RAM needed is going to 
vary widely depending on the datatype used by the FieldCache.

-Hoss
http://www.lucidworks.com/


Re: Out of Memory when I download 5 Million records from sqlserver to solr

2014-07-02 Thread Shawn Heisey
On 7/1/2014 4:57 AM, mskeerthi wrote:
 I have to download my 5 million records from SQL Server into one Solr
 index. I am getting the below exception after downloading 1 million records. Is
 there any configuration or other way to load the data from SQL Server into Solr?
 
 Below is the exception i am getting in solr:
 org.apache.solr.common.SolrException; auto commit
 error...:java.lang.IllegalStateException: this writer hit an
 OutOfMemoryError; cannot commit

JDBC has a bad habit of defaulting to a mode where it will try to load
the entire SQL result set into RAM.  Different JDBC drivers have
different ways of dealing with this problem.  For Microsoft SQL Server,
here's a guide:

https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F

If you have trouble with that really long URL in your mail client, just
visit the main FAQ page and click on the link for SQL Server:

https://wiki.apache.org/solr/DataImportHandlerFaq
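The gist of that FAQ entry is to put the streaming options on the JDBC URL rather
than on batchSize. A sketch of a DIH dataSource for SQL Server -- host, database
name and credentials are placeholders, and the streaming-related URL options shown
(responseBuffering=adaptive, selectMethod=cursor) are my reading of that FAQ, so
check the page for the exact parameters for your driver version:

<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://dbhost;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"
            user="solr" password="secret"/>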

Thanks,
Shawn



Re: Out of Memory when I download 5 Million records from sqlserver to solr

2014-07-01 Thread Aman Tandon
You can try giving some more memory to Solr.
On Jul 1, 2014 4:41 PM, mskeerthi mskeer...@gmail.com wrote:

 I have to download my 5 million records from SQL Server into one Solr
 index. I am getting the below exception after downloading 1 million records. Is
 there any configuration or other way to load the data from SQL Server into Solr?

 Below is the exception i am getting in solr:
 org.apache.solr.common.SolrException; auto commit
 error...:java.lang.IllegalStateException: this writer hit an
 OutOfMemoryError; cannot commit
 at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2915)
 at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3096)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3063)
 at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:578)
 at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Out-of-Memory-when-i-downdload-5-Million-records-from-sqlserver-to-solr-tp4144949.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Out of Memory when I download 5 Million records from sqlserver to solr

2014-07-01 Thread IJ
We faced similar problems on our side. We found it more reliable to have a
mechanism to extract all data from the database into a flat file - and then
use a Java program to bulk index into Solr from the file via the SolrJ API.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Out-of-Memory-when-i-downdload-5-Million-records-from-sqlserver-to-solr-tp4144949p4145041.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: out of memory during indexing due to large incoming queue

2013-06-17 Thread Yoni Amir
Thanks Shawn,
This was very helpful. Indeed I had a terminology problem regarding 
segment merging. In any case, I tweaked the parameters that you recommended 
and it helped a lot.

I was wondering about your recommendation to use facet.method=enum. Can you 
explain what the trade-off is here? I understand that I gain a benefit by using 
less memory, but what do I lose? Is it speed?

Also, do you know if there is an answer to my original question in this thread? 
Solr has a queue of incoming requests, which, in my case, kept on growing. I 
looked at the code but couldn't find it; I think maybe it is an implicit queue 
in the form of Java's concurrent thread pool or something like that.

Is it possible to limit the size of this queue, or to determine its size during 
runtime? This is the last issue that I am trying to figure out right now.

Also, to answer your question about the field all_text: all the fields are 
stored in order to support partial-update of documents. Most of the fields are 
used for highlighting; all_text is used for searching. I'll gladly omit 
all_text from being stored, but then partial-update won't work.
The reason I didn't use edismax to search all the fields is that the list 
of all fields is very long. Can edismax handle several hundred fields in the 
list? What about dynamic fields? Edismax requires the list to be fixed in the 
configuration file, so I can't include dynamic fields there. I can pass along 
the full list in the 'qf' parameter in every search request, but this seems 
like a waste. Also, what about performance? I was told that the best practice 
in this case (you have lots of fields and want to search everything) is to copy 
everything to a catch-all field.

Thanks again,
Yoni

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Monday, June 03, 2013 17:08
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing due to large incoming queue

On 6/3/2013 1:06 AM, Yoni Amir wrote:
 Solrconfig.xml - http://apaste.info/dsbv
 
 Schema.xml - http://apaste.info/67PI
 
 This solrconfig.xml file has optimization enabled. I had another file which I 
 can't locate at the moment, in which I defined a custom merge scheduler in 
 order to disable optimization.
 
 When I say 1000 segments, I mean that's the number I saw in Solr UI. I assume 
 there were much more files than that.

I think we have a terminology problem happening here.  There's nothing you can 
put in a solrconfig.xml file to enable optimization.  Solr will only optimize 
when you explicitly send an optimize command to it.  There is segment merging, 
but that's not the same thing.  Segment merging is completely normal.  Normally 
it's in the background and indexing will continue while it's occurring, but if 
you get too many merges happening at once, that can stop indexing.  I have a 
solution for that:

At the following URL is my indexConfig section, geared towards heavy indexing.  
The TieredMergePolicy settings are the equivalent of a legacy mergeFactor of 
35.  I've gone with a lower-than-default ramBufferSizeMB here, to reduce memory 
usage.  The default value for this setting as of version 4.1 is 100:

http://apaste.info/4gaD

One thing that this configuration does which might directly impact your 
setup is increase the maxMergeCount.  I believe the default value for this is 
3.  This means that if you get more than three levels of merging happening at 
the same time, indexing will stop until the number of levels drops.  
Because Solr always does the biggest merge first, this can really take a long 
time.  The combination of a large mergeFactor and a larger-than-normal 
maxMergeCount will ensure that this situation never happens.

If you are not using SSD, don't increase maxThreadCount beyond one.  The 
random-access characteristics of regular hard disks will make things go slower 
with more threads, not faster.  With SSD, increasing the threads can make 
things go faster.
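A sketch of that style of indexConfig, in Solr 4.x syntax; the numbers are
illustrative rather than the exact values behind the paste link above:

<indexConfig>
  <!-- roughly equivalent to a legacy mergeFactor of 35 -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
  </mergePolicy>
  <!-- allow more simultaneous merge levels so a big merge never stalls indexing;
       keep maxThreadCount at 1 on spinning disks -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">6</int>
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
  <!-- smaller than the 4.1+ default of 100 to reduce memory use -->
  <ramBufferSizeMB>48</ramBufferSizeMB>
</indexConfig>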

There's a few high memory use things going on in your config/schema.

The first thing that jumped out at me is facets.  They use a lot of memory.  
You can greatly reduce the memory use by adding facet.method=enum to the 
query.  The default for the method is fc, which means fieldcache.  The size of 
the Lucene fieldcache cannot be directly controlled by Solr, unlike Solr's own 
caches.  It gets as big as it needs to be, and facets using the fc method will 
put all the facet data for the entire index in the fieldcache.

The second thing that jumped out at me is the fact that all_text is being 
stored.  Apparently this is for highlighting.  I will admit that I do not know 
anything about highlighting, so you might need separate help there.  You are 
using edismax for your query parser, which is perfectly capable of searching 
all the fields that make up all_text, so in my mind, all_text doesn't need to 
exist at all.

If you wrote a custom merge scheduler that disables

Re: out of memory during indexing due to large incoming queue

2013-06-17 Thread Shawn Heisey

On 6/17/2013 4:32 AM, Yoni Amir wrote:

I was wondering about your recommendation to use facet.method=enum? Can you 
explain what is the trade-off here? I understand that I gain a benefit by using 
less memory, but what do I lose? Is it speed?


The problem with facet.method=fc (the default) and memory is that every 
field and query that you use for faceting ends up separately cached in 
the FieldCache, and the memory required grows as your index grows.  If 
you only use facets on one or two fields, then the normal method is 
fine, and subsequent facets will be faster.  It does eat a lot of java 
heap memory, though ... and the bigger your java heap is, the more 
problems you'll have with garbage collection.


With enum, it must gather the data out of the index for every facet run. 
 If you have plenty of extra memory for the OS disk cache, this is not 
normally a major issue, because it will be pulled out of RAM, similar to 
what happens with fc, except that it's not java heap memory.  The OS is 
a lot more efficient with how it uses memory than Java is.
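For reference, facet.method is an ordinary request parameter, so it can be sent
per query (e.g. &facet=true&facet.field=category&facet.method=enum) or baked
into a handler's defaults in solrconfig.xml; the handler and field names below
are placeholders:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- walk the terms per request instead of building a FieldCache entry per facet field -->
    <str name="facet.method">enum</str>
  </lst>
</requestHandler>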



Also, do you know if there is an answer to my original question in this thread? 
Solr has a queue of incoming requests, which, in my case, kept on growing. I 
looked at the code but couldn't find it, I think maybe it is an implicit queue 
in the form of Java's concurrent thread pool or something like that.

Is it possible to limit the size of this queue, or to determine its size during 
runtime? This is the last issue that I am trying to figure out right now.


I do not know the answer to this.


Also, to answer your question about the field all_text: all the fields are 
stored in order to support partial-update of documents. Most of the fields are 
used for highlighting, all_text is used for searching. I'll gladly omit 
all_text from being stored, but then partial-update won't work.


Your copyFields will still work just fine with atomic updates even if 
they are not stored.  Behind the scenes, an atomic update is a delete 
and an add with the stored data plus the changes... if all your source 
fields are stored, then the copyField should be generated correctly from 
all the source fields.


The wiki page on the subject actually says that copyField destinations 
*MUST* be set to stored=false.


http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
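In schema.xml terms that looks roughly like this (field names and types here are
placeholders):

<!-- source fields stay stored so atomic updates can rebuild the document -->
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="body"  type="text_general" indexed="true" stored="true"/>
<!-- catch-all destination: indexed for search, NOT stored, per the caveats above -->
<field name="all_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="all_text"/>
<copyField source="body"  dest="all_text"/>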


The reason I didn't use edismax to search all the fields, is because the list 
of all fields is very long. Can edismax handle several hundred fields in the 
list? What about dynamic fields? Edismax requires the list to be fixed in the 
configuration file, so I can't include dynamic fields there. I can pass along 
the full list in the 'qf' parameter in every search request, but this seems 
like a waste? Also, what about performance? I was told that the best practice 
in this case (you have lots of fields and want to search everything) is to copy 
everything to a catch-all field.


If there is ever any situation where you can come up with some searches 
that only need to search against some of the fields and other searches 
that need to search against different fields, then you might consider 
creating different search handlers with different qf lists.  If you 
always want to search against all the fields, then it's probably more 
efficient to keep your current method.
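A sketch of what such per-use-case handlers could look like (handler and field
names are invented for illustration):

<requestHandler name="/search-metadata" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title author tags</str>
  </lst>
</requestHandler>
<requestHandler name="/search-all" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- a catch-all field keeps qf short when every field must be searched -->
    <str name="qf">all_text</str>
  </lst>
</requestHandler>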


Thanks,
Shawn



RE: out of memory during indexing due to large incoming queue

2013-06-03 Thread Yoni Amir
Solrconfig.xml - http://apaste.info/dsbv

Schema.xml - http://apaste.info/67PI

This solrconfig.xml file has optimization enabled. I had another file which I 
can't locate at the moment, in which I defined a custom merge scheduler in 
order to disable optimization.

When I say 1000 segments, I mean that's the number I saw in Solr UI. I assume 
there were much more files than that.

Thanks,
Yoni



-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Sunday, June 02, 2013 22:53
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing due to large incoming queue

On 6/2/2013 12:25 PM, Yoni Amir wrote:
 Hi Shawn and Shreejay, thanks for the response.
 Here is some more information:
 1) The machine is a virtual machine on ESX server. It has 4 CPUs and 
 8GB of RAM. I don't remember what CPU but something modern enough. It 
 is running Java 7 without any special parameters, and 4GB allocated 
 for Java (-Xmx)
 2) After successful indexing, I have 2.5 Million documents, 117GB index size. 
 This is the size after it was optimized.
 3) I plan to upgrade to 4.3 just didn't have time. 4.0 beta is what was 
 available at the time that we had a release deadline.
 4) The setup with master-slave replication, not Solr Cloud. The server that I 
 am discussing is the indexing server, and in these tests there were actually 
 no slaves involved, and virtually zero searches performed.
 5) Attached is my configuration. I tried to disable the warm-up and opening 
 of searchers, it didn't change anything. The commits are done by Solr, using 
 autocommit. The client sends the updates without a commit command.
 6) I want to disable optimization, but when I disabled it, the OOME occurred 
 even faster. The number of segments reached around a thousand within an hour 
 or so. I don't know if it's normal or not, but at that point if I restarted 
 Solr it immediately took about 1GB of heap space just on start-up, instead of 
 the usual 50MB or so.
 
 If I commit less frequently, don't I increase the risk of losing data, e.g., 
 if the power goes down, etc.?
 If I disable optimization, is it necessary to avoid such a large number of 
 segments? Is it possible?

Last part first: Losing data is much less of a risk with Solr 4.x, if you have 
enabled the updateLog.

We'll need some more info.  See the end of the message for specifics.

Right off the bat, I can tell you that with an index that's 117GB, you're going 
to need a LOT of RAM.

Each of my 4.2.1 servers has 42GB of index and about 37 million documents 
between all the index shards.  The web application never uses facets, which 
tend to use a lot of memory.  My index is a lot smaller than yours, and I need 
a 6GB heap, seeing OOM errors if it's only 4GB.
You probably need at least an 8GB heap, and possibly larger.

Beyond the amount of memory that Solr itself uses, for good performance you 
will also need a large amount of memory for OS disk caching.  Unless the server 
is using SSD, you need to allocate at least 64GB of real memory to the virtual 
machine.  If you've got your index on SSD, 32GB might be enough.  I've got 64GB 
total on my servers.

http://wiki.apache.org/solr/SolrPerformanceProblems

When you say that there are over 1000 segments, are you seeing 1000 files, or 
are there literally 1000 segments, giving you between 12000 and 15000 files?  
Even if your mergeFactor were higher than the default 10, that just shouldn't 
happen.

Can you share your solrconfig.xml and schema.xml?  Use a paste website like 
http://apaste.info and share the URLs.

Thanks,
Shawn





Re: out of memory during indexing due to large incoming queue

2013-06-02 Thread Shreejay
A couple of things: 

1) Can you give some more details about your setup? Like whether it's cloud or 
a single instance, how many nodes if it's cloud, and the hardware - memory per 
machine, JVM options, etc.

2) Any specific reason for using 4.0 beta? The latest version is 4.3. I used 
4.0 for a few weeks and there were a lot of bugs related to memory and 
communication between nodes (ZooKeeper).
3) If you haven't seen it already, please go through this wiki page. It's an 
excellent starting point for troubleshooting memory and indexing issues, 
especially sections 3 to 7:
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations


-- 
Shreejay


On Sunday, June 2, 2013 at 7:16, Yoni Amir wrote:

 Hello,
 I am receiving OutOfMemoryError during indexing, and after investigating the 
 heap dump, I am still missing some information, and I thought this might be a 
 good place for help.
 
 I am using Solr 4.0 beta, and I have 5 threads that send update requests to 
 Solr. Each request is a bulk of 100 SolrInputDocuments (using solrj), and my 
 goal is to index around 2.5 million documents.
 Solr is configured to do a hard-commit every 10 seconds, so initially I 
 thought that it can only accumulate in memory 10 seconds worth of updates, 
 but that's not the case. I can see in a profiler how it accumulates memory 
 over time, even with 4 to 6 GB of memory. It is also configured to optimize 
 with mergeFactor=10.
 
 At first I thought that optimization is a blocking, synchronous operation. It 
 is, in the sense that the index can't be updated during optimization. 
 However, it is not synchronous, in the sense that the update request coming 
 from my code is not blocked - Solr just returns an OK response, even while 
 the index is optimizing.
 This indicates that Solr has an internal queue of inbound requests, and that 
 the OK response just means that it is in the queue. I get confirmation for 
 this from a friend who is a Solr expert (or so I hope).
 
 My main question is: how can I put a bound on this internal queue, and make 
 update requests synchronous in case the queue is full? Put it another way, I 
 need to know if Solr is really ready to receive more requests, so I don't 
 overload it and cause OOME.
 
 I performed several tests, with slow and fast disks, and on the really fast 
 disk the problem didn't occur. However, I can't demand such a fast disk from 
 all the clients, and also even with a fast disk the problem will occur 
 eventually when I try to index 10 million documents.
 I also tried to perform indexing with optimization disabled, but it didn't 
 help.
 
 Thanks,
 Yoni
 
 
 




Re: out of memory during indexing due to large incoming queue

2013-06-02 Thread Shawn Heisey
On 6/2/2013 8:16 AM, Yoni Amir wrote:
 Hello,
 I am receiving OutOfMemoryError during indexing, and after investigating the 
 heap dump, I am still missing some information, and I thought this might be a 
 good place for help.
 
 I am using Solr 4.0 beta, and I have 5 threads that send update requests to 
 Solr. Each request is a bulk of 100 SolrInputDocuments (using solrj), and my 
 goal is to index around 2.5 million documents.
 Solr is configured to do a hard-commit every 10 seconds, so initially I 
 thought that it can only accumulate in memory 10 seconds worth of updates, 
 but that's not the case. I can see in a profiler how it accumulates memory 
 over time, even with 4 to 6 GB of memory. It is also configured to optimize 
 with mergeFactor=10.

4.0-BETA came out several months ago.  Even at the time, support for the
alpha and beta releases was limited.  Now it has been superseded by
4.0.0, 4.1.0, 4.2.0, 4.2.1, and 4.3.0, all of which are full releases.
There is a 4.3.1 release currently in the works.  Please upgrade.

Ten seconds is a very short interval for hard commits, even if you have
openSearcher=false.  Frequent hard commits can cause a whole host of
problems.  It's better to have an interval of several minutes, and I
wouldn't go less than a minute.  Soft commits can be much more frequent,
but if you are frequently opening new searchers, you'll probably want to
disable cache warming.
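In solrconfig.xml that advice translates into something like the following; the
intervals are illustrative, not a recommendation for this particular index:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- hard commit every few minutes, without opening a new searcher -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commits control visibility and can be much more frequent -->
  <autoSoftCommit>
    <maxTime>30000</maxTime>
  </autoSoftCommit>
</updateHandler>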

On optimization: don't do it unless you absolutely must.  Most of the
time, optimization is only needed if you delete a lot of documents and
you need to get them removed from your index.  If you must optimize to
get rid of deleted documents, do it on a very long interval (once a day,
once a week) and pause indexing during optimization.

You haven't said anything about your index size, java heap size, total
RAM, etc.  With those numbers I could offer some guesses about what you
need, but I'll warn you that they would only be guesses - watching a
system with real data under load is the only way to get concrete
information.  Here are some basic guidelines on performance problems and
RAM information:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



RE: out of memory during indexing due to large incoming queue

2013-06-02 Thread Yoni Amir
Hi Shawn and Shreejay, thanks for the response.
Here is some more information:
1) The machine is a virtual machine on ESX server. It has 4 CPUs and 8GB of 
RAM. I don't remember what CPU but something modern enough. It is running Java 
7 without any special parameters, and 4GB allocated for Java (-Xmx)
2) After successful indexing, I have 2.5 Million documents, 117GB index size. 
This is the size after it was optimized.
3) I plan to upgrade to 4.3 just didn't have time. 4.0 beta is what was 
available at the time that we had a release deadline.
4) The setup is master-slave replication, not Solr Cloud. The server that I 
am discussing is the indexing server, and in these tests there were actually no 
slaves involved, and virtually zero searches performed.
5) Attached is my configuration. I tried to disable the warm-up and opening of 
searchers, it didn't change anything. The commits are done by Solr, using 
autocommit. The client sends the updates without a commit command.
6) I want to disable optimization, but when I disabled it, the OOME occurred 
even faster. The number of segments reached around a thousand within an hour or 
so. I don't know if it's normal or not, but at that point if I restarted Solr 
it immediately took about 1GB of heap space just on start-up, instead of the 
usual 50MB or so.

If I commit less frequently, don't I increase the risk of losing data, e.g., if 
the power goes down, etc.?
If I disable optimization, is it necessary to avoid such a large number of 
segments? Is it possible?

Thanks again,
Yoni



-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Sunday, June 02, 2013 18:05
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing due to large incoming queue

On 6/2/2013 8:16 AM, Yoni Amir wrote:
 Hello,
 I am receiving OutOfMemoryError during indexing, and after investigating the 
 heap dump, I am still missing some information, and I thought this might be a 
 good place for help.
 
 I am using Solr 4.0 beta, and I have 5 threads that send update requests to 
 Solr. Each request is a bulk of 100 SolrInputDocuments (using solrj), and my 
 goal is to index around 2.5 million documents.
 Solr is configured to do a hard-commit every 10 seconds, so initially I 
 thought that it can only accumulate in memory 10 seconds worth of updates, 
 but that's not the case. I can see in a profiler how it accumulates memory 
 over time, even with 4 to 6 GB of memory. It is also configured to optimize 
 with mergeFactor=10.

4.0-BETA came out several months ago.  Even at the time, support for the alpha 
and beta releases was limited.  Now it has been superseded by 4.0.0, 4.1.0, 
4.2.0, 4.2.1, and 4.3.0, all of which are full releases.
There is a 4.3.1 release currently in the works.  Please upgrade.

Ten seconds is a very short interval for hard commits, even if you have 
openSearcher=false.  Frequent hard commits can cause a whole host of problems.  
It's better to have an interval of several minutes, and I wouldn't go less than 
a minute.  Soft commits can be much more frequent, but if you are frequently 
opening new searchers, you'll probably want to disable cache warming.

On optimization: don't do it unless you absolutely must.  Most of the time, 
optimization is only needed if you delete a lot of documents and you need to 
get them removed from your index.  If you must optimize to get rid of deleted 
documents, do it on a very long interval (once a day, once a week) and pause 
indexing during optimization.

You haven't said anything about your index size, java heap size, total RAM, 
etc.  With those numbers I could offer some guesses about what you need, but 
I'll warn you that they would only be guesses - watching a system with real 
data under load is the only way to get concrete information.  Here are some 
basic guidelines on performance problems and RAM information:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn


<?xml version="1.0" encoding="UTF-8" ?>
<config>
	<luceneMatchVersion>LUCENE_40</luceneMatchVersion>

	<dataDir>${solr.data.dir:}</dataDir>

	<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />

	<indexConfig>

		<!-- ramBufferSizeMB sets the amount

Re: out of memory during indexing due to large incoming queue

2013-06-02 Thread Shawn Heisey
On 6/2/2013 12:25 PM, Yoni Amir wrote:
 Hi Shawn and Shreejay, thanks for the response.
 Here is some more information:
 1) The machine is a virtual machine on ESX server. It has 4 CPUs and 8GB of 
 RAM. I don't remember what CPU but something modern enough. It is running 
 Java 7 without any special parameters, and 4GB allocated for Java (-Xmx)
 2) After successful indexing, I have 2.5 Million documents, 117GB index size. 
 This is the size after it was optimized.
 3) I plan to upgrade to 4.3 just didn't have time. 4.0 beta is what was 
 available at the time that we had a release deadline.
 4) The setup with master-slave replication, not Solr Cloud. The server that I 
 am discussing is the indexing server, and in these tests there were actually 
 no slaves involved, and virtually zero searches performed.
 5) Attached is my configuration. I tried to disable the warm-up and opening 
 of searchers, it didn't change anything. The commits are done by Solr, using 
 autocommit. The client sends the updates without a commit command.
 6) I want to disable optimization, but when I disabled it, the OOME occurred 
 even faster. The number of segments reached around a thousand within an hour 
 or so. I don't know if it's normal or not, but at that point if I restarted 
 Solr it immediately took about 1GB of heap space just on start-up, instead of 
 the usual 50MB or so.
 
 If I commit less frequently, don't I increase the risk of losing data, e.g., 
 if the power goes down, etc.?
 If I disable optimization, is it necessary to avoid such a large number of 
 segments? Is it possible?

Last part first: Losing data is much less of a risk with Solr 4.x, if
you have enabled the updateLog.

We'll need some more info.  See the end of the message for specifics.

Right off the bat, I can tell you that with an index that's 117GB,
you're going to need a LOT of RAM.

Each of my 4.2.1 servers has 42GB of index and about 37 million
documents between all the index shards.  The web application never uses
facets, which tend to use a lot of memory.  My index is a lot smaller
than yours, and I need a 6GB heap, seeing OOM errors if it's only 4GB.
You probably need at least an 8GB heap, and possibly larger.

Beyond the amount of memory that Solr itself uses, for good performance
you will also need a large amount of memory for OS disk caching.  Unless
the server is using SSD, you need to allocate at least 64GB of real
memory to the virtual machine.  If you've got your index on SSD, 32GB
might be enough.  I've got 64GB total on my servers.

http://wiki.apache.org/solr/SolrPerformanceProblems

When you say that there are over 1000 segments, are you seeing 1000
files, or are there literally 1000 segments, giving you between 12000
and 15000 files?  Even if your mergeFactor were higher than the default
10, that just shouldn't happen.

Can you share your solrconfig.xml and schema.xml?  Use a paste website
like http://apaste.info and share the URLs.

Thanks,
Shawn



Re: Out of memory on some faceting queries

2013-04-08 Thread Dotan Cohen
On Wed, Apr 3, 2013 at 8:47 PM, Shawn Heisey s...@elyograg.org wrote:
 On 4/2/2013 3:09 AM, Dotan Cohen wrote:
 I notice that this only occurs on queries that run facets. I start
 Solr with the following command:
 sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
 -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
 /opt/solr-4.1.0/example/start.jar 

 It looks like you've followed some advice that I gave previously on how
 to tune java.  I have since learned that this advice is bad, it results
 in long GC pauses, even with heaps that aren't huge.


I see, thanks.

 As others have pointed out, you don't have a max heap setting, which
 would mean that you're using whatever Java chooses for its default,
 which might not be enough.  If you can get Solr to successfully run for
 a while with queries and updates happening, the heap should eventually
 max out and the admin UI will show you what Java is choosing by default.

 Here is what I would now recommend for a beginning point on your Solr
 startup command.  You may need to increase the heap beyond 4GB, but be
 careful that you still have enough free memory to be able to do
 effective caching of your index.

 sudo nohup java -Xms4096M -Xmx4096M -XX:+UseConcMarkSweepGC
 -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3
 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
 -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
 -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
 /opt/solr-4.1.0/example/start.jar 


Thank you, I will experiment with that.

 If you are running a really old build of java (latest versions on
 Oracle's website are 1.6 build 43 and 1.7 build 17), you might want to
 leave AggressiveOpts out.  Some people would argue that you should never
 use that option.


Great, thanks for the warning. This is what we're running; I'll see
about updating it through my distro's package manager:
$ java -version
java version 1.6.0_27
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-03 Thread Toke Eskildsen
On Tue, 2013-04-02 at 17:08 +0200, Dotan Cohen wrote:
 Most of the time I facet on one field that has about twenty unique
 values.

They are likely to be disk cached so warming those for 9M documents
should only take a few seconds.

 However, once per day I would like to facet on the text field,
 which is a free-text field usually around 1 KiB (about 100 words), in
 order to determine what the top keywords / topics are. That query
 would take up to 200 seconds to run, [...]

If that query is somehow part of your warming, then I am surprised that
search has worked at all with your commit frequency. That would however
explain your OOM if you have multiple warmups running at the same time.

It sounds like TermsComponent would be a better fit for getting top
topics: https://wiki.apache.org/solr/TermsComponent
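A minimal solrconfig.xml wiring for it, roughly following that wiki page (the
terms.fl default assumes the "text" field mentioned earlier in the thread):

<searchComponent name="terms" class="solr.TermsComponent"/>
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <str name="terms.fl">text</str>
    <int name="terms.limit">25</int>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>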



Re: Out of memory on some faceting queries

2013-04-03 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 6:26 PM, Andre Bois-Crettez
andre.b...@kelkoo.com wrote:
 warmupTime is available on the admin page for each type of cache (in
 milliseconds) :
 http://solr-box:8983/solr/#/core1/plugins/cache

 Or if you are only interested in the total :
 http://solr-box:8983/solr/core1/admin/mbeans?stats=truekey=searcher


Thanks.


 Batches of 20-50 results are added to solr a few times a minute, and a
 commit is done after each batch since I'm calling Solr as such:
 http://127.0.0.1:8983/solr/core/update/json?commit=true Should I
 remove commit=true and run a cron job to commit once per minute?


 Even better, it sounds like a job for CommitWithin :
 http://wiki.apache.org/solr/CommitWithin
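
For reference, commitWithin can also be set directly on an update request; a
minimal XML sketch with a made-up document and a 60-second window:

<add commitWithin="60000">
  <doc>
    <field name="id">example-1</field>
    <field name="text">short general purpose text message</field>
  </doc>
</add>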



I'll look into that. Thank you!


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-03 Thread Dotan Cohen
On Wed, Apr 3, 2013 at 10:11 AM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 However, once per day I would like to facet on the text field,
 which is a free-text field usually around 1 KiB (about 100 words), in
 order to determine what the top keywords / topics are. That query
 would take up to 200 seconds to run, [...]

 If that query is somehow part of your warming, then I am surprised that
 search has worked at all with your commit frequency. That would however
 explain your OOM if you have multiple warmups running at the same time.


No, the 'heavy facet' is not part of the warming. I run it at most
once per day, at the end of the day. Solr is not shut down daily.

 It sounds like TermsComponent would be a better fit for getting top
 topics: https://wiki.apache.org/solr/TermsComponent


I had once looked at TermsComponent, but I think that I eliminated it
as a possibility because I actually need the top keywords related to a
specific keyword. For instance, I need to know which words are most
commonly used with the word coffee.


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-03 Thread Shawn Heisey
On 4/2/2013 3:09 AM, Dotan Cohen wrote:
 I notice that this only occurs on queries that run facets. I start
 Solr with the following command:
 sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
 -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
 /opt/solr-4.1.0/example/start.jar 

It looks like you've followed some advice that I gave previously on how
to tune java.  I have since learned that this advice is bad, it results
in long GC pauses, even with heaps that aren't huge.

As others have pointed out, you don't have a max heap setting, which
would mean that you're using whatever Java chooses for its default,
which might not be enough.  If you can get Solr to successfully run for
a while with queries and updates happening, the heap should eventually
max out and the admin UI will show you what Java is choosing by default.

Here is what I would now recommend for a beginning point on your Solr
startup command.  You may need to increase the heap beyond 4GB, but be
careful that you still have enough free memory to be able to do
effective caching of your index.

sudo nohup java -Xms4096M -Xmx4096M -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3
-XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
-Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
/opt/solr-4.1.0/example/start.jar 

If you are running a really old build of java (latest versions on
Oracle's website are 1.6 build 43 and 1.7 build 17), you might want to
leave AggressiveOpts out.  Some people would argue that you should never
use that option.

Thanks,
Shawn



Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 11:09 +0200, Dotan Cohen wrote:
 On some queries I get out of memory errors:
 
 {error:{msg:java.lang.OutOfMemoryError: Java heap
[...]
 org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:273)\n\tat
 org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:178)\n\tat
[...]

Yep, your OOM is due to faceting.

How many documents does your index have, how many fields do you facet on
and approximately how many unique values does your facet fields have?

 I notice that this only occurs on queries that run facets. I start
 Solr with the following command:
 sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
 -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
 /opt/solr-4.1.0/example/start.jar 

You are not specifying any maximum heap size (-Xmx), which you should do
in order to avoid unpleasant surprises. Facets and sorting are often
memory hungry, but your system seems to have 13GB free RAM so the easy
solution attempt would be to increase the heap until Solr serves the
facets without OOM.

- Toke Eskildsen, State and University Library, Denmark



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 12:59 PM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 How many documents does your index have, how many fields do you facet on
 and approximately how many unique values does your facet fields have?


8971763 documents, growing at a rate of about 500 per minute. We
actually expect that to be ~5 per minute once we get out of
testing. Most documents are less than a KiB in the 'text' field, and
they have a few other fields which store short strings, dates, or
ints. You can think of these documents like tweets: short general
purpose text messages.

 I notice that this only occurs on queries that run facets. I start
 Solr with the following command:
 sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
 -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
 /opt/solr-4.1.0/example/start.jar 

 You are not specifying any maximum heap size (-Xmx), which you should do
 in order to avoid unpleasant surprises. Facets and sorting are often
 memory hungry, but your system seems to have 13GB free RAM so the easy
 solution attempt would be to increase the heap until Solr serves the
 facets without OOM.


Thanks, I will start with -Xmx8g and test.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 12:16 +0200, Dotan Cohen wrote:
 8971763 documents, growing at a rate of about 500 per minute. We
 actually expect that to be ~5 per minute once we get out of
 testing.

9M documents in a heavily updated index with faceting. Maybe you are
committing faster than the faceting can be prepared?
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

Regards,
Toke Eskildsen



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 2:41 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
 9M documents in a heavily updated index with faceting. Maybe you are
 committing faster than the faceting can be prepared?
 https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F


Thank you Toke, this is exactly on my list of things to learn about
Solr. We do get the error mentioned and we cannot reduce the amount
of commits. Also, I do believe that we have the necessary server
resources (16 GiB RAM).

I have increased maxWarmingSearchers to 4, let's see how this goes.

Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:

[Toke: maxWarmingSearchers limit exceeded?]

 Thank you Toke, this is exactly on my list of things to learn about
 Solr. We do get the error mentioned and we cannot reduce the amount
 of commits. Also, I do believe that we have the necessary server
 resources (16 GiB RAM).

Memory does not help you if you commit too frequently. If you commit
each X seconds and warming takes X+Y seconds, then you will run out of
memory at some point.

 I have increased maxWarmingSearchers to 4, let's see how this goes.

If you still get the error with 4 concurrent searchers, you will have to
either speed up warmup time or commit less frequently. You should be
able to reduce facet startup time by switching to segment based faceting
(at the cost of worse search-time performance) or maybe by using
DocValues. Some of the current threads on the solr-user list are about
these topics.
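
As a sketch (the field name is hypothetical and exact support depends on your
Solr version): per-segment faceting can be requested per field with
facet=true&facet.field=category&f.category.facet.method=fcs, and DocValues is
enabled by adding docValues="true" to the field definition in the schema.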

How often do you commit and how many unique values does your facet
fields have?

Regards,
Toke Eskildsen



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
 On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:

 [Tokd: maxWarmingSearchers limit exceeded?]

 Thank you Toke, this is exactly on my list of things to learn about
 Solr. We do get the error mentioned and we cannot reduce the amount
 of commits. Also, I do believe that we have the necessary server
 resources (16 GiB RAM).

 Memory does not help you if you commit too frequently. If you commit
 each X seconds and warming takes X+Y seconds, then you will run out of
 memory at some point.

 I have increased maxWarmingSearchers to 4, let's see how this goes.

 If you still get the error with 4 concurrent searchers, you will have to
 either speed up warmup time or commit less frequently. You should be
 able to reduce facet startup time by switching to segment based faceting
 (at the cost of worse search-time performance) or maybe by using
 DocValues. Some of the current threads on the solr-user list is about
 these topics.

 How often do you commit and how many unique values does your facet
 fields have?

 Regards,
 Toke Eskildsen




-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
 Memory does not help you if you commit too frequently. If you commit
 each X seconds and warming takes X+Y seconds, then you will run out of
 memory at some point.


How might I time the warming? I've been googling warming since your
earlier message but there does not seem to be any really good
documentation on the subject. If there is anything that you feel I
should be reading I would appreciate a link or a keyword to search on.
I've read the Solr wiki on caching and performance, but other than
that I don't see the issue addressed.


 I have increased maxWarmingSearchers to 4, let's see how this goes.

 If you still get the error with 4 concurrent searchers, you will have to
 either speed up warmup time or commit less frequently. You should be
 able to reduce facet startup time by switching to segment based faceting
 (at the cost of worse search-time performance) or maybe by using
 DocValues. Some of the current threads on the solr-user list is about
 these topics.

 How often do you commit and how many unique values does your facet
 fields have?


Batches of 20-50 results are added to solr a few times a minute, and a
commit is done after each batch since I'm calling Solr as such:
http://127.0.0.1:8983/solr/core/update/json?commit=true

Should I remove commit=true and run a cron job to commit once per minute?

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
 How often do you commit and how many unique values does your facet
 fields have?


Most of the time I facet on one field that has about twenty unique
values. However, once per day I would like to facet on the text field,
which is a free-text field usually around 1 KiB (about 100 words), in
order to determine what the top keywords / topics are. That query
would take up to 200 seconds to run, but it does not have to return
the results in real-time (the output goes to another process, not to a
waiting user).

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Andre Bois-Crettez

On 04/02/2013 05:04 PM, Dotan Cohen wrote:

How might I time the warming? I've been googling warming since your
earlier message but there does not seem to be any really good
documentation on the subject. If there is anything that you feel I
should be reading I would appreciate a link or a keyword to search on.
I've read the Solr wiki on caching and performance, but other than
that I don't see the issue addressed.


warmupTime is available on the admin page for each type of cache (in
milliseconds) :
http://solr-box:8983/solr/#/core1/plugins/cache

Or if you are only interested in the total :
http://solr-box:8983/solr/core1/admin/mbeans?stats=true&key=searcher
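
From the command line, something like this should show the same number (same
host and core as above, assuming the JSON response writer):

  curl "http://solr-box:8983/solr/core1/admin/mbeans?stats=true&key=searcher&wt=json&indent=true" | grep warmupTime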


Batches of 20-50 results are added to solr a few times a minute, and a
commit is done after each batch since I'm calling Solr as such:
http://127.0.0.1:8983/solr/core/update/json?commit=true Should I
remove commit=true and run a cron job to commit once per minute?


Even better, it sounds like a job for CommitWithin :
http://wiki.apache.org/solr/CommitWithin
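
For example (endpoint as in your message, interval illustrative), drop the
explicit commit and ask Solr to make each batch visible within a minute:

  http://127.0.0.1:8983/solr/core/update/json?commitWithin=60000

That way the client never forces a new searcher per batch; Solr schedules the
commits itself.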


André



Re: Out of Memory doing a query Solr 4.2

2013-03-15 Thread Bernd Fehling
We are currently using
Oracle Corporation Java HotSpot(TM) 64-Bit Server VM (1.7.0_07 23.3-b01)

Runs excellently and no memory parameter tweaking is necessary.
Give it enough physical and JVM memory, use -XX:+UseG1GC and that's it.

Also no sawtooth and no GC timeouts from the JVM as with earlier versions.
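
As a sketch (heap sizes are placeholders, not recommendations), the start line
then only needs something like:

  java -Xms8g -Xmx8g -XX:+UseG1GC -jar start.jar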

Regards
Bernd


On 15.03.2013 at 09:09, raulgrande83 wrote:
 Why? Could this be the cause of the problem? This was working ok for Solr
 3.5.
 
 Could you recommend me one ?
 
 Thanks.
 
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Out-of-Memory-doing-a-query-Solr-4-2-tp4047394p4047621.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: Out of Memory doing a query Solr 4.2

2013-03-15 Thread Robert Muir
On Fri, Mar 15, 2013 at 6:46 AM, raulgrande83 raulgrand...@hotmail.com wrote:
 Thank you for your help. I'm afraid it won't be so easy to change de jvm
 version, because it is required at the moment.

 It seems that Solr 4.2 supports Java 1.6 at least. Is that correct?

 Could you find any clue of what is happening in the attached traces? It
 would be great to know why it is happening now, because it was working for
 Solr 3.5.

It's probably not an OOM at all. Instead it's more likely the IBM JVM is
miscompiling our code and producing large integers, like it
does quite often. For example, we had to disable testing it completely
recently for this reason. If someone were to report a JIRA issue that
mentioned IBM, I'd make the same comment there but in general not take
it seriously at all, due to the kind of bugs I've seen from that JVM.

The fact that the IBM JVM didn't miscompile 3.5's code is irrelevant.


Re: Out of Memory doing a query Solr 4.2

2013-03-14 Thread Robert Muir
On Thu, Mar 14, 2013 at 12:07 PM, raulgrande83 raulgrand...@hotmail.com wrote:
 JVM: IBM J9 VM(1.6.0.2.4)

I don't recommend using this JVM.


Re: Out Of Memory =( Too many cores on one server?

2012-11-21 Thread Shawn Heisey

On 11/21/2012 12:36 AM, stockii wrote:

okay. i will try out more RAM.

i am not using much caching because of near-real-time search. in this
case is it better to increase Xmn, or only Xmx and Xms?


I have personally found that increasing the size of the young generation 
(Eden) is beneficial to Solr, at least if you are using the parallel GC 
options.  I theorize that the collector for the young generation is more 
efficient than the full GC, but that's just a guess.  When I started 
doing that, the amount of time my Solr JVM spent doing garbage 
collection went way down, even though the number of garbage collections 
went up.


Lately I have been increasing the Eden size by using -XX:NewRatio=1 
rather than an explicit value on -Xmn.  This has one advantage - if you 
change the min/max heap size, the same value for NewRatio will still work.


Here are the options that I am currently using in production with Java6:

-Xms4096M
-Xmx8192M
-XX:NewRatio=1
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

Here is what I am planning for the future with Solr4 and beyond with 
Java7,  including an environment variable for Xmx. Due to the 
experimental nature of the G1 collector, I would only trust it with the 
latest Java releases, especially for Java6.  The Unlock option is not 
required on Java7, only Java6.


-Xms256M
-Xmx${JMEM}
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
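
A minimal wrapper sketch for wiring up that JMEM variable (the value, paths and
the rest of the command line are placeholders):

  #!/bin/sh
  JMEM=4096M    # pick per host
  exec java -Xms256M -Xmx${JMEM} -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC \
       -Dsolr.solr.home=/path/to/solr -jar start.jar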

Thanks,
Shawn



Re: Out Of Memory =( Too many cores on one server?

2012-11-21 Thread Mark Miller
 I have personally found that increasing the size of the young generation 
 (Eden) is beneficial to Solr,

I've seen the same thing - I think it's because requests create a lot
of short lived objects and if the eden is not large enough, a lot of
those objects will make it to the tenured space, which is basically an
alg fail.

It's not a bad knob to tweak, because if you just keep raising the
heap, you can wastefully keep giving more unnecessary RAM to the
tenured space when you might only want to give more to the eden space.

- Mark


On Wed, Nov 21, 2012 at 11:00 AM, Shawn Heisey s...@elyograg.org wrote:
 On 11/21/2012 12:36 AM, stockii wrote:

 okay. i will try out more RAM.

 i am using not much caching because of near-realt-time-search. in this
 case its better to increase xmn or only xmx and xms?


 I have personally found that increasing the size of the young generation
 (Eden) is beneficial to Solr, at least if you are using the parallel GC
 options.  I theorize that the collector for the young generation is more
 efficient than the full GC, but that's just a guess.  When I started doing
 that, the amount of time my Solr JVM spent doing garbage collection went way
 down, even though the number of garbage collections went up.

 Lately I have been increasing the Eden size by using -XX:NewRatio=1 rather
 than an explicit value on -Xmn.  This has one advantage - if you change the
 min/max heap size, the same value for NewRatio will still work.

 Here are the options that I am currently using in production with Java6:

 -Xms4096M
 -Xmx8192M
 -XX:NewRatio=1
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled

 Here is what I am planning for the future with Solr4 and beyond with Java7,
 including an environment variable for Xmx. Due to the experimental nature of
 the G1 collector, I would only trust it with the latest Java releases,
 especially for Java6.  The Unlock option is not required on Java7, only
 Java6.

 -Xms256M
 -Xmx${JMEM}
 -XX:+UnlockExperimentalVMOptions
 -XX:+UseG1GC

 Thanks,
 Shawn




-- 
- Mark


Re: Out Of Memory =( Too many cores on one server?

2012-11-16 Thread Bernd Fehling
I guess you should give JVM more memory.

When starting to find a good value for -Xmx I oversized and  set
it to Xmx20G and Xms20G. Then I monitored the system and saw that JVM is
between 5G and 10G (java7 with G1 GC).
Now it is finally set to Xmx11G and Xms11G for my system with 1 core and 38 
million docs.
But JVM memory depends pretty much on number of fields in schema.xml
and fieldCache (sortable fields).

Regards
Bernd

On 16.11.2012 at 09:29, stockii wrote:
 Hello.
 
 if my server is running for a while I get some OOM problems. I think the
 problem is that I am running too many cores on one server with too many
 documents.
 
 this is my server concept:
 14 cores. 
 1 with 30 million docs
 1 with 22 million docs
 1 with growing 25 million docs
 1 with 67 million docs
 and the other cores are under 1 million docs.
 
 all these cores are running fine in one jetty and searching is very fast and
 we are satisfied with this.
 yesterday we got OOM. 
 
 Do you think that we should move the big cores out into another virtual
 instance of the server, so that the JVMs do not share the memory and go OOM?
 starting with: MEMORY_OPTIONS=-Xmx6g -Xms2G -Xmn1G
 


Re: Out Of Memory =( Too many cores on one server?

2012-11-16 Thread Vadim Kisselmann
Hi,
your JVM needs more RAM. My setup works well with 10 cores and 300 million
docs, Xmx8GB, Xms8GB, 16GB for the OS.
But as Bernd mentioned, the memory consumption depends on the
number of fields and the fieldCache.
Best Regards
Vadim



2012/11/16 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
 I guess you should give JVM more memory.

 When starting to find a good value for -Xmx I oversized and  set
 it to Xmx20G and Xms20G. Then I monitored the system and saw that JVM is
 between 5G and 10G (java7 with G1 GC).
 Now it is finally set to Xmx11G and Xms11G for my system with 1 core and 38 
 million docs.
 But JVM memory depends pretty much on number of fields in schema.xml
 and fieldCache (sortable fields).

 Regards
 Bernd

 On 16.11.2012 at 09:29, stockii wrote:
 Hello.

 if my server is running for a while i get some OOM Problems. I think the
 problem is, that i running to many cores on one Server with too many
 documents.

 this is my server concept:
 14 cores.
 1 with 30 million docs
 1 with 22 million docs
 1 with growing 25 million docs
 1 with 67 million docs
 and the other cores are under 1 million docs.

 all these cores are running fine in one jetty and searching is very fast and
 we are satisfied with this.
 yesterday we got OOM.

 Do you think that we should outsource the big cores into another virtual
 instance of the server? so that the JVM not share the memory and going OOM?
 starting with: MEMORY_OPTIONS=-Xmx6g -Xms2G -Xmn1G



Re: Out of Memory

2012-01-31 Thread Erick Erickson
Right. Multivalued fields use the fieldCache for
faceting (as I remember) whereas single-valued
fields don't under some circumstances. See:
http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache

Before your change, you were probably using the
filterCache for what faceting you were doing.
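
If the goal is to keep faceting on the filterCache rather than the FieldCache,
one knob that is sometimes used for that (the field name here is hypothetical)
is the per-field facet method:

  facet=true&facet.field=tags&f.tags.facet.method=enum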

So yes, you're probably memory-constrained at this
point. How much physical memory do you have anyway?

Best
Erick

On Mon, Jan 30, 2012 at 12:10 PM, Milan Dobrota mi...@milandobrota.com wrote:
 Hi,

 I have a Solr instance with 6M item index. It normally uses around 3G of
 memory. I have suddenly started getting out of memory errors and increasing
 the Xmx parameter to over 4G didn't fix the problem. It was just buying us
 time. Inspecting the heap, I figured that 90% of memory is occupied by
 FieldCache. Is this normal? We do very little sorting and no faceting.

 Is FieldCache ever supposed to get cleared? Can this be done through HTTP?

 Do we need more memory? If so, I don't understand why the minimal set of
 changes we introduced (one multivalued field) would cause the memory to
 drastically increase.

 The communication with the Solr instance is done via HTTP.

 Java version:
 java version 1.6.0_17
 Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

 Milan


Re: Out of memory during the indexing

2011-12-06 Thread Erick Erickson
I'm going to defer to the folks who actually know the guts here.
If you've turned down the cache entries for your Solr caches,
you're pretty much left with Lucene caching which is a mystery...

Best
Erick

On Mon, Dec 5, 2011 at 9:23 AM, Jeff Crump jeffrey.cr...@gmail.com wrote:
 Yes, and without doing much in the way of queries, either.   Basically, our
 test data has large numbers of distinct terms, each of which can be large
 in themselves.   Heap usage is a straight line -- up --  75 percent of the
 heap is consumed with byte[] allocations at the leaf of an object graph
 like so:

 SolrCore
 SolrIndexSearcher
 DirectoryReader
 SegmentReader
 SegmentCoreReaders
 PerFieldPostingsFormat$FieldsReader
 ...
 FST
 byte[]

 Our application is less concerned with query performance than it is with
 making sure our index doesn't OOM.   My suspicion is that we're looking at
 just in-memory representation of the index rather than any caching (it's
 all turned down to levels suggested in other documentation); plus, we're
 not doing much querying in this test anyway.

 Any suggestions or places to go for further information?

 On 5 December 2011 08:38, Erick Erickson erickerick...@gmail.com wrote:

 There's no good way to say to Solr Use only this
 much memory for searching. You can certainly
 limit the size somewhat by configuring your caches
 to be small. But if you're sorting, then Lucene will
 use up some cache space etc.

 Are you actually running into problems?

 Best
 Erick

 On Fri, Dec 2, 2011 at 2:26 PM, Jeff Crump jeffrey.cr...@gmail.com
 wrote:
  Can anyone advise techniques for limiting the size of the RAM buffers to
  begin with?  As the index grows, I shouldn't have to keep increasing the
  heap.  We have a high-ingest, low-query-rate environment and I'm not as
  much concerned with the query-time caches as I am with the segment core
  readers/SolrIndexSearchers themselves.
 
  On 9 November 2011 06:10, Andre Bois-Crettez andre.b...@kelkoo.com
 wrote:
 
  How much memory you actually allocate to the JVM ?
  http://wiki.apache.org/solr/SolrPerformanceFactors#Memory_allocated_to_the_Java_VM
 
  You need to increase the -Xmx value, otherwise your large ram buffers
  won't fit in the java heap.
 
 
 
  sivaprasad wrote:
 
  Hi,
 
  I am getting the following error during the indexing.I am trying to
 index
  14
  million records but the document size is very minimal.
 
  *Error:*
  2011-11-08 14:53:24,634 ERROR [STDERR] (Thread-12)
  java.lang.OutOfMemoryError: GC overhead limit exceeded
 
 
 
  [...]
 
   Do i need to increase the heap size for JVM?
 
  My solrconfig settings are given below.
 
  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>25</mergeFactor>
    <maxBufferedDocs>2</maxBufferedDocs>
    <ramBufferSizeMB>1024</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>

  and the main index values are
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>512</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
 
  Do i need to increase the ramBufferSizeMB to a little higher?
 
  Please provide your inputs.
 
  Regards,
  Siva
 
  --
  View this message in context: http://lucene.472066.n3.nabble.com/Out-of-memory-during-the-indexing-tp3492701p3492701.html
 
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
  --
  André Bois-Crettez
 
  Search technology, Kelkoo
  http://www.kelkoo.com/
 
 



Re: Out of memory during the indexing

2011-12-05 Thread Erick Erickson
There's no good way to say to Solr Use only this
much memory for searching. You can certainly
limit the size somewhat by configuring your caches
to be small. But if you're sorting, then Lucene will
use up some cache space etc.

Are you actually running into problems?

Best
Erick

On Fri, Dec 2, 2011 at 2:26 PM, Jeff Crump jeffrey.cr...@gmail.com wrote:
 Can anyone advise techniques for limiting the size of the RAM buffers to
 begin with?  As the index grows, I shouldn't have to keep increasing the
 heap.  We have a high-ingest, low-query-rate environment and I'm not as
 much concerned with the query-time caches as I am with the segment core
 readers/SolrIndexSearchers themselves.

 On 9 November 2011 06:10, Andre Bois-Crettez andre.b...@kelkoo.com wrote:

 How much memory you actually allocate to the JVM ?
 http://wiki.apache.org/solr/SolrPerformanceFactors#Memory_allocated_to_the_Java_VM
 You need to increase the -Xmx value, otherwise your large ram buffers
 won't fit in the java heap.



 sivaprasad wrote:

 Hi,

 I am getting the following error during the indexing.I am trying to index
 14
 million records but the document size is very minimal.

 *Error:*
 2011-11-08 14:53:24,634 ERROR [STDERR] (Thread-12)
 java.lang.OutOfMemoryError: GC overhead limit exceeded



 [...]

  Do i need to increase the heap size for JVM?

 My solrconfig settings are given below.

 <indexDefaults>
   <useCompoundFile>false</useCompoundFile>
   <mergeFactor>25</mergeFactor>
   <maxBufferedDocs>2</maxBufferedDocs>
   <ramBufferSizeMB>1024</ramBufferSizeMB>
   <maxMergeDocs>2147483647</maxMergeDocs>
   <maxFieldLength>1</maxFieldLength>
   <writeLockTimeout>1000</writeLockTimeout>
   <commitLockTimeout>1</commitLockTimeout>

 and the main index values are
   <useCompoundFile>false</useCompoundFile>
   <ramBufferSizeMB>512</ramBufferSizeMB>
   <mergeFactor>10</mergeFactor>
   <maxMergeDocs>2147483647</maxMergeDocs>
   <maxFieldLength>1</maxFieldLength>

 Do i need to increase the ramBufferSizeMB to a little higher?

 Please provide your inputs.

 Regards,
 Siva

 --
 View this message in context: http://lucene.472066.n3.nabble.com/Out-of-memory-during-the-indexing-tp3492701p3492701.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 André Bois-Crettez

 Search technology, Kelkoo
 http://www.kelkoo.com/




Re: Out of memory during the indexing

2011-12-05 Thread Jeff Crump
Yes, and without doing much in the way of queries, either.   Basically, our
test data has large numbers of distinct terms, each of which can be large
in themselves.   Heap usage is a straight line -- up --  75 percent of the
heap is consumed with byte[] allocations at the leaf of an object graph
like so:

SolrCore
SolrIndexSearcher
DirectoryReader
SegmentReader
SegmentCoreReaders
PerFieldPostingsFormat$FieldsReader
...
FST
byte[]

Our application is less concerned with query performance than it is with
making sure our index doesn't OOM.   My suspicion is that we're looking at
just in-memory representation of the index rather than any caching (it's
all turned down to levels suggested in other documentation); plus, we're
not doing much querying in this test anyway.

Any suggestions or places to go for further information?

On 5 December 2011 08:38, Erick Erickson erickerick...@gmail.com wrote:

 There's no good way to say to Solr Use only this
 much memory for searching. You can certainly
 limit the size somewhat by configuring your caches
 to be small. But if you're sorting, then Lucene will
 use up some cache space etc.

 Are you actually running into problems?

 Best
 Erick

 On Fri, Dec 2, 2011 at 2:26 PM, Jeff Crump jeffrey.cr...@gmail.com
 wrote:
  Can anyone advise techniques for limiting the size of the RAM buffers to
  begin with?  As the index grows, I shouldn't have to keep increasing the
  heap.  We have a high-ingest, low-query-rate environment and I'm not as
  much concerned with the query-time caches as I am with the segment core
  readers/SolrIndexSearchers themselves.
 
  On 9 November 2011 06:10, Andre Bois-Crettez andre.b...@kelkoo.com
 wrote:
 
  How much memory you actually allocate to the JVM ?
  http://wiki.apache.org/solr/SolrPerformanceFactors#Memory_allocated_to_the_Java_VM
 
  You need to increase the -Xmx value, otherwise your large ram buffers
  won't fit in the java heap.
 
 
 
  sivaprasad wrote:
 
  Hi,
 
  I am getting the following error during the indexing.I am trying to
 index
  14
  million records but the document size is very minimal.
 
  *Error:*
  2011-11-08 14:53:24,634 ERROR [STDERR] (Thread-12)
  java.lang.OutOfMemoryError: GC overhead limit exceeded
 
 
 
  [...]
 
   Do i need to increase the heap size for JVM?
 
  My solrconfig settings are given below.
 
  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>25</mergeFactor>
    <maxBufferedDocs>2</maxBufferedDocs>
    <ramBufferSizeMB>1024</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>

  and the main index values are
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>512</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
 
  Do i need to increase the ramBufferSizeMB to a little higher?
 
  Please provide your inputs.
 
  Regards,
  Siva
 
  --
  View this message in context: http://lucene.472066.n3.nabble.com/Out-of-memory-during-the-indexing-tp3492701p3492701.html
 
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
  --
  André Bois-Crettez
 
  Search technology, Kelkoo
  http://www.kelkoo.com/
 
 



Re: Out of memory during the indexing

2011-12-02 Thread Jeff Crump
Can anyone advise techniques for limiting the size of the RAM buffers to
begin with?  As the index grows, I shouldn't have to keep increasing the
heap.  We have a high-ingest, low-query-rate environment and I'm not as
much concerned with the query-time caches as I am with the segment core
readers/SolrIndexSearchers themselves.

On 9 November 2011 06:10, Andre Bois-Crettez andre.b...@kelkoo.com wrote:

 How much memory you actually allocate to the JVM ?
 http://wiki.apache.org/solr/SolrPerformanceFactors#Memory_allocated_to_the_Java_VM
 You need to increase the -Xmx value, otherwise your large ram buffers
 won't fit in the java heap.



 sivaprasad wrote:

 Hi,

 I am getting the following error during the indexing.I am trying to index
 14
 million records but the document size is very minimal.

 *Error:*
 2011-11-08 14:53:24,634 ERROR [STDERR] (Thread-12)
 java.lang.OutOfMemoryError: GC overhead limit exceeded



 [...]

  Do i need to increase the heap size for JVM?

 My solrconfig settings are given below.

 <indexDefaults>
   <useCompoundFile>false</useCompoundFile>
   <mergeFactor>25</mergeFactor>
   <maxBufferedDocs>2</maxBufferedDocs>
   <ramBufferSizeMB>1024</ramBufferSizeMB>
   <maxMergeDocs>2147483647</maxMergeDocs>
   <maxFieldLength>1</maxFieldLength>
   <writeLockTimeout>1000</writeLockTimeout>
   <commitLockTimeout>1</commitLockTimeout>

 and the main index values are
   <useCompoundFile>false</useCompoundFile>
   <ramBufferSizeMB>512</ramBufferSizeMB>
   <mergeFactor>10</mergeFactor>
   <maxMergeDocs>2147483647</maxMergeDocs>
   <maxFieldLength>1</maxFieldLength>

 Do i need to increase the ramBufferSizeMB to a little higher?

 Please provide your inputs.

 Regards,
 Siva

 --
 View this message in context: http://lucene.472066.n3.nabble.com/Out-of-memory-during-the-indexing-tp3492701p3492701.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 André Bois-Crettez

 Search technology, Kelkoo
 http://www.kelkoo.com/




Re: Out of memory, not during import or updates of the index

2011-11-10 Thread Andre Bois-Crettez

Using Solr 3.4.0. That changelog actually says it should reduce memory usage 
for that version. We were on a much older version previously, 1.something.
Norms are off on all fields that it can be turned off on.
I'm just hoping this new version doesn't have any leaks. Does FastLRUCache vs 
LRUCache make any memory difference?


You can add JVM parameters to better trace the heap usage with 
-XX:+PrintGCDetails -verbose:gc -Xloggc:/your/gc/logfile


Graphing that over time may help you see if you are constantly near the 
limit, or only at particular times, and try to correlate that to other 
operations (insertions, commit, optimize, ...)



--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/



Re: Out of memory, not during import or updates of the index

2011-11-10 Thread Paul Libbrecht
Steve,

do you have any custom code in your Solr?
We had out-of-memory errors just because of that, I was using one method to 
obtain the request which was leaking... had not read javadoc carefully enough. 
Since then, no leak.

What do you do after the OoME?

paul


On 9 Nov 2011, at 21:33, Steve Fatula wrote:

 We get at rare times out of memory errors during the day. I know one reason 
 for this is data imports, none are going on. I see in the wiki, document adds 
 have some quirks, not doing that. I don't know what to expect for memory use,
 though.
 
 We had Solr running under Tomcat set to 2G ram. I presume cache size has an 
 effect on memory, that's set to 30,000 for filter, document and queryResult. 
 Have experimented with different sizes for a while, these limits are all 
 lower than we used to have them set to. So, hoping there no sort of memory 
 leak involved.
 
 In any case, some of the messages are:
 
 Exception in thread http-8080-21 java.lang.OutOfMemoryError: Java heap space
 
 
 Some look like this:
 
 Exception in thread http-8080-22 java.lang.NullPointerException
 at 
 java.util.concurrent.ConcurrentLinkedQueue.offer(ConcurrentLinkedQueue.java:273)
 ...
 
 I presume the null pointer is a result of being out of memory. 
 
 Should Solr possibly need more than 2GB? What else can we tune that might 
 reduce memory usage?



Re: Out of memory, not during import or updates of the index

2011-11-10 Thread Steve Fatula


From: Paul Libbrecht p...@hoplahup.net
To: solr-user@lucene.apache.org
Sent: Thursday, November 10, 2011 7:19 AM
Subject: Re: Out of memory, not during import or updates of the index

do you have any custom code in your Solr?
We had out-of-memory errors just because of that, I was using one method to 
obtain the request which was leaking... had not read javadoc carefully enough. 
Since then, no leak.

There is no custom code, as Solr is called via HTTP; only the web interface
is used. So it can't be our code, since Solr runs within its own Tomcat
instance, and only Solr.

Re: Out of memory, not during import or updates of the index

2011-11-10 Thread Steve Fatula
From: Andre Bois-Crettez andre.b...@kelkoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Thursday, November 10, 2011 7:02 AM
Subject: Re: Out of memory, not during import or updates of the index

You can add JVM parameters to better trace the heap usage with 
-XX:+PrintGCDetails -verbose:gc -Xloggc:/your/gc/logfile

Graphing that over time may help you see if you are constantly near the limit, 
or only at particular times, and try to correlate that to other operations 
(insertions, commit, optimize, ...)


That would be true, except, there are NO insertions, deletions, updates, etc. 
as that is done in the middle of the night, long before the problem occurs. It 
is done using the data import manager. Right now, for example, we've raised 
the limit to 2.5GB and currently, 2GB is free. The only activity is searches, 
using the http interface, nothing we code in java, etc. So, the only thing 
consuming memory within Tomcat is solr, the only app.

So the caches are all full and 2GB of the 2.5GB is free, yet the other
day all 2GB were consumed and we ran out of memory, so something consumed
that 1.5GB of free space.

I did change the garbage collector today to the parallel one from the default 
one. Should have been in the first place. Not sure if this will matter or not 
as far as running out of space. I do have GC log as well (now). There is only 
one collection every minute or so, and in 11 hours, not one full gc.

Re: Out of memory, not during import or updates of the index

2011-11-10 Thread Mark Miller
How big is your index?

What kind of queries do you tend to see? Do you facet on a lot of fields? Sort 
on a lot of fields?

Before you get the OOM and are running along nicely, how much RAM is used?

On Nov 9, 2011, at 3:33 PM, Steve Fatula wrote:

 We get at rare times out of memory errors during the day. I know one reason 
 for this is data imports, none are going on. I see in the wiki, document adds 
 have some quirks, not doing that. I don't know to to expect for memory use 
 though.
 
 We had Solr running under Tomcat set to 2G ram. I presume cache size has an 
 effect on memory, that's set to 30,000 for filter, document and queryResult. 
 Have experimented with different sizes for a while, these limits are all 
 lower than we used to have them set to. So, hoping there no sort of memory 
 leak involved.
 
 In any case, some of the messages are:
 
 Exception in thread http-8080-21 java.lang.OutOfMemoryError: Java heap space
 
 
 Some look like this:
 
 Exception in thread http-8080-22 java.lang.NullPointerException
 at 
 java.util.concurrent.ConcurrentLinkedQueue.offer(ConcurrentLinkedQueue.java:273)
 ...
 
 I presume the null pointer is a result of being out of memory. 
 
 Should Solr possibly need more than 2GB? What else can we tune that might 
 reduce memory usage?

- Mark Miller
lucidimagination.com



2011-11-10 Thread Steve Fatula
From: Mark Miller markrmil...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Thursday, November 10, 2011 3:00 PM
Subject: Re: Out of memory, not during import or updates of the index

How big is your index?

The total for the data dir is 651M.


What kind of queries do you tend to see? Do you facet on a lot of fields? Sort 
on a lot of fields?

If a query is sorted, it's on one field. A fair amount of faceting. Not sure
how to answer "what kind of queries": various kinds, all dismax. Remember, 2GB of
RAM was allotted to Solr/Tomcat. Most of the queries run in very small
fractions of a second; the longest query we have runs in 140ms. Most are 1ms.


Before you get the OOM and are running along nicely, how much RAM is used?

I wish I was sitting there just before that happened. Now that we have the GC 
log, it will be easier to tell I suppose.

Re: Out of memory during the indexing

2011-11-09 Thread Andre Bois-Crettez

How much memory you actually allocate to the JVM ?
http://wiki.apache.org/solr/SolrPerformanceFactors#Memory_allocated_to_the_Java_VM
You need to increase the -Xmx value, otherwise your large ram buffers 
won't fit in the java heap.



sivaprasad wrote:

Hi,

I am getting the following error during indexing. I am trying to index 14
million records, but the document size is very minimal.

*Error:*
2011-11-08 14:53:24,634 ERROR [STDERR] (Thread-12)
java.lang.OutOfMemoryError: GC overhead limit exceeded

  

[...]

Do i need to increase the heap size for JVM?

My solrconfig settings are given below.

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>25</mergeFactor>
  <maxBufferedDocs>2</maxBufferedDocs>
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>

and the main index values are

  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>512</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>

Do i need to increase the ramBufferSizeMB to a little higher?

Please provide your inputs.

Regards,
Siva
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Out-of-memory-during-the-indexing-tp3492701p3492701.html
Sent from the Solr - User mailing list archive at Nabble.com.

  


--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/



Re: Out of memory, not during import or updates of the index

2011-11-09 Thread Otis Gospodnetic
Hi,

Some options:
* Yes, on the slave/search side you can reduce your cache sizes and lower the 
memory footprint.
* You can also turn off norms in various fields if you don't need that and save 
memory there.
* You can increase your Xmx
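
For example (standard solrconfig.xml / schema.xml elements; the sizes and the
field name are only illustrative):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <field name="title" type="text" indexed="true" stored="true" omitNorms="true"/>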

I don't know what version of Solr you have, but look through Lucene/Solr's 
CHANGES.txt to see if there were any changes that affect memory requirements 
since your version of Solr.

Otis


Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: Steve Fatula compconsult...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wednesday, November 9, 2011 3:33 PM
Subject: Out of memory, not during import or updates of the index

We get at rare times out of memory errors during the day. I know one reason 
for this is data imports, none are going on. I see in the wiki, document adds 
have some quirks, not doing that. I don't know to to expect for memory use 
though.

We had Solr running under Tomcat set to 2G ram. I presume cache size has an 
effect on memory, that's set to 30,000 for filter, document and queryResult. 
Have experimented with different sizes for a while, these limits are all lower 
than we used to have them set to. So, hoping there no sort of memory leak 
involved.

In any case, some of the messages are:

Exception in thread http-8080-21 java.lang.OutOfMemoryError: Java heap space


Some look like this:

Exception in thread http-8080-22 java.lang.NullPointerException
        at 
java.util.concurrent.ConcurrentLinkedQueue.offer(ConcurrentLinkedQueue.java:273)
...

I presume the null pointer is a result of being out of memory. 

Should Solr possibly need more than 2GB? What else can we tune that might 
reduce memory usage?



Re: Out of memory, not during import or updates of the index

2011-11-09 Thread Steve Fatula
From: Otis Gospodnetic otis_gospodne...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wednesday, November 9, 2011 2:51 PM
Subject: Re: Out of memory, not during import or updates of the index

Hi,

Some options:
* Yes, on the slave/search side you can reduce your cache sizes and lower the 
memory footprint.
* You can also turn off norms in various fields if you don't need that and 
save memory there.
* You can increase your Xmx

I don't know what version of Solr you have, but look through Lucene/Solr's 
CHANGES.txt to see if there were any changes that affect memory requirements 
since your version of Solr.





Using Solr 3.4.0. That changelog actually says it should reduce memory usage 
for that version. We were on a much older version previously, 1.something.

Norms are off on all fields that it can be turned off on.

I'm just hoping this new version doesn't have any leaks. Does FastLRUCache vs 
LRUCache make any memory difference?

RE: Out of memory

2011-09-16 Thread Chris Hostetter

: Actually I am storing twitter streaming data into the core, so the rate of
: index is about 12tweets(docs)/second. The same solr contains 3 other cores
...
: . At any given time I dont need data more than past 15 days, unless
: someone queries for it explicetly. How can this be achieved?

so you are adding 12 docs a second, and you need to keep all docs forever, 
in case someone asks for a specific doc, but otherwise you only typically 
need to search for docs in the past 15 days.

if your index is going to grow w/o bounds at this rate forever then it 
doesn't matter what tricks you try, or how you tune things -- you are 
always going to run out of resources unless you adopt some sort of 
distributed approach.

off the cuff, i would suggest indexing all of the docs for a single day 
in one shard, and making most of your searches be a distributed request 
against the most recent 15 shards.

you didn't say how people query for it explicitly when looking for older 
docs -- if it's by date then when a user asks for a specific date range 
you can just query those shards explicitly, if it's by some unique id then 
you'll want to cache in your application the min/max id for each doc in 
each shard (easy enough to determine by looping over them all and doing a 
stats query)
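
As a sketch of the distributed request (host names, ports and per-day core
names are made up), the search side would look something like:

  http://host1:8983/solr/tweets-20110916/select?q=...&shards=host1:8983/solr/tweets-20110916,host1:8983/solr/tweets-20110915,...,host2:8983/solr/tweets-20110902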


-Hoss


Re: Out of memory

2011-09-16 Thread Luis Cappa Banda
Hello.

Facet queries are slower than others, especially when you are working with a
69G index. I would like to know more about the context in which the Out
of memory exception occurs: is it during indexing? Do you index at the same
time as users launch queries against the twitter index? Are you using the
autocommit option? If yes, what is your configuration for the number of
documents buffered or the time elapsed before the next commit?

By the way, I think that using more servers/indexes with the shards option
to launch distributed queries won't solve the problem. In my opinion it will
remain as slow as it is now, or get even slower. Check what kind of
query you are launching while faceting. I mean that it is not the same to
query with q=*:*&facet=true&facet.field=whatever as with
q=field1:value+AND+field2:value...&facet=true&facet.field=whatever.
I recommend you check the query again and test it using caching and fq
parameters. You will probably get better QTime results.

Luis Cappa.


RE: Out of memory

2011-09-16 Thread Rohit
Thanks Chris,

This makes sense. At any time we show users a trend graph for all the tweets
relevant to them in the last 15 days. So I guess keeping shards for the
last 15-20 days of data would be a good option, with all the other data moved to
different shards, each with 2 months of data.

I have no idea about sharding right now; could you point me to some
resource on date-wise sharding?

Regards,
Rohit

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 17 September 2011 00:19
To: solr-user@lucene.apache.org
Subject: RE: Out of memory


: Actually I am storing twitter streaming data into the core, so the rate of
: index is about 12tweets(docs)/second. The same solr contains 3 other cores
...
: . At any given time I dont need data more than past 15 days,
unless
: someone queries for it explicetly. How can this be achieved?

so you are adding 12 docs a second, and you need to keep all docs forever, 
in case someone askes for a specific doc, but otherwise you only typically 
need to search for docs in the past 15 days.

if you index is going to grow w/o bounds at this rate forever then it 
doesn't matter what tricks you try, or how you tune things -- you are 
always going to run out of resources unless you adopt some sort of 
distributed approach.

off the cuff, i would suggest indexing all of the docs for a single day 
in one shard, and making most of your searches be a distributed request 
against the most recent 15 shards.

you didn't say how people query for it explicitly when looking for older 
docs -- if it's by date then when a user asks for a specific date range 
you cna just query those shards explicitly, if it's by some unique id then 
you'll want to cache in your application the min/max id for each doc in 
each shard (easy enough to determine by looping over them all and doing a 
stast query)


-Hoss



RE: Out of memory

2011-09-16 Thread Rohit
Hi Luis,

First, to answer your questions:

-Do you index at the same time as users launch queries against the twitter index?
Yes, currently I am using the same server for indexing and querying due to
lack of resources to extend to another server.

-Are you using the autocommit option?
No, I am committing after approximately 50,000 documents and optimizing once a
day.

I am using the fq parameter during faceting, since all my queries are
datetime bound and max and min auto_id bound. E.g., fq=createdOnGMT:[Date1 TO
Date2]&fq=id:[id1 TO id2]&facet=true&facet.field=..




Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg


-Original Message-
From: Luis Cappa Banda [mailto:luisca...@gmail.com] 
Sent: 17 September 2011 03:38
To: solr-user@lucene.apache.org
Subject: Re: Out of memory

Hello.

Facet queries are slower than others specially when you are working with a
69G index. I would like to know more about the context where occurs the Out
of memory exception: is it during an indexation? Do you index at the same
time as users launches queries to twitter index? Are you using the
autocommit option? If yes, which is your configuration of number of
documents buffered or time elapsed to do next commit?

By the way, I think that using more servers/indexes with the shards option
to launch distributed queries won't solve the problem. In my opinion it will
continue as slow as the present, or even more. Try to check what kind of
query you are launching while faceting. I mean that it's not the same to
query with q=*:*&facet=true&facet.field=whatever as with
q=field1:value+AND+field2:value...&facet=true&facet.field=whatever.
I recommend you to check again the query and to test it using caching and fq
parameters in it. Probably you'll get better Qtime results.

Luis Cappa.



Re: Out of memory

2011-09-15 Thread Dmitry Kan
Hello,
Since you use caching, you can monitor the eviction parameter on the Solr
admin page (http://localhost:port/solr/admin/stats.jsp#cache). If it is
non-zero, the cache can be made bigger, for example.
queryResultWindowSize is 50 in my case.
Not sure if Solr 3.1 supports it, but in 1.4 I have:
<HashDocSet maxSize="1000" loadFactor="0.75"/>

Does the OOM happen on update/commit or search?

Dmitry

On Wed, Sep 14, 2011 at 2:47 PM, Rohit ro...@in-rev.com wrote:

 Thanks Dmitry for the offer to help. I am using some caching in one of the
 cores now. Earlier I was using it on other cores too, but now I have commented
 them out because of frequent OOM, and also some warming up in one of the cores. I
 have shared the links to my config files for all 4 cores,

 http://haklus.com/crssConfig.xml
 http://haklus.com/rssConfig.xml
 http://haklus.com/twitterConfig.xml
 http://haklus.com/facebookConfig.xml


 Thanks again
 Rohit


 -Original Message-
 From: Dmitry Kan [mailto:dmitry@gmail.com]
 Sent: 14 September 2011 10:23
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory

 Hi,

 OK 64GB fits into one shard quite nicely in our setup. But I have never
 used
 multicore setup. In total you have 79,9 GB. We try to have 70-100GB per
 shard with caching on. Do you do warming up of your index on starting?
 Also,
 there was a setting of pre-populating the cache.

 It could also help, if you can show some parts of your solrconfig file.
 What
 is the solr version you use?

 Regards,
 Dmitry

 On Wed, Sep 14, 2011 at 11:38 AM, Rohit ro...@in-rev.com wrote:

  Hi Dimtry,
 
  To answer your questions,
 
  -Do you use caching?
  I do use caching, but will disable it and give it a go.
 
  -How big is your index in size on the disk?
  These are the size of the data folder for each of the cores.
  Core1 : 64GB
  Core2 : 6.1GB
  Core3 : 7.9GB
  Core4 : 1.9GB
 
  Will try attaching a jconsole to my solr as suggested to get a better
  picture.
 
  Regards,
  Rohit
 
 
  -Original Message-
  From: Dmitry Kan [mailto:dmitry@gmail.com]
  Sent: 14 September 2011 08:15
  To: solr-user@lucene.apache.org
  Subject: Re: Out of memory
 
  Hi Rohit,
 
  Do you use caching?
  How big is your index in size on the disk?
  What is the stack trace contents?
 
  The OOM problems that we have seen so far were related to the
  index physical size and usage of caching. I don't think we have ever
 found
  the exact cause of these problems, but sharding has helped to keep each
  index relatively small and OOM have gone away.
 
  You can also attach jconsole onto your SOLR via the jmx and monitor the
  memory / cpu usage in a graphical interface. I have also run garbage
  collector manually through jconsole sometimes and it was of a help.
 
  Regards,
  Dmitry
 
  On Wed, Sep 14, 2011 at 9:10 AM, Rohit ro...@in-rev.com wrote:
 
   Thanks Jaeger.
  
   Actually I am storing twitter streaming data into the core, so the rate
  of
   index is about 12tweets(docs)/second. The same solr contains 3 other
  cores
   but these cores are not very heavy. Now the twitter core has become
 very
   large (77516851) and its taking a long time to query (Mostly facet
  queries
   based on date, string fields).
  
   After sometime about 18-20hr solr goes out of memory, the thread dump
   doesn't show anything. How can I improve this besides adding more ram
  into
   the system.
  
  
  
   Regards,
   Rohit
   Mobile: +91-9901768202
   About Me: http://about.me/rohitg
  
   -Original Message-
   From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
   Sent: 13 September 2011 21:06
   To: solr-user@lucene.apache.org
   Subject: RE: Out of memory
  
   numDocs is not the number of documents in memory.  It is the number of
   documents currently in the index (which is kept on disk).  Same goes
 for
   maxDocs, except that it is a count of all of the documents that have
 ever
   been in the index since it was created or optimized (including deleted
   documents).
  
   Your subject indicates that something is giving you some kind of Out of
   memory error.  We might better be able to help you if you provide more
   information about your exact problem.
  
   JRJ
  
  
   -Original Message-
   From: Rohit [mailto:ro...@in-rev.com]
   Sent: Tuesday, September 13, 2011 2:29 PM
   To: solr-user@lucene.apache.org
   Subject: Out of memory
  
   I have solr running on a machine with 18Gb Ram , with 4 cores. One of
 the
   core is very big containing 77516851 docs, the stats for searcher given
   below
  
  
  
   searcherName : Searcher@5a578998 main
   caching : true
   numDocs : 77516851
   maxDoc : 77518729
   lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842
   indexVersion : 1308817281798
   openedAt : Tue Sep 13 18:59:52 GMT 2011
   registeredAt : Tue Sep 13 19:00:55 GMT 2011
   warmupTime : 63139
  
  
  
   . Is there a way to reduce the number of docs loaded into
 memory
   for
   this core?
  
   . At any given

RE: Out of memory

2011-09-15 Thread Rohit
It's happening more on search, and search has become very slow, particularly on 
the core with 69GB of index data.

Regards,
Rohit

-Original Message-
From: Dmitry Kan [mailto:dmitry@gmail.com] 
Sent: 15 September 2011 07:51
To: solr-user@lucene.apache.org
Subject: Re: Out of memory

Hello,
Since you use caching, you can monitor the eviction parameter on the solr
admin page (http://localhost:port/solr/admin/stats.jsp#cache). If it is non
zero, the cache can be made e.g. bigger.
queryResultWindowSize=50 in my case.
Not sure, if solr 3.1 supports, but in 1.4 I have:
<HashDocSet maxSize="1000" loadFactor="0.75"/>

Does the OOM happen on update/commit or search?

Dmitry

On Wed, Sep 14, 2011 at 2:47 PM, Rohit ro...@in-rev.com wrote:

 Thanks Dmirty for the offer to help, I am using some caching in one of the
 cores not. Earlier I was using on other cores too, but now I have commented
 them out because of frequent OOM, also some warming up in one of the core. I
 have share the links for my config files for all the 4 cores,

 http://haklus.com/crssConfig.xml
 http://haklus.com/rssConfig.xml
 http://haklus.com/twitterConfig.xml
 http://haklus.com/facebookConfig.xml


 Thanks again
 Rohit


 -Original Message-
 From: Dmitry Kan [mailto:dmitry@gmail.com]
 Sent: 14 September 2011 10:23
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory

 Hi,

 OK 64GB fits into one shard quite nicely in our setup. But I have never
 used
 multicore setup. In total you have 79,9 GB. We try to have 70-100GB per
 shard with caching on. Do you do warming up of your index on starting?
 Also,
 there was a setting of pre-populating the cache.

 It could also help, if you can show some parts of your solrconfig file.
 What
 is the solr version you use?

 Regards,
 Dmitry

 On Wed, Sep 14, 2011 at 11:38 AM, Rohit ro...@in-rev.com wrote:

  Hi Dimtry,
 
  To answer your questions,
 
  -Do you use caching?
  I do user caching, but will disable it and give it a go.
 
  -How big is your index in size on the disk?
  These are the size of the data folder for each of the cores.
  Core1 : 64GB
  Core2 : 6.1GB
  Core3 : 7.9GB
  Core4 : 1.9GB
 
  Will try attaching a jconsole to my solr as suggested to get a better
  picture.
 
  Regards,
  Rohit
 
 
  -Original Message-
  From: Dmitry Kan [mailto:dmitry@gmail.com]
  Sent: 14 September 2011 08:15
  To: solr-user@lucene.apache.org
  Subject: Re: Out of memory
 
  Hi Rohit,
 
  Do you use caching?
  How big is your index in size on the disk?
  What is the stack trace contents?
 
  The OOM problems that we have seen so far were related to the
  index physical size and usage of caching. I don't think we have ever
 found
  the exact cause of these problems, but sharding has helped to keep each
  index relatively small and OOM have gone away.
 
  You can also attach jconsole onto your SOLR via the jmx and monitor the
  memory / cpu usage in a graphical interface. I have also run garbage
  collector manually through jconsole sometimes and it was of a help.
 
  Regards,
  Dmitry
 
  On Wed, Sep 14, 2011 at 9:10 AM, Rohit ro...@in-rev.com wrote:
 
   Thanks Jaeger.
  
   Actually I am storing twitter streaming data into the core, so the rate
  of
   index is about 12tweets(docs)/second. The same solr contains 3 other
  cores
   but these cores are not very heavy. Now the twitter core has become
 very
   large (77516851) and its taking a long time to query (Mostly facet
  queries
   based on date, string fields).
  
   After sometime about 18-20hr solr goes out of memory, the thread dump
   doesn't show anything. How can I improve this besides adding more ram
  into
   the system.
  
  
  
   Regards,
   Rohit
   Mobile: +91-9901768202
   About Me: http://about.me/rohitg
  
   -Original Message-
   From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
   Sent: 13 September 2011 21:06
   To: solr-user@lucene.apache.org
   Subject: RE: Out of memory
  
   numDocs is not the number of documents in memory.  It is the number of
   documents currently in the index (which is kept on disk).  Same goes
 for
   maxDocs, except that it is a count of all of the documents that have
 ever
   been in the index since it was created or optimized (including deleted
   documents).
  
   Your subject indicates that something is giving you some kind of Out of
   memory error.  We might better be able to help you if you provide more
   information about your exact problem.
  
   JRJ
  
  
   -Original Message-
   From: Rohit [mailto:ro...@in-rev.com]
   Sent: Tuesday, September 13, 2011 2:29 PM
   To: solr-user@lucene.apache.org
   Subject: Out of memory
  
   I have solr running on a machine with 18Gb Ram , with 4 cores. One of
 the
   core is very big containing 77516851 docs, the stats for searcher given
   below
  
  
  
   searcherName : Searcher@5a578998 main
   caching : true
   numDocs : 77516851
   maxDoc : 77518729
   lockFactory=org.apache.lucene.store.NativeFSLockFactory

Re: Out of memory

2011-09-15 Thread Dmitry Kan
If you have many users you could scale vertically, i.e. do replication. But
before that you could do sharding, for example by indexing entries based on
a hash function. Say, split the 69GB into two shards first and experiment
with it.

Regards,
Dmitry

On Thu, Sep 15, 2011 at 12:22 PM, Rohit ro...@in-rev.com wrote:

 It's happening more in search and search has become very slow particularly
 on the core with 69GB index data.

 Regards,
 Rohit

 -Original Message-
 From: Dmitry Kan [mailto:dmitry@gmail.com]
 Sent: 15 September 2011 07:51
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory

 Hello,
 Since you use caching, you can monitor the eviction parameter on the solr
 admin page (http://localhost:port/solr/admin/stats.jsp#cache). If it is
 non
 zero, the cache can be made e.g. bigger.
 queryResultWindowSize=50 in my case.
 Not sure, if solr 3.1 supports, but in 1.4 I have:
 <HashDocSet maxSize="1000" loadFactor="0.75"/>

 Does the OOM happen on update/commit or search?

 Dmitry

 On Wed, Sep 14, 2011 at 2:47 PM, Rohit ro...@in-rev.com wrote:

  Thanks Dmirty for the offer to help, I am using some caching in one of
 the
  cores not. Earlier I was using on other cores too, but now I have
 commented
  them out because of frequent OOM, also some warming up in one of the
 core. I
  have share the links for my config files for all the 4 cores,
 
  http://haklus.com/crssConfig.xml
  http://haklus.com/rssConfig.xml
  http://haklus.com/twitterConfig.xml
  http://haklus.com/facebookConfig.xml
 
 
  Thanks again
  Rohit
 
 
  -Original Message-
  From: Dmitry Kan [mailto:dmitry@gmail.com]
  Sent: 14 September 2011 10:23
  To: solr-user@lucene.apache.org
  Subject: Re: Out of memory
 
  Hi,
 
  OK 64GB fits into one shard quite nicely in our setup. But I have never
  used
  multicore setup. In total you have 79,9 GB. We try to have 70-100GB per
  shard with caching on. Do you do warming up of your index on starting?
  Also,
  there was a setting of pre-populating the cache.
 
  It could also help, if you can show some parts of your solrconfig file.
  What
  is the solr version you use?
 
  Regards,
  Dmitry
 
  On Wed, Sep 14, 2011 at 11:38 AM, Rohit ro...@in-rev.com wrote:
 
   Hi Dimtry,
  
   To answer your questions,
  
   -Do you use caching?
   I do user caching, but will disable it and give it a go.
  
   -How big is your index in size on the disk?
   These are the size of the data folder for each of the cores.
   Core1 : 64GB
   Core2 : 6.1GB
   Core3 : 7.9GB
   Core4 : 1.9GB
  
   Will try attaching a jconsole to my solr as suggested to get a better
   picture.
  
   Regards,
   Rohit
  
  
   -Original Message-
   From: Dmitry Kan [mailto:dmitry@gmail.com]
   Sent: 14 September 2011 08:15
   To: solr-user@lucene.apache.org
   Subject: Re: Out of memory
  
   Hi Rohit,
  
   Do you use caching?
   How big is your index in size on the disk?
   What is the stack trace contents?
  
   The OOM problems that we have seen so far were related to the
   index physical size and usage of caching. I don't think we have ever
  found
   the exact cause of these problems, but sharding has helped to keep each
   index relatively small and OOM have gone away.
  
   You can also attach jconsole onto your SOLR via the jmx and monitor the
   memory / cpu usage in a graphical interface. I have also run garbage
   collector manually through jconsole sometimes and it was of a help.
  
   Regards,
   Dmitry
  
   On Wed, Sep 14, 2011 at 9:10 AM, Rohit ro...@in-rev.com wrote:
  
Thanks Jaeger.
   
Actually I am storing twitter streaming data into the core, so the
 rate
   of
index is about 12tweets(docs)/second. The same solr contains 3 other
   cores
but these cores are not very heavy. Now the twitter core has become
  very
large (77516851) and its taking a long time to query (Mostly facet
   queries
based on date, string fields).
   
After sometime about 18-20hr solr goes out of memory, the thread dump
doesn't show anything. How can I improve this besides adding more ram
   into
the system.
   
   
   
Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg
   
-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
Sent: 13 September 2011 21:06
To: solr-user@lucene.apache.org
Subject: RE: Out of memory
   
numDocs is not the number of documents in memory.  It is the number
 of
documents currently in the index (which is kept on disk).  Same goes
  for
maxDocs, except that it is a count of all of the documents that have
  ever
been in the index since it was created or optimized (including
 deleted
documents).
   
Your subject indicates that something is giving you some kind of Out
 of
memory error.  We might better be able to help you if you provide
 more
information about your exact problem.
   
JRJ
   
   
-Original Message

RE: Out of memory

2011-09-15 Thread Rohit
Thanks Dmitry, let me look into sharding concepts.

Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg


-Original Message-
From: Dmitry Kan [mailto:dmitry@gmail.com] 
Sent: 15 September 2011 10:15
To: solr-user@lucene.apache.org
Subject: Re: Out of memory

If you have many users you could scale vertically, i.e. do replication. Buf
before that you could do sharding, for example by indexing entries based on
a hash function. Let's say split 69GB to two shards first and experiment
with it.

Regards,
Dmitry

On Thu, Sep 15, 2011 at 12:22 PM, Rohit ro...@in-rev.com wrote:

 It's happening more in search and search has become very slow particularly
 on the core with 69GB index data.

 Regards,
 Rohit

 -Original Message-
 From: Dmitry Kan [mailto:dmitry@gmail.com]
 Sent: 15 September 2011 07:51
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory

 Hello,
 Since you use caching, you can monitor the eviction parameter on the solr
 admin page (http://localhost:port/solr/admin/stats.jsp#cache). If it is
 non
 zero, the cache can be made e.g. bigger.
 queryResultWindowSize=50 in my case.
 Not sure if Solr 3.1 supports it, but in 1.4 I have:
 <HashDocSet maxSize="1000" loadFactor="0.75"/>

 Does the OOM happen on update/commit or search?

 Dmitry

 On Wed, Sep 14, 2011 at 2:47 PM, Rohit ro...@in-rev.com wrote:

  Thanks Dmirty for the offer to help, I am using some caching in one of
 the
  cores not. Earlier I was using on other cores too, but now I have
 commented
  them out because of frequent OOM, also some warming up in one of the
 core. I
  have share the links for my config files for all the 4 cores,
 
  http://haklus.com/crssConfig.xml
  http://haklus.com/rssConfig.xml
  http://haklus.com/twitterConfig.xml
  http://haklus.com/facebookConfig.xml
 
 
  Thanks again
  Rohit
 
 
  -Original Message-
  From: Dmitry Kan [mailto:dmitry@gmail.com]
  Sent: 14 September 2011 10:23
  To: solr-user@lucene.apache.org
  Subject: Re: Out of memory
 
  Hi,
 
  OK 64GB fits into one shard quite nicely in our setup. But I have never
  used
  multicore setup. In total you have 79,9 GB. We try to have 70-100GB per
  shard with caching on. Do you do warming up of your index on starting?
  Also,
  there was a setting of pre-populating the cache.
 
  It could also help, if you can show some parts of your solrconfig file.
  What
  is the solr version you use?
 
  Regards,
  Dmitry
 
  On Wed, Sep 14, 2011 at 11:38 AM, Rohit ro...@in-rev.com wrote:
 
   Hi Dimtry,
  
   To answer your questions,
  
   -Do you use caching?
   I do user caching, but will disable it and give it a go.
  
   -How big is your index in size on the disk?
   These are the size of the data folder for each of the cores.
   Core1 : 64GB
   Core2 : 6.1GB
   Core3 : 7.9GB
   Core4 : 1.9GB
  
   Will try attaching a jconsole to my solr as suggested to get a better
   picture.
  
   Regards,
   Rohit
  
  
   -Original Message-
   From: Dmitry Kan [mailto:dmitry@gmail.com]
   Sent: 14 September 2011 08:15
   To: solr-user@lucene.apache.org
   Subject: Re: Out of memory
  
   Hi Rohit,
  
   Do you use caching?
   How big is your index in size on the disk?
   What is the stack trace contents?
  
   The OOM problems that we have seen so far were related to the
   index physical size and usage of caching. I don't think we have ever
  found
   the exact cause of these problems, but sharding has helped to keep each
   index relatively small and OOM have gone away.
  
   You can also attach jconsole onto your SOLR via the jmx and monitor the
   memory / cpu usage in a graphical interface. I have also run garbage
   collector manually through jconsole sometimes and it was of a help.
  
   Regards,
   Dmitry
  
   On Wed, Sep 14, 2011 at 9:10 AM, Rohit ro...@in-rev.com wrote:
  
Thanks Jaeger.
   
Actually I am storing twitter streaming data into the core, so the
 rate
   of
index is about 12tweets(docs)/second. The same solr contains 3 other
   cores
but these cores are not very heavy. Now the twitter core has become
  very
large (77516851) and its taking a long time to query (Mostly facet
   queries
based on date, string fields).
   
After sometime about 18-20hr solr goes out of memory, the thread dump
doesn't show anything. How can I improve this besides adding more ram
   into
the system.
   
   
   
Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg
   
-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
Sent: 13 September 2011 21:06
To: solr-user@lucene.apache.org
Subject: RE: Out of memory
   
numDocs is not the number of documents in memory.  It is the number
 of
documents currently in the index (which is kept on disk).  Same goes
  for
maxDocs, except that it is a count of all of the documents that have
  ever
been in the index since it was created or optimized

RE: Out of memory

2011-09-14 Thread Rohit
Thanks Jaeger.

Actually I am storing twitter streaming data into the core, so the indexing
rate is about 12 tweets (docs)/second. The same Solr instance contains 3 other
cores, but these cores are not very heavy. Now the twitter core has become very
large (77516851 docs) and it's taking a long time to query (mostly facet
queries based on date and string fields).

After about 18-20 hours Solr goes out of memory; the thread dump doesn't show
anything. How can I improve this besides adding more RAM to the system?



Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg

-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] 
Sent: 13 September 2011 21:06
To: solr-user@lucene.apache.org
Subject: RE: Out of memory

numDocs is not the number of documents in memory.  It is the number of
documents currently in the index (which is kept on disk).  Same goes for
maxDocs, except that it is a count of all of the documents that have ever
been in the index since it was created or optimized (including deleted
documents).

Your subject indicates that something is giving you some kind of Out of
memory error.  We might better be able to help you if you provide more
information about your exact problem.

JRJ


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Tuesday, September 13, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: Out of memory

I have solr running on a machine with 18Gb Ram , with 4 cores. One of the
core is very big containing 77516851 docs, the stats for searcher given
below

 

searcherName : Searcher@5a578998 main 
caching : true 
numDocs : 77516851 
maxDoc : 77518729 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842 
indexVersion : 1308817281798 
openedAt : Tue Sep 13 18:59:52 GMT 2011 
registeredAt : Tue Sep 13 19:00:55 GMT 2011 
warmupTime : 63139

 

. Is there a way to reduce the number of docs loaded into memory for
this core?

. At any given time I dont need data more than past 15 days, unless
someone queries for it explicetly. How can this be achieved?

. Will it be better to go for Solr replication or distribution if
there is little option left

 

 

Regards,

Rohit

Mobile: +91-9901768202

About Me:  http://about.me/rohitg http://about.me/rohitg

 



Re: Out of memory

2011-09-14 Thread Dmitry Kan
Hi Rohit,

Do you use caching?
How big is your index in size on the disk?
What are the stack trace contents?

The OOM problems that we have seen so far were related to the index's
physical size and the usage of caching. I don't think we have ever found
the exact cause of these problems, but sharding has helped to keep each
index relatively small, and the OOMs have gone away.

You can also attach jconsole to your Solr instance via JMX and monitor the
memory / CPU usage in a graphical interface. I have also run the garbage
collector manually through jconsole sometimes and it has helped.
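
For reference, Solr only exposes its MBeans over JMX when solrconfig.xml
contains a jmx element, roughly:

<!-- in solrconfig.xml; remote jconsole access additionally needs the usual
     com.sun.management.jmxremote JVM system properties (not shown here) -->
<jmx />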

Regards,
Dmitry

On Wed, Sep 14, 2011 at 9:10 AM, Rohit ro...@in-rev.com wrote:

 Thanks Jaeger.

 Actually I am storing twitter streaming data into the core, so the rate of
 index is about 12tweets(docs)/second. The same solr contains 3 other cores
 but these cores are not very heavy. Now the twitter core has become very
 large (77516851) and its taking a long time to query (Mostly facet queries
 based on date, string fields).

 After sometime about 18-20hr solr goes out of memory, the thread dump
 doesn't show anything. How can I improve this besides adding more ram into
 the system.



 Regards,
 Rohit
 Mobile: +91-9901768202
 About Me: http://about.me/rohitg

 -Original Message-
 From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
 Sent: 13 September 2011 21:06
 To: solr-user@lucene.apache.org
 Subject: RE: Out of memory

 numDocs is not the number of documents in memory.  It is the number of
 documents currently in the index (which is kept on disk).  Same goes for
 maxDocs, except that it is a count of all of the documents that have ever
 been in the index since it was created or optimized (including deleted
 documents).

 Your subject indicates that something is giving you some kind of Out of
 memory error.  We might better be able to help you if you provide more
 information about your exact problem.

 JRJ


 -Original Message-
 From: Rohit [mailto:ro...@in-rev.com]
 Sent: Tuesday, September 13, 2011 2:29 PM
 To: solr-user@lucene.apache.org
 Subject: Out of memory

 I have solr running on a machine with 18Gb Ram , with 4 cores. One of the
 core is very big containing 77516851 docs, the stats for searcher given
 below



 searcherName : Searcher@5a578998 main
 caching : true
 numDocs : 77516851
 maxDoc : 77518729
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842
 indexVersion : 1308817281798
 openedAt : Tue Sep 13 18:59:52 GMT 2011
 registeredAt : Tue Sep 13 19:00:55 GMT 2011
 warmupTime : 63139



 . Is there a way to reduce the number of docs loaded into memory
 for
 this core?

 . At any given time I dont need data more than past 15 days, unless
 someone queries for it explicetly. How can this be achieved?

 . Will it be better to go for Solr replication or distribution if
 there is little option left





 Regards,

 Rohit

 Mobile: +91-9901768202

 About Me:  http://about.me/rohitg http://about.me/rohitg






RE: Out of memory

2011-09-14 Thread Rohit
Hi Dmitry,

To answer your questions,

-Do you use caching?
I do use caching, but will disable it and give it a go.

-How big is your index in size on the disk?
These are the sizes of the data folder for each of the cores.
Core1 : 64GB
Core2 : 6.1GB
Core3 : 7.9GB
Core4 : 1.9GB

Will try attaching jconsole to my Solr as suggested to get a better picture.

Regards,
Rohit


-Original Message-
From: Dmitry Kan [mailto:dmitry@gmail.com] 
Sent: 14 September 2011 08:15
To: solr-user@lucene.apache.org
Subject: Re: Out of memory

Hi Rohit,

Do you use caching?
How big is your index in size on the disk?
What is the stack trace contents?

The OOM problems that we have seen so far were related to the
index physical size and usage of caching. I don't think we have ever found
the exact cause of these problems, but sharding has helped to keep each
index relatively small and OOM have gone away.

You can also attach jconsole onto your SOLR via the jmx and monitor the
memory / cpu usage in a graphical interface. I have also run garbage
collector manually through jconsole sometimes and it was of a help.

Regards,
Dmitry

On Wed, Sep 14, 2011 at 9:10 AM, Rohit ro...@in-rev.com wrote:

 Thanks Jaeger.

 Actually I am storing twitter streaming data into the core, so the rate of
 index is about 12tweets(docs)/second. The same solr contains 3 other cores
 but these cores are not very heavy. Now the twitter core has become very
 large (77516851) and its taking a long time to query (Mostly facet queries
 based on date, string fields).

 After sometime about 18-20hr solr goes out of memory, the thread dump
 doesn't show anything. How can I improve this besides adding more ram into
 the system.



 Regards,
 Rohit
 Mobile: +91-9901768202
 About Me: http://about.me/rohitg

 -Original Message-
 From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
 Sent: 13 September 2011 21:06
 To: solr-user@lucene.apache.org
 Subject: RE: Out of memory

 numDocs is not the number of documents in memory.  It is the number of
 documents currently in the index (which is kept on disk).  Same goes for
 maxDocs, except that it is a count of all of the documents that have ever
 been in the index since it was created or optimized (including deleted
 documents).

 Your subject indicates that something is giving you some kind of Out of
 memory error.  We might better be able to help you if you provide more
 information about your exact problem.

 JRJ


 -Original Message-
 From: Rohit [mailto:ro...@in-rev.com]
 Sent: Tuesday, September 13, 2011 2:29 PM
 To: solr-user@lucene.apache.org
 Subject: Out of memory

 I have solr running on a machine with 18Gb Ram , with 4 cores. One of the
 core is very big containing 77516851 docs, the stats for searcher given
 below



 searcherName : Searcher@5a578998 main
 caching : true
 numDocs : 77516851
 maxDoc : 77518729
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842
 indexVersion : 1308817281798
 openedAt : Tue Sep 13 18:59:52 GMT 2011
 registeredAt : Tue Sep 13 19:00:55 GMT 2011
 warmupTime : 63139



 . Is there a way to reduce the number of docs loaded into memory
 for
 this core?

 . At any given time I dont need data more than past 15 days, unless
 someone queries for it explicetly. How can this be achieved?

 . Will it be better to go for Solr replication or distribution if
 there is little option left





 Regards,

 Rohit

 Mobile: +91-9901768202

 About Me:  http://about.me/rohitg http://about.me/rohitg







Re: Out of memory

2011-09-14 Thread Dmitry Kan
Hi,

OK, 64GB fits into one shard quite nicely in our setup, but I have never used
a multicore setup. In total you have 79.9 GB. We try to have 70-100GB per
shard with caching on. Do you warm up your index on startup? Also,
there is a setting for pre-populating the cache.

It could also help if you can show some parts of your solrconfig file. Which
Solr version do you use?

Regards,
Dmitry

On Wed, Sep 14, 2011 at 11:38 AM, Rohit ro...@in-rev.com wrote:

 Hi Dimtry,

 To answer your questions,

 -Do you use caching?
 I do user caching, but will disable it and give it a go.

 -How big is your index in size on the disk?
 These are the size of the data folder for each of the cores.
 Core1 : 64GB
 Core2 : 6.1GB
 Core3 : 7.9GB
 Core4 : 1.9GB

 Will try attaching a jconsole to my solr as suggested to get a better
 picture.

 Regards,
 Rohit


 -Original Message-
 From: Dmitry Kan [mailto:dmitry@gmail.com]
 Sent: 14 September 2011 08:15
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory

 Hi Rohit,

 Do you use caching?
 How big is your index in size on the disk?
 What is the stack trace contents?

 The OOM problems that we have seen so far were related to the
 index physical size and usage of caching. I don't think we have ever found
 the exact cause of these problems, but sharding has helped to keep each
 index relatively small and OOM have gone away.

 You can also attach jconsole onto your SOLR via the jmx and monitor the
 memory / cpu usage in a graphical interface. I have also run garbage
 collector manually through jconsole sometimes and it was of a help.

 Regards,
 Dmitry

 On Wed, Sep 14, 2011 at 9:10 AM, Rohit ro...@in-rev.com wrote:

  Thanks Jaeger.
 
  Actually I am storing twitter streaming data into the core, so the rate
 of
  index is about 12tweets(docs)/second. The same solr contains 3 other
 cores
  but these cores are not very heavy. Now the twitter core has become very
  large (77516851) and its taking a long time to query (Mostly facet
 queries
  based on date, string fields).
 
  After sometime about 18-20hr solr goes out of memory, the thread dump
  doesn't show anything. How can I improve this besides adding more ram
 into
  the system.
 
 
 
  Regards,
  Rohit
  Mobile: +91-9901768202
  About Me: http://about.me/rohitg
 
  -Original Message-
  From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
  Sent: 13 September 2011 21:06
  To: solr-user@lucene.apache.org
  Subject: RE: Out of memory
 
  numDocs is not the number of documents in memory.  It is the number of
  documents currently in the index (which is kept on disk).  Same goes for
  maxDocs, except that it is a count of all of the documents that have ever
  been in the index since it was created or optimized (including deleted
  documents).
 
  Your subject indicates that something is giving you some kind of Out of
  memory error.  We might better be able to help you if you provide more
  information about your exact problem.
 
  JRJ
 
 
  -Original Message-
  From: Rohit [mailto:ro...@in-rev.com]
  Sent: Tuesday, September 13, 2011 2:29 PM
  To: solr-user@lucene.apache.org
  Subject: Out of memory
 
  I have solr running on a machine with 18Gb Ram , with 4 cores. One of the
  core is very big containing 77516851 docs, the stats for searcher given
  below
 
 
 
  searcherName : Searcher@5a578998 main
  caching : true
  numDocs : 77516851
  maxDoc : 77518729
  lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842
  indexVersion : 1308817281798
  openedAt : Tue Sep 13 18:59:52 GMT 2011
  registeredAt : Tue Sep 13 19:00:55 GMT 2011
  warmupTime : 63139
 
 
 
  . Is there a way to reduce the number of docs loaded into memory
  for
  this core?
 
  . At any given time I dont need data more than past 15 days,
 unless
  someone queries for it explicetly. How can this be achieved?
 
  . Will it be better to go for Solr replication or distribution if
  there is little option left
 
 
 
 
 
  Regards,
 
  Rohit
 
  Mobile: +91-9901768202
 
  About Me:  http://about.me/rohitg http://about.me/rohitg
 
 
 
 




-- 
Regards,

Dmitry Kan


RE: Out of memory

2011-09-14 Thread Rohit
Thanks Dmitry for the offer to help. I am using some caching in one of the
cores now. Earlier I was using it on the other cores too, but now I have
commented them out because of frequent OOMs; there is also some warming up in
one of the cores. I have shared the links to my config files for all 4 cores:

http://haklus.com/crssConfig.xml
http://haklus.com/rssConfig.xml
http://haklus.com/twitterConfig.xml
http://haklus.com/facebookConfig.xml


Thanks again
Rohit


-Original Message-
From: Dmitry Kan [mailto:dmitry@gmail.com] 
Sent: 14 September 2011 10:23
To: solr-user@lucene.apache.org
Subject: Re: Out of memory

Hi,

OK 64GB fits into one shard quite nicely in our setup. But I have never used
multicore setup. In total you have 79,9 GB. We try to have 70-100GB per
shard with caching on. Do you do warming up of your index on starting? Also,
there was a setting of pre-populating the cache.

It could also help, if you can show some parts of your solrconfig file. What
is the solr version you use?

Regards,
Dmitry

On Wed, Sep 14, 2011 at 11:38 AM, Rohit ro...@in-rev.com wrote:

 Hi Dimtry,

 To answer your questions,

 -Do you use caching?
 I do user caching, but will disable it and give it a go.

 -How big is your index in size on the disk?
 These are the size of the data folder for each of the cores.
 Core1 : 64GB
 Core2 : 6.1GB
 Core3 : 7.9GB
 Core4 : 1.9GB

 Will try attaching a jconsole to my solr as suggested to get a better
 picture.

 Regards,
 Rohit


 -Original Message-
 From: Dmitry Kan [mailto:dmitry@gmail.com]
 Sent: 14 September 2011 08:15
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory

 Hi Rohit,

 Do you use caching?
 How big is your index in size on the disk?
 What is the stack trace contents?

 The OOM problems that we have seen so far were related to the
 index physical size and usage of caching. I don't think we have ever found
 the exact cause of these problems, but sharding has helped to keep each
 index relatively small and OOM have gone away.

 You can also attach jconsole onto your SOLR via the jmx and monitor the
 memory / cpu usage in a graphical interface. I have also run garbage
 collector manually through jconsole sometimes and it was of a help.

 Regards,
 Dmitry

 On Wed, Sep 14, 2011 at 9:10 AM, Rohit ro...@in-rev.com wrote:

  Thanks Jaeger.
 
  Actually I am storing twitter streaming data into the core, so the rate
 of
  index is about 12tweets(docs)/second. The same solr contains 3 other
 cores
  but these cores are not very heavy. Now the twitter core has become very
  large (77516851) and its taking a long time to query (Mostly facet
 queries
  based on date, string fields).
 
  After sometime about 18-20hr solr goes out of memory, the thread dump
  doesn't show anything. How can I improve this besides adding more ram
 into
  the system.
 
 
 
  Regards,
  Rohit
  Mobile: +91-9901768202
  About Me: http://about.me/rohitg
 
  -Original Message-
  From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
  Sent: 13 September 2011 21:06
  To: solr-user@lucene.apache.org
  Subject: RE: Out of memory
 
  numDocs is not the number of documents in memory.  It is the number of
  documents currently in the index (which is kept on disk).  Same goes for
  maxDocs, except that it is a count of all of the documents that have ever
  been in the index since it was created or optimized (including deleted
  documents).
 
  Your subject indicates that something is giving you some kind of Out of
  memory error.  We might better be able to help you if you provide more
  information about your exact problem.
 
  JRJ
 
 
  -Original Message-
  From: Rohit [mailto:ro...@in-rev.com]
  Sent: Tuesday, September 13, 2011 2:29 PM
  To: solr-user@lucene.apache.org
  Subject: Out of memory
 
  I have solr running on a machine with 18Gb Ram , with 4 cores. One of the
  core is very big containing 77516851 docs, the stats for searcher given
  below
 
 
 
  searcherName : Searcher@5a578998 main
  caching : true
  numDocs : 77516851
  maxDoc : 77518729
  lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842
  indexVersion : 1308817281798
  openedAt : Tue Sep 13 18:59:52 GMT 2011
  registeredAt : Tue Sep 13 19:00:55 GMT 2011
  warmupTime : 63139
 
 
 
  . Is there a way to reduce the number of docs loaded into memory
  for
  this core?
 
  . At any given time I dont need data more than past 15 days,
 unless
  someone queries for it explicetly. How can this be achieved?
 
  . Will it be better to go for Solr replication or distribution if
  there is little option left
 
 
 
 
 
  Regards,
 
  Rohit
 
  Mobile: +91-9901768202
 
  About Me:  http://about.me/rohitg http://about.me/rohitg
 
 
 
 




-- 
Regards,

Dmitry Kan



RE: Out of memory

2011-09-13 Thread Jaeger, Jay - DOT
numDocs is not the number of documents in memory.  It is the number of 
documents currently in the index (which is kept on disk).  Same goes for 
maxDocs, except that it is a count of all of the documents that have ever been 
in the index since it was created or optimized (including deleted documents).

Your subject indicates that something is giving you some kind of Out of memory 
error.  We might better be able to help you if you provide more information 
about your exact problem.

JRJ


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Tuesday, September 13, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: Out of memory

I have Solr running on a machine with 18GB RAM, with 4 cores. One of the
cores is very big, containing 77516851 docs; the stats for its searcher are
given below

 

searcherName : Searcher@5a578998 main 
caching : true 
numDocs : 77516851 
maxDoc : 77518729 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842 
indexVersion : 1308817281798 
openedAt : Tue Sep 13 18:59:52 GMT 2011 
registeredAt : Tue Sep 13 19:00:55 GMT 2011 
warmupTime : 63139

 

- Is there a way to reduce the number of docs loaded into memory for
this core?

- At any given time I don't need data more than the past 15 days, unless
someone queries for it explicitly. How can this be achieved?

- Will it be better to go for Solr replication or distribution if
there is little option left?

 

 

Regards,

Rohit

Mobile: +91-9901768202

About Me:  http://about.me/rohitg http://about.me/rohitg

 



RE: Out of memory on sorting

2011-05-26 Thread pravesh
For saving Memory:

1. Allocate as much memory as possible to the JVM (especially if you are using a 64-bit OS).
2. You can set omitNorms="true" for your date & id fields (actually for all
fields where index-time boosting & length normalization aren't required; this
will require a full reindex). See the sketch after this list.
3. Are you sorting on all documents available in the index? Try to limit the
set using filter queries.
4. Avoid a match-all-docs query like q=*:* (if you are using one).
5. If you can do away with sorting on the ID field, sort on a field with
fewer unique terms.
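
A schema.xml sketch of point 2, with made-up field names and the type names
from the example schema; omitNorms is set on fields that need neither
index-time boosting nor length normalization:

<!-- requires a full reindex after changing -->
<field name="id" type="string" indexed="true" stored="true" required="true" omitNorms="true"/>
<field name="created_date" type="tdate" indexed="true" stored="true" omitNorms="true"/>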


Hope this helps

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Out-of-memory-on-sorting-tp2960578p2988336.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Out of memory on sorting

2011-05-19 Thread rajini maski
Explicit Warming of Sort Fields

If you do a lot of field-based sorting, it is advantageous to explicitly add
warming queries to the newSearcher and firstSearcher event listeners in
your solrconfig which sort on those fields, so the FieldCache is populated
before any queries are executed by your users.
firstSearcher:
  <lst>
    <str name="q">solr rocks</str><str name="start">0</str>
    <str name="rows">10</str><str name="sort">empID asc</str>
  </lst>
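
For context, a minimal sketch of where such warming queries sit, using the
standard QuerySenderListener (the query and sort values are just the example
above):

<!-- inside the <query> section of solrconfig.xml -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">solr rocks</str><str name="start">0</str>
      <str name="rows">10</str><str name="sort">empID asc</str>
    </lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- re-warm the sort cache whenever a new searcher is opened -->
    <lst><str name="q">*:*</str><str name="sort">empID asc</str></lst>
  </arr>
</listener>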



On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote:

 Hi,



 We are moving to a multi-core Solr installation with each of the cores
 having millions of documents; documents would also be added to the index on
 an hourly basis. Everything seems to run fine and I am getting the expected
 results and performance, except where sorting is concerned.



 I have an index size of 13217121 documents. Now when I want to get documents
 between two dates and then sort them by ID, Solr goes out of memory. This is
 with just me using the system; we might also have simultaneous users. How
 can I improve this performance?



 Rohit




RE: Out of memory on sorting

2011-05-19 Thread Rohit
Thanks for pointing me in the right direction. Now I see that in the
configuration for firstSearcher or newSearcher, the <str name="q"> needs to be
configured ahead of time. In my case the q is ever changing: users can search
for anything and the possible queries are unlimited.

How can I make this generic?

-Rohit



-Original Message-
From: rajini maski [mailto:rajinima...@gmail.com] 
Sent: 19 May 2011 14:53
To: solr-user@lucene.apache.org
Subject: Re: Out of memory on sorting

Explicit Warming of Sort Fields

If you do a lot of field based sorting, it is advantageous to add explicitly
warming queries to the newSearcher and firstSearcher event listeners in
your solrconfig which sort on those fields, so the FieldCache is populated
prior to any queries being executed by your users.
firstSearcher
lst str name=qsolr rocks/strstr name=start0/strstr
name=rows10/strstr name=sortempID asc/str/lst



On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote:

 Hi,



 We are moving to a multi-core Solr installation with each of the core
 having
 millions of documents, also documents would be added to the index on an
 hourly basis.  Everything seems to run find and I getting the expected
 result and performance, except where sorting is concerned.



 I have an index size of 13217121 documents, now when I want to get
 documents
 between two dates and then sort them by ID  solr goes out of memory. This
 is
 with just me using the system, we might also have simultaneous users, how
 can I improve this performance?



 Rohit





Re: Out of memory on sorting

2011-05-19 Thread Erick Erickson
The warming queries warm up the caches used in sorting. So
just including the sort=... clause will warm the sort caches; the terms
searched are not important. The same is true with facets...

However, I don't understand how that relates to your OOM problems. I'd
expect the OOM to start happening on startup, you'd be doing
the operation that runs you out of memory on startup...

So, we need more details:
1) How is your sort field defined? String? Integer? If it's a string
   and you could change it to a numeric type, you'd use a lot
   less memory.
2) How many distinct terms? I'm guessing one/document actually;
   this is somewhat of an anti-pattern in Solr, for all that it's sometimes
   necessary.
3) How much memory are you allocating for the JVM?
4) What other fields are you sorting on, and how many unique values
   are in each? Solr Admin can help you here.

Best
Erick


On Thu, May 19, 2011 at 6:20 AM, Rohit ro...@in-rev.com wrote:
 Thanks for pointing me in the right direction, now I see the configuration
 for firstsearcher or newsearcher, the str name=q needs to configured
 previously. In my case the q is every changing, users can actually search
 for anything and the possibilities of queries unlimited.

 How can I make this generic?

 -Rohit



 -Original Message-
 From: rajini maski [mailto:rajinima...@gmail.com]
 Sent: 19 May 2011 14:53
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory on sorting

 Explicit Warming of Sort Fields

 If you do a lot of field based sorting, it is advantageous to add explicitly
 warming queries to the newSearcher and firstSearcher event listeners in
 your solrconfig which sort on those fields, so the FieldCache is populated
 prior to any queries being executed by your users.
 firstSearcher
 lst str name=qsolr rocks/strstr name=start0/strstr
 name=rows10/strstr name=sortempID asc/str/lst



 On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote:

 Hi,



 We are moving to a multi-core Solr installation with each of the core
 having
 millions of documents, also documents would be added to the index on an
 hourly basis.  Everything seems to run find and I getting the expected
 result and performance, except where sorting is concerned.



 I have an index size of 13217121 documents, now when I want to get
 documents
 between two dates and then sort them by ID  solr goes out of memory. This
 is
 with just me using the system, we might also have simultaneous users, how
 can I improve this performance?



 Rohit






RE: Out of memory on sorting

2011-05-19 Thread Rohit
Hi Erick,

My OOM problem starts when I query the core with 13217121 documents. My
schema and other details are given below,

1 how is your sort field defined? String? Integer? If it's a string and you
could change it to a numeric type, you'd use a lot less memory.

We primarily use two different sort criteria: one is a date field and the
other is a string (id). I cannot change the id field as this is also the
uniqueKey for my schema.

2 How many distinct terms? I'm guessing one/document actually,this is
somewhat of an anti-pattern in Solr for all it's sometimes necessary.

Since one of the fields is a timestamp and the other a unique key, all values
are distinct. (These are tweets coming in for a keyword.)

3 How much memory are you allocating for the JVM?

I am starting Solr with the following command: java -Xms1024M -Xmx2048M -jar
start.jar


All our test cases for moving to Solr have passed; this is proving to be a big
setback. Help would be greatly appreciated.

Regards,
Rohit



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 19 May 2011 18:21
To: solr-user@lucene.apache.org
Subject: Re: Out of memory on sorting

The warming queries warm up the caches used in sorting. So
just including the sort=. will warm the sort caches. the terms
searched are not important. The same is true with facets...

However, I don't understand how that relates to your OOM problems. I'd
expect the OOM to start happening on startup, you'd be doing
the operation that runs you out of memory on startup...

So, we need more details:
1 how is your sort field defined? String? Integer? If it's a string
 and you could change it to a numeric type, you'd use a lot
 less memory.
2 How many distinct terms? I'm guessing one/document actually,
 this is somewhat of an anti-pattern in Solr for all it's sometimes
 necessary.
3 How much memory are you allocating for the JVM?
4 What other fields are you sorting on and how many unique values
 in each? Solr Admin can help you here

Best
Erick


On Thu, May 19, 2011 at 6:20 AM, Rohit ro...@in-rev.com wrote:
 Thanks for pointing me in the right direction, now I see the configuration
 for firstsearcher or newsearcher, the str name=q needs to configured
 previously. In my case the q is every changing, users can actually search
 for anything and the possibilities of queries unlimited.

 How can I make this generic?

 -Rohit



 -Original Message-
 From: rajini maski [mailto:rajinima...@gmail.com]
 Sent: 19 May 2011 14:53
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory on sorting

 Explicit Warming of Sort Fields

 If you do a lot of field based sorting, it is advantageous to add
explicitly
 warming queries to the newSearcher and firstSearcher event listeners
in
 your solrconfig which sort on those fields, so the FieldCache is populated
 prior to any queries being executed by your users.
 firstSearcher
 lst str name=qsolr rocks/strstr name=start0/strstr
 name=rows10/strstr name=sortempID asc/str/lst



 On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote:

 Hi,



 We are moving to a multi-core Solr installation with each of the core
 having
 millions of documents, also documents would be added to the index on an
 hourly basis.  Everything seems to run find and I getting the expected
 result and performance, except where sorting is concerned.



 I have an index size of 13217121 documents, now when I want to get
 documents
 between two dates and then sort them by ID  solr goes out of memory. This
 is
 with just me using the system, we might also have simultaneous users, how
 can I improve this performance?



 Rohit







Re: Out of memory on sorting

2011-05-19 Thread Erick Erickson
See below:

On Thu, May 19, 2011 at 9:06 AM, Rohit ro...@in-rev.com wrote:
 Hi Erick,

 My OOM problem starts when I query the core with 13217121 documents. My
 schema and other details are given below,

Hmmm, how many cores are you running and what are they doing? Because they
all use the same memory pool, so you may be getting some carry-over. So one
strategy would be just to move this core to a dedicated machine.


 1 how is your sort field defined? String? Integer? If it's a string and you
 could change it to a numeric type, you'd use a lot less memory.

 We primarily use two different sort criteria one is a date field and the
 other is string (id). I cannot change the id field as this is also the
 uniquekey for my schema.

OK, but can you use a separate field just for sorting? Populate it with
a copyField and sort on that rather than ID. This is only helpful if
you can make a compact representation, e.g. integer.
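
A minimal schema.xml sketch of that idea, assuming the id values are purely
numeric (the id_sort field name is made up; tlong is the trie long type from
the example schema):

<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<!-- compact numeric copy used only for sorting -->
<field name="id_sort" type="tlong" indexed="true" stored="false"/>
<copyField source="id" dest="id_sort"/>

and then sort on id_sort instead of id.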


 2 How many distinct terms? I'm guessing one/document actually,this is
 somewhat of an anti-pattern in Solr for all it's sometimes necessary.

 Since one of the field is a timestamp instance and the other a unique key
 all are distinct. (These are tweets happening for keyword)


Not one, but two fields where all values are distinct. Although I don't think
the timestamp is much of a problem, assuming you're storing it as one
of the numeric types (I'd especially make sure it was one of the Trie types,
specifically tdate, if you're going to do range queries). There are tricks for
dealing with this, but your id field will get you a bigger bang for the buck,
so concentrate on that first.
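
For reference, a sketch of the trie date setup, roughly as it appears in the
example schema (the field name here is assumed):

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
<!-- precisionStep > 0 speeds up range queries at the cost of a few extra terms -->
<field name="created_at" type="tdate" indexed="true" stored="true"/>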

 3 How much memory are you allocating for the JVM?

 I am starting solr with the following command java -Xms1024M -Xmx-2048M
 start.jar


Well, you can bump this higher if you're on a 64-bit OS. The other possibility is
to shard your index. But really, with 13M documents this should fit on one
machine.

What does your statistics page tell you, especially about cache usage?




 All out test case for moving to solr has passed, this is proving to be a big
 set back. Help would be greatly appreciated.

 Regards,
 Rohit



 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: 19 May 2011 18:21
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory on sorting

 The warming queries warm up the caches used in sorting. So
 just including the sort=. will warm the sort caches. the terms
 searched are not important. The same is true with facets...

 However, I don't understand how that relates to your OOM problems. I'd
 expect the OOM to start happening on startup, you'd be doing
 the operation that runs you out of memory on startup...

 So, we need more details:
 1 how is your sort field defined? String? Integer? If it's a string
     and you could change it to a numeric type, you'd use a lot
     less memory.
 2 How many distinct terms? I'm guessing one/document actually,
     this is somewhat of an anti-pattern in Solr for all it's sometimes
     necessary.
 3 How much memory are you allocating for the JVM?
 4 What other fields are you sorting on and how many unique values
     in each? Solr Admin can help you here

 Best
 Erick


 On Thu, May 19, 2011 at 6:20 AM, Rohit ro...@in-rev.com wrote:
 Thanks for pointing me in the right direction, now I see the configuration
 for firstsearcher or newsearcher, the str name=q needs to configured
 previously. In my case the q is every changing, users can actually search
 for anything and the possibilities of queries unlimited.

 How can I make this generic?

 -Rohit



 -Original Message-
 From: rajini maski [mailto:rajinima...@gmail.com]
 Sent: 19 May 2011 14:53
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory on sorting

 Explicit Warming of Sort Fields

 If you do a lot of field based sorting, it is advantageous to add
 explicitly
 warming queries to the newSearcher and firstSearcher event listeners
 in
 your solrconfig which sort on those fields, so the FieldCache is populated
 prior to any queries being executed by your users.
 firstSearcher
 lst str name=qsolr rocks/strstr name=start0/strstr
 name=rows10/strstr name=sortempID asc/str/lst



 On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote:

 Hi,



 We are moving to a multi-core Solr installation with each of the core
 having
 millions of documents, also documents would be added to the index on an
 hourly basis.  Everything seems to run find and I getting the expected
 result and performance, except where sorting is concerned.



 I have an index size of 13217121 documents, now when I want to get
 documents
 between two dates and then sort them by ID  solr goes out of memory. This
 is
 with just me using the system, we might also have simultaneous users, how
 can I improve this performance?



 Rohit








Re: Out of memory while creating indexes

2011-03-04 Thread Praveen Parameswaran
Hi ,

post.sh is using curl as I see; will that be helpful?

On Fri, Mar 4, 2011 at 1:24 PM, Upayavira u...@odoko.co.uk wrote:

 post.jar is intended for demo purposes, not production use, so it
 doesn;t surprise me you've managed to break it.

 Have you tried using curl to do the post?

 Upayavira

 On Thu, 03 Mar 2011 17:02 -0500, Solr User solr...@gmail.com wrote:
  Hi All,
 
  I am trying to create indexes out of a 400MB XML file using the following
  command and I am running into out of memory exception.
 
  $JAVA_HOME/bin/java -Xms768m -Xmx1024m -*Durl*=http://$SOLR_HOST
  SOLR_PORT/solr/customercarecore/update -jar
  $SOLRBASEDIR/*dataconvertor*/common/lib/post.jar
  $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml
 
  I am planning to bump up the memory and try again.
 
  Did any one ran into similar issue? Any inputs would be very helpful to
  resolve the out of memory exception.
 
  I was able to create indexes with small file but not with large file. I
  am
  not using Solr J.
 
  Thanks,
  Solr User
 
 ---
 Enterprise Search Consultant at Sourcesense UK,
 Making Sense of Open Source




Re: Out of memory while creating indexes

2011-03-03 Thread Gora Mohanty
On Fri, Mar 4, 2011 at 3:32 AM, Solr User solr...@gmail.com wrote:
 Hi All,

 I am trying to create indexes out of a 400MB XML file using the following
 command and I am running into out of memory exception.

Is this a single record in the XML file? If it is more than one, breaking
it up into separate XML files, say one per record, should help.

 $JAVA_HOME/bin/java -Xms768m -Xmx1024m -*Durl*=http://$SOLR_HOST
 SOLR_PORT/solr/customercarecore/update -jar
 $SOLRBASEDIR/*dataconvertor*/common/lib/post.jar
 $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml

 I am planning to bump up the memory and try again.
[...]

If you give Solr enough memory this should work, but IMHO, it would
be better to break up your input XML files if you can.

Regards,
Gora


Re: Out of memory while creating indexes

2011-03-03 Thread Upayavira
post.jar is intended for demo purposes, not production use, so it
doesn't surprise me you've managed to break it.

Have you tried using curl to do the post?

Upayavira

On Thu, 03 Mar 2011 17:02 -0500, Solr User solr...@gmail.com wrote:
 Hi All,
 
 I am trying to create indexes out of a 400MB XML file using the following
 command and I am running into out of memory exception.
 
 $JAVA_HOME/bin/java -Xms768m -Xmx1024m -*Durl*=http://$SOLR_HOST
 SOLR_PORT/solr/customercarecore/update -jar
 $SOLRBASEDIR/*dataconvertor*/common/lib/post.jar
 $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml
 
 I am planning to bump up the memory and try again.
 
 Did any one ran into similar issue? Any inputs would be very helpful to
 resolve the out of memory exception.
 
 I was able to create indexes with small file but not with large file. I
 am
 not using Solr J.
 
 Thanks,
 Solr User
 
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source



Re: Out of memory error

2010-12-07 Thread Erick Erickson
Have you seen this page? http://wiki.apache.org/solr/DataImportHandlerFaq
See especially batchSize, but it looks like you're already on to that.
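
For reference, the dataSource batchSize setting that the FAQ describes for
streaming MySQL results looks roughly like this (the connection details and
the entity below are made up for illustration):

<dataConfig>
  <!-- batchSize="-1" makes the MySQL JDBC driver stream rows instead of
       buffering the whole result set in memory -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="pass"
              batchSize="-1"/>
  <document>
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>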

Do you have any idea how big the records are in the database? You might
try adjusting the ramBufferSizeMB down; what is it at now?

In general, what are your Solr commit options?

Does anything get to Solr or is the OOM when the SQL is executed?
The first question to answer is whether you index anything at all...

There's a little-known DIH debug page you can access at:
.../solr/admin/dataimport.jsp that might help, and progress can be monitored
at:
.../solr/dataimport

DIH can be interesting; you get finer control with SolrJ and a direct
JDBC connection if you don't get anywhere with DIH.

Scattergun response, but things to try...

Best
Erick

On Tue, Dec 7, 2010 at 12:03 AM, sivaprasad sivaprasa...@echidnainc.comwrote:


 Hi,

 When i am trying to import the data using DIH, iam getting Out of memory
 error.The below are the configurations which i have.

 Database:Mysql
 Os:windows
 No Of documents:15525532
 In Db-config.xml i made batch size as -1

 The solr server is running on Linux machine with tomcat.
 i set tomcat arguments as ./startup.sh -Xms1024M -Xmx2048M

 Can anybody has idea, where the things are going wrong?

 Regards,
 JS


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Out-of-memory-error-tp2031761p2031761.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Out of memory error

2010-12-07 Thread Fuad Efendi
Related: SOLR-846

Sent on the TELUS Mobility network with BlackBerry

-Original Message-
From: Erick Erickson erickerick...@gmail.com
Date: Tue, 7 Dec 2010 08:11:41 
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: Re: Out of memory error

Have you seen this page? http://wiki.apache.org/solr/DataImportHandlerFaq
http://wiki.apache.org/solr/DataImportHandlerFaqSee especially batchsize,
but it looks like you're already on to that.

Do you have any idea how big the records are in the database? You might
try adjusting the rambuffersize down, what is it at now?

In general, what are our Solr commit options?

Does anything get to Solr or is the OOM when the SQL is executed?
The first question to answer is whether you index anything at all...

There's a little-know DIH debug page you can access at:
.../solr/admin/dataimport.jsp that might help, and progress can be monitored
at:
.../solr/dataimport

DIH can be interesting, you get finer control with SolrJ and a direct
JDBC connection. If you don't get anywhere with DIH.

Scattergun response, but things to try...

Best
Erick

On Tue, Dec 7, 2010 at 12:03 AM, sivaprasad sivaprasa...@echidnainc.comwrote:


 Hi,

 When i am trying to import the data using DIH, iam getting Out of memory
 error.The below are the configurations which i have.

 Database:Mysql
 Os:windows
 No Of documents:15525532
 In Db-config.xml i made batch size as -1

 The solr server is running on Linux machine with tomcat.
 i set tomcat arguments as ./startup.sh -Xms1024M -Xmx2048M

 Can anybody has idea, where the things are going wrong?

 Regards,
 JS


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Out-of-memory-error-tp2031761p2031761.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Out of memory error

2010-12-06 Thread Fuad Efendi
Batch size -1??? Strange but could be a problem. 

Note also that you can't provide parameters to the default startup.sh command;
you should modify setenv.sh instead.

--Original Message--
From: sivaprasad
To: solr-user@lucene.apache.org
ReplyTo: solr-user@lucene.apache.org
Subject: Out of memory error
Sent: Dec 7, 2010 12:03 AM


Hi,

When I am trying to import the data using DIH, I am getting an Out of memory
error. Below are the configurations which I have.

Database: MySQL
OS: Windows
No. of documents: 15525532
In Db-config.xml I made the batch size -1

The Solr server is running on a Linux machine with Tomcat.
I set the Tomcat arguments as ./startup.sh -Xms1024M -Xmx2048M

Does anybody have an idea where things are going wrong?

Regards,
JS


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Out-of-memory-error-tp2031761p2031761.html
Sent from the Solr - User mailing list archive at Nabble.com.


Sent on the TELUS Mobility network with BlackBerry

RE: Out of Memory

2010-03-23 Thread Craig Christman
Is this on Oracle 10.2.0.4?  Looking at the Oracle support site there's a 
memory leak using some of the XML functions that can be fixed by upgrading to 
10.2.0.5, 11.2, or by using 10.2.0.4 Patch 2 in Windows 32-bit.

-Original Message-
From: Neil Chaudhuri [mailto:nchaudh...@potomacfusion.com]
Sent: Tuesday, March 23, 2010 3:21 PM
To: 'solr-user@lucene.apache.org'
Subject: Out of Memory

I am using the DataImportHandler to index literally millions of documents in an 
Oracle database. Not surprisingly, I got the following after a few hours:

java.sql.SQLException: ORA-04030: out of process memory when trying to allocate 
4032 bytes (kolaGetRfcHeap,kghsseg: kolaslCreateCtx)

Has anyone come across this? What are the ways around this, if any?

Thanks.


RE: Out of Memory

2010-03-23 Thread Dennis Gearon
Now THAT's real open source help! Nice job Craig.
Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Tue, 3/23/10, Craig Christman cchrist...@caci.com wrote:

 From: Craig Christman cchrist...@caci.com
 Subject: RE: Out of Memory
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Tuesday, March 23, 2010, 1:01 PM
 Is this on Oracle 10.2.0.4? 
 Looking at the Oracle support site there's a memory leak
 using some of the XML functions that can be fixed by
 upgrading to 10.2.0.5, 11.2, or by using 10.2.0.4 Patch 2 in
 Windows 32-bit.
 
 -Original Message-
 From: Neil Chaudhuri [mailto:nchaudh...@potomacfusion.com]
 Sent: Tuesday, March 23, 2010 3:21 PM
 To: 'solr-user@lucene.apache.org'
 Subject: Out of Memory
 
 I am using the DataImportHandler to index literally
 millions of documents in an Oracle database. Not
 surprisingly, I got the following after a few hours:
 
 java.sql.SQLException: ORA-04030: out of process memory
 when trying to allocate 4032 bytes (kolaGetRfcHeap,kghsseg:
 kolaslCreateCtx)
 
 Has anyone come across this? What are the ways around this,
 if any?
 
 Thanks.



Re: Out of Memory Errors

2008-10-22 Thread Nick Jenkin
Have you confirmed Java's -Xmx setting? (Max memory)

e.g. java -Xmx2000m -jar start.jar
-Nick

On Wed, Oct 22, 2008 at 3:24 PM, Mark Miller [EMAIL PROTECTED] wrote:
 How much RAM in the box total? How many sort fields and what types? Sorts on
 each core?

 Willie Wong wrote:

 Hello,

 I've been having issues with out of memory errors on searches in Solr. I
 was wondering if I'm hitting a limit with solr or if I've configured
 something seriously wrong.

 Solr Setup
 - 3 cores - 3163615 documents each
 - 10 GB size
 - approx 10 fields
 - document sizes vary from a few kb to a few MB
 - no faceting is used however the search query can be fairly complex with
 8 or more fields being searched on at once

 Environment:
 - windows 2003
 - 2.8 GHz zeon processor
 - 1.5 GB memory assigned to solr
 - Jetty 6 server

 Once we get to around a few  concurrent users OOM start occuring and Jetty
 restarts.  Would this just be a case of more memory or are there certain
 configuration settings that need to be set?  We're using an out of the box
 Solr 1.3 beta version.
 A few of the things we considered that might help:
 - Removing sorts on the result sets (result sets are approx 40,000 +
 documents)
 - Reducing cache sizes such as the queryResultMaxDocsCached setting,
 document cache, queryResultCache, filterCache, etc

 Am I missing anything else that should be looked at, or is it time to
 simply increase the memory/start looking at distributing the indexes?  Any
 help would be much appreciated.


 Regards,

 WW






RE: Out of Memory Errors

2008-10-22 Thread r.prieto
Hi Willie,

Are you using highlighting?

If the answer is yes, you need to know that for each document retrieved,
Solr highlighting loads into memory the full field that is used for this
functionality. If the field is too long, you can have memory problems.

You can solve the problem using this patch:

http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200806.mbox/%3C1552
[EMAIL PROTECTED]

to copy the content of the field that is used for highlighting to another field
and reduce its size.
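
If your Solr version supports it, a rough schema.xml sketch of the same idea
using copyField's maxChars attribute (field names here are assumed) instead of
the patch:

<field name="content" type="text" indexed="true" stored="true"/>
<!-- truncated copy used only for highlighting -->
<field name="content_hl" type="text" indexed="true" stored="true"/>
<copyField source="content" dest="content_hl" maxChars="10000"/>

and then point hl.fl at content_hl.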

You also need to know that Windows has a 2 GB memory limit per (32-bit)
process.



-Mensaje original-
De: Willie Wong [mailto:[EMAIL PROTECTED] 
Enviado el: miércoles, 22 de octubre de 2008 3:48
Para: solr-user@lucene.apache.org
Asunto: Out of Memory Errors

Hello,

I've been having issues with out of memory errors on searches in Solr. I 
was wondering if I'm hitting a limit with solr or if I've configured 
something seriously wrong.

Solr Setup
- 3 cores 
- 3163615 documents each
- 10 GB size
- approx 10 fields
- document sizes vary from a few kb to a few MB
- no faceting is used however the search query can be fairly complex with 
8 or more fields being searched on at once

Environment:
- windows 2003
- 2.8 GHz Xeon processor
- 1.5 GB memory assigned to solr
- Jetty 6 server

Once we get to around a few concurrent users, OOMs start occurring and Jetty
restarts.  Would this just be a case of needing more memory, or are there
certain configuration settings that need to be set?  We're using an
out-of-the-box Solr 1.3 beta version.

A few of the things we considered that might help:
- Removing sorts on the result sets (result sets are approx 40,000 + 
documents)
- Reducing cache sizes such as the queryResultMaxDocsCached setting, 
document cache, queryResultCache, filterCache, etc

Am I missing anything else that should be looked at, or is it time to 
simply increase the memory/start looking at distributing the indexes?  Any 
help would be much appreciated.


Regards,

WW



Re: Out of Memory Errors

2008-10-22 Thread Jae Joo
Here is what I am doing to check the memory status.
1. Run the servlet container and the Solr application.
2. At a command prompt, run jstat -gc <pid> 5s (5s means it samples data every
5 seconds).
3. Watch it or pipe it to a file.
4. Analyze the data gathered.

Jae

On Tue, Oct 21, 2008 at 9:48 PM, Willie Wong [EMAIL PROTECTED]wrote:

 Hello,

 I've been having issues with out of memory errors on searches in Solr. I
 was wondering if I'm hitting a limit with solr or if I've configured
 something seriously wrong.

 Solr Setup
 - 3 cores
 - 3163615 documents each
 - 10 GB size
 - approx 10 fields
 - document sizes vary from a few kb to a few MB
 - no faceting is used however the search query can be fairly complex with
 8 or more fields being searched on at once

 Environment:
 - windows 2003
 - 2.8 GHz zeon processor
 - 1.5 GB memory assigned to solr
 - Jetty 6 server

 Once we get to around a few  concurrent users OOM start occuring and Jetty
 restarts.  Would this just be a case of more memory or are there certain
 configuration settings that need to be set?  We're using an out of the box
 Solr 1.3 beta version.

 A few of the things we considered that might help:
 - Removing sorts on the result sets (result sets are approx 40,000 +
 documents)
 - Reducing cache sizes such as the queryResultMaxDocsCached setting,
 document cache, queryResultCache, filterCache, etc

 Am I missing anything else that should be looked at, or is it time to
 simply increase the memory/start looking at distributing the indexes?  Any
 help would be much appreciated.


 Regards,

 WW



Re: Out of Memory Errors

2008-10-21 Thread Mark Miller
How much RAM in the box total? How many sort fields and what types? 
Sorts on each core?


Willie Wong wrote:

Hello,

I've been having issues with out of memory errors on searches in Solr. I 
was wondering if I'm hitting a limit with solr or if I've configured 
something seriously wrong.


Solr Setup
- 3 cores 
- 3163615 documents each

- 10 GB size
- approx 10 fields
- document sizes vary from a few kb to a few MB
- no faceting is used however the search query can be fairly complex with 
8 or more fields being searched on at once


Environment:
- windows 2003
- 2.8 GHz Xeon processor
- 1.5 GB memory assigned to solr
- Jetty 6 server

Once we get to around a few concurrent users, OOMs start occurring and Jetty 
restarts.  Would this just be a case of more memory or are there certain 
configuration settings that need to be set?  We're using an out of the box 
Solr 1.3 beta version. 


A few of the things we considered that might help:
- Removing sorts on the result sets (result sets are approx 40,000 + 
documents)
- Reducing cache sizes such as the queryResultMaxDocsCached setting, 
document cache, queryResultCache, filterCache, etc


Am I missing anything else that should be looked at, or is it time to 
simply increase the memory/start looking at distributing the indexes?  Any 
help would be much appreciated.



Regards,

WW

  




RE: Out of memory on Solr sorting

2008-08-05 Thread sundar shankar
Hi all,
I seem to have found the solution to this problem. Apparently, 
allocating enough virtual memory on the server only solves half of 
the problem. Even after allocating 4 gigs of virtual memory on the JBoss server, I 
still got the out of memory error on sorting. 
 
I hadn't, however, noticed that the LRU cache in my config was set to the default, 
which was still 512 megs of max memory. I had to increase that to around 2 
gigs and the sorting did work perfectly OK.
 
Even though I am satisfied that I have found the solution to the problem, I am 
still not satisfied to know that sorting consumes so much memory. In no other product 
have I seen sorting on 10 fields take up a gig and a half of virtual memory. I am 
not sure if there could be a better implementation of this, but something 
doesn't seem right to me.
 
Thanks for all your support. It has truly been overwhelming.
 
Sundar




RE: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi

Hi Sundar,


If increasing the LRU cache helps you:
- you are probably using a 'tokenized' field for sorting (could you 
confirm, please?)...


...you should use a 'non-tokenized, single-valued, non-boolean' field for better 
sorting performance...


Fuad Efendi
==
http://www.tokenizer.org



Quoting sundar shankar [EMAIL PROTECTED]:


Hi all,
I seemed to have found the solution to this problem.   
Apparently, allocating enough virtual memory on the server seems to   
only solve on half of the problem. Even after allocating 4 gigs of   
Virtual memory on jboss server, I still did get the Out of memory on  
 sorting.


I didn't how ever notice that the LRU cache on my config was set to   
default which was still 512 megs of max memory. I had to increase   
that to a round 2 gigs and the sorting did work perfectly ok.


Even though I am satisfied that I have found the solution to the   
problem, i am still not satisfied to know that Sort consumes so much  
 memory. In no products have I seen sorting 10 fields take up 1 gig   
and half of virtual memory. I am not sure, if there could be a   
better implementation of this. But something doesn't seem right to me.


Thanks for all your support. It has truly been overwhelming.

Sundar







RE: Out of memory on Solr sorting

2008-08-05 Thread sundar shankar



The field is of type text_ws. Is this not recommended? Should I use text 
instead?


RE: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi
My understanding of Lucene sorting is that it sorts by 'tokens' 
and not by 'full fields'... so for sorting you need a full-string 
(non-tokenized) field, and for searching you need another, tokenized one.


For instance, use 'string' for sorting and 'text_ws' for searching, and 
use copyField to populate both (copyField costs some memory).


Sorting on a tokenized field: with 100,000 documents where each 'Book Title' 
consists of 10 tokens on average, you end up with roughly 1,000,000 (probably 
unique) tokens in a hashtable; with a non-tokenized field there are only 100,000 
entries, and Lucene's internal FieldCache is used instead of the SOLR LRU cache.



Also, with tokenized fields the 'sorting' is not natural (alphabetical) order...
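
(A minimal schema.xml sketch of this arrangement, with hypothetical field names - the tokenized field is searched, the untokenized copy is sorted on:)

<field name="title" type="text_ws" indexed="true" stored="true"/>
<field name="title_sort" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>

Queries then search against title but sort with sort=title_sort asc.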


Fuad Efendi
==
http://www.linkedin.com/in/liferay

Quoting sundar shankar [EMAIL PROTECTED]:


The field is of type text_ws. Is this not recomended. Should I use  
 text instead?


If increasing LRU cache helps you: -  you are probably using  
'tokenized' field for sorting (could you   confirm please?)...  
...you should use 'non-tokenized  single-valued non-boolean' for  
better performance of sorting...





RE: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi

Best choice for a sorting field:

<!-- This is an example of using the KeywordTokenizer along
     with various TokenFilterFactories to produce a sortable field
     that does not include some properties of the source text
  -->
<fieldType name="alphaOnlySort" class="solr.TextField" 
    sortMissingLast="true" omitNorms="true">


- case-insensitive, etc...
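
(For reference, the stock Solr example schema defines this type roughly as follows - quoted from memory, so treat it as a sketch rather than the exact shipped definition:)

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer keeps the whole field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- lower-case and trim so sorting is case-insensitive and ignores stray whitespace -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- drop characters other than a-z, 0-9 and space before sorting -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9 ])" replacement="" replace="all"/>
  </analyzer>
</fieldType>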


I might be partially wrong about SOLR LRU Cache but it is used somehow  
in your specific case... 'filterCache' is probably used for  
'tokenized' sorting: it stores (token, DocList)...



Fuad Efendi
==
http://www.tokenizer.org


Quoting Fuad Efendi [EMAIL PROTECTED]:


My understanding of Lucene Sorting is that it will sort by 'tokens' and
not by 'full fields'... so that for sorting you need 'full-string'
(non-tokenized) field, and to search you need another one tokenized.

For instance, use 'string' for sorting, and 'text_ws' for search; and
use 'copyField'... (some memory for copyField)

Sorting using tokenized field: 100,000 documents, each 'Book Title'
consists of 10 tokens in average, ... - total 1,000,000 (probably
unique) tokens in a hashtable; with nontokenized field - 100,000
entries, and Lucene internal FieldCache is used instead of SOLR LRU.


Also, with tokenized fields 'sorting' is not natural (alphabetical order)...


Fuad Efendi
==
http://www.linkedin.com/in/liferay

Quoting sundar shankar [EMAIL PROTECTED]:


The field is of type text_ws. Is this not recomended. Should I   
use  text instead?


If increasing LRU cache helps you: -  you are probably using   
'tokenized' field for sorting (could you   confirm please?)...   
...you should use 'non-tokenized  single-valued non-boolean' for   
better performance of sorting...






Re: Out of memory on Solr sorting

2008-08-05 Thread Yonik Seeley
On Tue, Aug 5, 2008 at 1:59 PM, Fuad Efendi [EMAIL PROTECTED] wrote:
 If increasing LRU cache helps you:
 - you are probably using 'tokenized' field for sorting (could you confirm
 please?)...

Sorting does not utilize any Solr caches.

-Yonik


Re: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi
I know, and this is strange... I was guessing the filterCache is used 
implicitly to get a DocSet per token; as Sundar wrote, increasing the 
LRUCache helped him (he is sorting on a 'text_ws' field).

-Fuad

If increasing LRU cache helps you:
- you are probably using 'tokenized' field for sorting (could you confirm
please?)...


Sorting does not utilize any Solr caches.

-Yonik







RE: Out of memory on Solr sorting

2008-08-05 Thread sundar shankar
Yes, this is what I did. I got an out of memory error while executing a query with a 
sort param.
 
1. Stopped the JBoss server
 
2. Changed the cache settings:

<filterCache  class="solr.LRUCache"  size="2048"  initialSize="512" 
 autowarmCount="256"/>

<!-- queryResultCache caches results of searches - ordered lists of 
document ids (DocList) based on a query, a sort, and the range of 
documents requested.  -->
<queryResultCache  class="solr.LRUCache"  
size="2048"  initialSize="512"  autowarmCount="256"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each 
document).   Since Lucene internal document ids are transient, this cache 
will not be autowarmed.  -->
<documentCache  class="solr.LRUCache"  
size="2048"  initialSize="512"  autowarmCount="0"/>

In these 3 params, I changed size from 512 to 2048.

3. Restarted the server

4. Ran the query again.

It worked just fine after that. I am currently re-indexing, replacing the 
text_ws with string and keeping the default size of 512 for all 3 caches, to see 
if the problem goes away.
 
-Sundar




RE: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi
Sundar, it is very strange that increasing size/initialSize of the LRUCache 
helps with the OutOfMemoryError...


2048 is the number of entries in the cache and _not_ 2 GB of memory...

Making size == initialSize of the HashMap-based LRUCache would help with 
performance anyway; maybe also with OOMs (probably no need to resize the 
HashMap...)
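
(For illustration only - a solrconfig.xml sketch of the point that size counts entries rather than bytes, using the numbers from this thread:)

<!-- size and initialSize are numbers of cache entries, not megabytes;
     actual memory use depends on what each entry holds -->
<filterCache
  class="solr.LRUCache"
  size="2048"
  initialSize="2048"
  autowarmCount="256"/>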











RE: Out of memory on Solr sorting

2008-08-05 Thread sundar shankar
Oh wow, I didn't know that was the case. I am completely baffled now. Back 
to square one, I guess. :)


RE: Out of memory on Solr sorting

2008-07-29 Thread Lance Norskog
A sneaky source of OutOfMemory errors is the permanent generation.  If you
add this:
-XX:PermSize=64m -XX:MaxPermSize=96m
You will increase the size of the permanent generation. We found this
helped.

Also note that when you undeploy a war file, the old deployment has
permanent storage that is not reclaimed, and so each undeploy/redeploy cycle
eats up the permanent generation pool.

-Original Message-
From: david w [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 29, 2008 7:20 AM
To: solr-user@lucene.apache.org
Subject: Re: Out of memory on Solr sorting

Hi, Daniel

  I have the same problem as Sundar. Would it be possible to tell me which
profiling tool you are using?

  Thanks a lot.

/David

On Tue, Jul 29, 2008 at 8:19 PM, Daniel Alheiros
[EMAIL PROTECTED]wrote:

 Hi Sundar.

 Well, it would be good if you could do some profiling on your Solr app.
 I've done it during the indexing process so I could figure out what 
 was going on with the OutOfMemoryErrors I was getting.

 But you definitely won't need as much memory as your whole 
 index size. I have 3.5 million documents (approx. 10 GB) running on 
 this 2 GB heap VM.

 Cheers,
 Daniel

 -Original Message-
 From: sundar shankar [mailto:[EMAIL PROTECTED]
 Sent: 23 July 2008 23:45
 To: solr-user@lucene.apache.org
 Subject: RE: Out of memory on Solr sorting


 Hi Daniel,
 I am afraid that didnt solve my problem. I was guessing my 
 problem was that I have too much of data and too little memory 
 allocated for that. I happened to read in couple of the posts which 
 mentioned that I need VM that is close to the size of my data(folder). 
 I have like 540 Megs now and a little more than a million and a half 
 docs. Ideally in that case 512 megs should be enough for me. In fact I 
 am able to perform all other operations now, commit, optmize, select, 
 update, nightly cron jobs to index data again. etc etc with no 
 hassles. Even my load tests perform very well. Just the sort and it 
 doesnt seem to work. I allocated
 2 gigs of memory now. Still same results. Used the GC params u gave me 
 too. No change what so ever. Am not sure, whats going on. Is there 
 something that I can do to find out how much is needed in actuality as 
 my production server might need to be configured in accordance.

 I dont store any documents. We basically fetch standard column data 
 from oracle database store them into Solr fields. Before I had 
 EdgeNGram configured and had Solr 1.2, My data size was less that half 
 of what it is right now. I guess if I remember right, it was of the 
 order of 100 megs. The max size of a field right now might not cross a 100
chars too.
 Quizzled even more now.

 -Sundar

 P.S: My configurations :
 Solr 1.3
 Red hat
 540 megs of data (1855013 docs)
 2 gigs of memory installed and allocated like this 
 JAVA_OPTS=$JAVA_OPTS -Xms2048m -Xmx2048m -XX:MinHeapFreeRatio=50 
 -XX:NewSize=1024m
 -XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=360
 -Dsun.rmi.dgc.server.gcInterval=360

 Jboss 4.05


  Subject: RE: Out of memory on Solr sorting
  Date: Wed, 23 Jul 2008 10:49:06 +0100
  From: [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
 
  Hi
 
  I haven't read the whole thread so I will take my chances here.
 
  I've been fighting recently to keep my Solr instances stable because 
  they were frequently crashing with OutOfMemoryErrors. I'm using Solr
  1.2 and when it happens there is a bug that makes the index locked 
  unless you restart Solr... So in my cenario it was extremelly
 damaging.
 
  After some profiling I realized that my major problem was caused by 
  the way the JVM heap was being used as I haven't configured it to 
  run using any advanced configuration (I had just made it bigger - 
  Xmx and Xms 1.5 Gb), it's running on Sun JVM 1.5 (the most recent 
  1.5
  available) and it's deployed on a Jboss 4.2 on a RHEL.
 
  So my findings were too many objects were being allocated on the old 
  generation area of the heap, which makes them harder to be disposed, 
  and also the default behaviour was letting the heap get too filled 
  up before kicking a GC and according to the JVM specs the default is 
  if after a short period when a full gc is executed if a certain 
  percentage of the heap is not freed an OutOfMemoryError should be
 thrown.
 
  I've changed my JVM startup params and it's working extremelly 
  stable since then:
 
  -Xmx2048m -Xms2048m -XX:MinHeapFreeRatio=50 -XX:NewSize=1024m
  -XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=360
  -Dsun.rmi.dgc.server.gcInterval=360
 
  I hope it helps.
 
  Regards,
  Daniel Alheiros
 
  -Original Message-
  From: Fuad Efendi [mailto:[EMAIL PROTECTED]
  Sent: 22 July 2008 23:23
  To: solr-user@lucene.apache.org
  Subject: RE: Out of memory on Solr sorting
 
  Yes, it is a cache, it stores sorted by sorted field array of 
  Document IDs together with sorted fields; query results can 
  intersect with it and reorder accordingly

RE: Out of memory on Solr sorting

2008-07-23 Thread Daniel Alheiros
Hi

I haven't read the whole thread so I will take my chances here.

I've been fighting recently to keep my Solr instances stable because
they were frequently crashing with OutOfMemoryErrors. I'm using Solr 1.2,
and when it happens there is a bug that leaves the index locked unless
you restart Solr... so in my scenario it was extremely damaging.

After some profiling I realized that my major problem was caused by the
way the JVM heap was being used, as I hadn't configured it to run with
any advanced options (I had just made it bigger - Xmx and Xms 1.5
GB). It's running on Sun JVM 1.5 (the most recent 1.5 available) and
it's deployed on JBoss 4.2 on RHEL. 

My findings were that too many objects were being allocated in the old
generation area of the heap, which makes them harder to dispose of, and
also that the default behaviour was letting the heap get too full before
kicking off a GC; according to the JVM specs, the default is that if, shortly
after a full GC is executed, a certain percentage of the heap has not been
freed, an OutOfMemoryError should be thrown.

I've changed my JVM startup params and it's been working extremely stably
since then:

-Xmx2048m -Xms2048m -XX:MinHeapFreeRatio=50 -XX:NewSize=1024m
-XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=360
-Dsun.rmi.dgc.server.gcInterval=360

I hope it helps.

Regards,
Daniel Alheiros

-Original Message-
From: Fuad Efendi [mailto:[EMAIL PROTECTED] 
Sent: 22 July 2008 23:23
To: solr-user@lucene.apache.org
Subject: RE: Out of memory on Solr sorting

Yes, it is a cache: it stores an array of document IDs sorted by the sort
field, together with the sorted field values; query results can be intersected
with it and reordered accordingly.

But memory requirements should be well documented.

Internally it uses a WeakHashMap, which is not good(!!!) - a lot of
under-the-hood cache warm-ups that SOLR is not aware of...  
Could be.

I think Lucene-SOLR developers should join this discussion:


/**
  * Expert: The default cache implementation, storing all values in
memory.
  * A WeakHashMap is used for storage.
  *
..

   // inherit javadocs
   public StringIndex getStringIndex(IndexReader reader, String field)
   throws IOException {
 return (StringIndex) stringsIndexCache.get(reader, field);
   }

   Cache stringsIndexCache = new Cache() {

 protected Object createValue(IndexReader reader, Object fieldKey)
 throws IOException {
   String field = ((String) fieldKey).intern();
   final int[] retArray = new int[reader.maxDoc()];
   String[] mterms = new String[reader.maxDoc()+1];
   TermDocs termDocs = reader.termDocs();
   TermEnum termEnum = reader.terms (new Term (field, ""));






Quoting Fuad Efendi [EMAIL PROTECTED]:

 I am hoping [new StringIndex (retArray, mterms)] is called only once 
 per sort field and cached somewhere in Lucene;

 theoretically you need to multiply the number of documents by the size of the field 
 (supposing that the field contains unique text); you do not need to tokenize 
 this field; you do not need to store a TermVector.

 for 2,000,000 documents with a simple untokenized text field such as the 
 title of a book (256 bytes) you probably need 512,000,000 bytes per 
 Searcher, and as Mark mentioned you should limit the number of searchers 
 in SOLR.

 So Xmx512M is definitely not enough even for simple cases...


 Quoting sundar shankar [EMAIL PROTECTED]:

 I haven't seen the source code before, but I don't know why the
 sorting isn't done after the fetch is done. Wouldn't that make it
 faster, at least in the case of field-level sorting? I could be
 wrong, and the implementation might well be better, but I 
 don't know why all of the fields have to be loaded.





 Date: Tue, 22 Jul 2008 14:26:26 -0700  From: [EMAIL PROTECTED]  To:
 solr-user@lucene.apache.org  Subject: Re: Out of memory on Solr sorting

 Ok, after some analysis of FieldCacheImpl:

 - it is supposed that the (sorted) Enumeration of terms is smaller than the
 total number of documents (which is why SOLR uses a specific field type for
 sorted searches: solr.StrField with omitNorms=true). It creates an
 int[reader.maxDoc()] array, walks the (sorted) Enumeration of terms
 (untokenized solr.StrField), and populates the array with document IDs.

 - it also creates an array of Strings: String[] mterms = new
 String[reader.maxDoc()+1]; Why do we need that? For 1G documents with an
 average term/StrField size of 100 bytes (which could be unique text!!!) it
 will create a kind of huge 100Gb cache which is not really needed...
 StringIndex value = new StringIndex (retArray, mterms); If I understand
 correctly... StringIndex _must_ be a file in a filesystem for such a case...
 We create the StringIndex and retrieve the top 10 documents: huge overhead.

 Quoting Fuad Efendi [EMAIL PROTECTED]: Ok, what is confusing me is the
 implicit guess that FieldCache contains the field and Lucene uses an
 in-memory sort

RE: Out of memory on Solr sorting

2008-07-23 Thread sundar shankar

Hi Daniel,
 I am afraid that didn't solve my problem. My guess was that my problem 
was that I have too much data and too little memory allocated for it. I 
happened to read a couple of posts which mentioned that I need a VM that is 
close to the size of my data (folder). I have about 540 megs now and a little 
more than a million and a half docs. Ideally, in that case, 512 megs should be 
enough for me. In fact I am able to perform all other operations now - commit, 
optimize, select, update, nightly cron jobs to index data again, etc. - with no 
hassles. Even my load tests perform very well. Just the sort doesn't seem 
to work. I have allocated 2 gigs of memory now. Still the same results. I used the GC 
params you gave me too. No change whatsoever. I am not sure what's going on. Is 
there something I can do to find out how much is actually needed, as my 
production server might need to be configured accordingly?

I don't store any documents. We basically fetch standard column data from an Oracle 
database and store it in Solr fields. Before I had EdgeNGram configured and had 
Solr 1.2, my data size was less than half of what it is right now. If I 
remember right, it was on the order of 100 megs. The max size of a field right 
now might not cross 100 chars either. Puzzled even more now. 

-Sundar

P.S: My configurations : 
Solr 1.3 
Red hat 
540 megs of data (1855013 docs)
2 gigs of memory installed and allocated like this
JAVA_OPTS=$JAVA_OPTS -Xms2048m -Xmx2048m -XX:MinHeapFreeRatio=50 
-XX:NewSize=1024m -XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=360 
-Dsun.rmi.dgc.server.gcInterval=360

Jboss 4.05


 Subject: RE: Out of memory on Solr sorting
 Date: Wed, 23 Jul 2008 10:49:06 +0100
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 
 Hi
 
 I haven't read the whole thread so I will take my chances here.
 
 I've been fighting recently to keep my Solr instances stable because
 they were frequently crashing with OutOfMemoryErrors. I'm using Solr 1.2
 and when it happens there is a bug that makes the index locked unless
 you restart Solr... So in my cenario it was extremelly damaging.
 
 After some profiling I realized that my major problem was caused by the
 way the JVM heap was being used as I haven't configured it to run using
 any advanced configuration (I had just made it bigger - Xmx and Xms 1.5
 Gb), it's running on Sun JVM 1.5 (the most recent 1.5 available) and
 it's deployed on a Jboss 4.2 on a RHEL. 
 
 So my findings were too many objects were being allocated on the old
 generation area of the heap, which makes them harder to be disposed, and
 also the default behaviour was letting the heap get too filled up before
 kicking a GC and according to the JVM specs the default is if after a
 short period when a full gc is executed if a certain percentage of the
 heap is not freed an OutOfMemoryError should be thrown.
 
 I've changed my JVM startup params and it's working extremelly stable
 since then:
 
 -Xmx2048m -Xms2048m -XX:MinHeapFreeRatio=50 -XX:NewSize=1024m
 -XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=360
 -Dsun.rmi.dgc.server.gcInterval=360
 
 I hope it helps.
 
 Regards,
 Daniel Alheiros
 
 -Original Message-
 From: Fuad Efendi [mailto:[EMAIL PROTECTED] 
 Sent: 22 July 2008 23:23
 To: solr-user@lucene.apache.org
 Subject: RE: Out of memory on Solr sorting
 
 Yes, it is a cache, it stores sorted by sorted field array of
 Document IDs together with sorted fields; query results can intersect
 with it and reorder accordingly.
 
 But memory requirements should be well documented.
 
 It uses internally WeakHashMap which is not good(!!!) - a lot of
 underground warming ups of caches which SOLR is not aware of...  
 Could be.
 
 I think Lucene-SOLR developers should join this discussion:
 
 
 /**
   * Expert: The default cache implementation, storing all values in
 memory.
   * A WeakHashMap is used for storage.
   *
 ..
 
// inherit javadocs
public StringIndex getStringIndex(IndexReader reader, String field)
throws IOException {
  return (StringIndex) stringsIndexCache.get(reader, field);
}
 
Cache stringsIndexCache = new Cache() {
 
  protected Object createValue(IndexReader reader, Object fieldKey)
  throws IOException {
String field = ((String) fieldKey).intern();
final int[] retArray = new int[reader.maxDoc()];
String[] mterms = new String[reader.maxDoc()+1];
TermDocs termDocs = reader.termDocs();
TermEnum termEnum = reader.terms (new Term (field, ));
 
 
 
 
 
 
 Quoting Fuad Efendi [EMAIL PROTECTED]:
 
  I am hoping [new StringIndex (retArray, mterms)] is called only once 
  per-sort-field and cached somewhere at Lucene;
 
  theoretically you need multiply number of documents on size of field 
  (supposing that field contains unique text); you need not tokenize 
  this field; you need not store TermVector.
 
  for 2 000 000

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar



 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Out of memory on Solr sorting
 Date: Tue, 22 Jul 2008 19:11:02 +
 
 
 Hi,
 Sorry again, fellows. I am not sure what's happening. The day with Solr is bad 
 for me, I guess. EZMLM didn't let me send any mails this morning. It asked me to 
 confirm my subscription and, when I did, it said I was already a member. Now my 
 mails are all coming out badly. Sorry for troubling y'all this much. I hope this 
 mail comes out right.


Hi,
We are developing a product in an agile manner, and the current 
implementation has data of just about 800 megs in dev. 
The memory allocated to Solr on dev (a dual-core Linux box) is 128-512 MB.

My config
=

    <!-- autocommit pending docs if certain criteria are met
    <autoCommit>
      <maxDocs>1</maxDocs>
      <maxTime>1000</maxTime>
    </autoCommit>
    -->

<filterCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="256"/>

<queryResultCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="256"/>

<documentCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

<enableLazyFieldLoading>true</enableLazyFieldLoading>


My Field
===

<fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"/>
        <filter class="solr.EdgeNGramFilterFactory"
            maxGramSize="100" minGramSize="1"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"/>
        <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    </analyzer>
</fieldType>
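
(For illustration - a sketch of pairing this autocomplete type with a separate untokenized field for sorting, as suggested elsewhere in this thread; the field names are hypothetical:)

<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_autocomplete" type="autocomplete" indexed="true" stored="false"/>
<copyField source="name" dest="name_autocomplete"/>
<!-- search/autocomplete against name_autocomplete; sort on the untokenized name field -->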


Problem
==

I execute a query that returns 24 rows of results and pick 10 out of it. I have 
no problem when I execute this.
But when I sort it by a string field that is fetched from this result, I get 
an OOM. I am able to execute several
other queries with no problem. Just having a "sort asc" clause added to the query 
throws an OOM. Why is that,
and what should I ideally have done? My config on QA is pretty similar to the dev 
box and probably has more data than dev does. 
It didn't throw any OOM during the integration tests. The autocomplete is a new 
field we added recently.

Another point is that the indexing is done with a field of type string:

<field name="XXX" type="string" indexed="true" stored="true" termVectors="true"/>

and the autocomplete field is a copy field.

The sorting is done on the string field.

Please do let me know what mistake I am making.

Regards
Sundar

P.S: The stack trace of the exception is


Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing 
query
 at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
 at 
org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
 at 
com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
 ... 105 more
Caused by: org.apache.solr.common.SolrException: Java heap space  
java.lang.OutOfMemoryError: Java heap space 
at 
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403) 
 
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352) 
 
at 
org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
  
at 
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
  
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
  
at 
org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
  
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
  
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
  
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)  
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
  
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
  
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
  
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)  
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) 
 
at 
