Re: Re: solr 4.2.1 index gets slower over time

2014-04-02 Thread elisabeth benoit
This sounds interesting, I'll check this out.

Thanks!
Elisabeth




Re: Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread Dmitry Kan
Thanks, Markus, that is useful.
I'm guessing the higher the weight, the longer the op takes?





-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread Markus Jelsma
You may want to increase reclaimDeletesWeight for TieredMergePolicy from 2 to 3
or 4. By default it may keep too many deleted or updated docs in the index.
This can increase index size by 50%!!

Dmitry Kan wrote:

> Elisabeth,
>
> Yes, I believe you are right in that the deletes are part of the optimize
> process. If you delete often, you may consider (if not already) the
> TieredMergePolicy, which is suited for this scenario. Check out this
> relevant discussion I had with Lucene committers:
> https://twitter.com/DmitryKan/status/399820408444051456
>
> HTH,
>
> Dmitry




Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread Dmitry Kan
Elisabeth,

Yes, I believe you are right in that the deletes are part of the optimize
process. If you delete often, you may consider (if not already) the
TieredMergePolicy, which is suited for this scenario. Check out this
relevant discussion I had with Lucene committers:
https://twitter.com/DmitryKan/status/399820408444051456
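
One rough way to judge whether deletes are piling up is to compare the maxDoc and numDocs figures that Solr reports in its core statistics: maxDoc still counts documents that are only marked as deleted, numDocs does not. The sketch below just does that arithmetic. The function name and the sample numbers are made up for illustration, and what counts as "too many" deletes is a judgment call, not a documented limit.

```python
def deleted_fraction(max_doc: int, num_docs: int) -> float:
    """Share of document slots still occupied by documents marked as deleted."""
    if max_doc <= 0:
        # Empty index: nothing can be deleted.
        return 0.0
    return (max_doc - num_docs) / max_doc


if __name__ == "__main__":
    # Hypothetical figures read off the Solr admin statistics page.
    fraction = deleted_fraction(max_doc=1_500_000, num_docs=1_000_000)
    print(f"{fraction:.1%} of the index is deleted documents")
```

If this fraction keeps growing between optimizes, that would suggest the merge policy is not reclaiming deletes fast enough for the update pattern.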

HTH,

Dmitry





-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread elisabeth benoit
Thanks a lot for your answers!

Shawn, our GC configuration has far fewer parameters defined, so we'll check
this out.

Dmitry, about the expungeDeletes option, we'll add that to the delete
process. But from what I read, this is done in the optimize process (cf.
http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html).
Or maybe not?

Thanks again,
Elisabeth




Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Dmitry Kan
Hi,

We have noticed something like this as well, but with an older version of
Solr, 3.4. In our setup we delete documents pretty often. Internally in
Lucene, when a client requests that a document be deleted, it is not
physically deleted, but only marked as "deleted". Our original assumption
was that the "deleted" documents would get physically removed on each
optimize command issued. We started to suspect this wasn't always true, as
the shards (especially relatively large shards) became slower over time. So
we found out about the expungeDeletes option, which purges the "deleted"
docs and is false by default. We have set it to true. If your Solr update
lifecycle includes frequent deletes, try this out.
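
As a minimal sketch of what "setting it to true" can look like: expungeDeletes is an option on a commit, so one way to pass it is as request parameters on the update handler. The helper below only builds that URL. The host, port, and core name are placeholders, and the function name is made up for illustration.

```python
from urllib.parse import urlencode


def expunge_deletes_commit_url(base_url: str, core: str) -> str:
    """Build an update-handler URL that commits with expungeDeletes=true."""
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url.rstrip('/')}/{core}/update?{params}"


if __name__ == "__main__":
    # Placeholder host and core name; adjust to the actual deployment.
    print(expunge_deletes_commit_url("http://localhost:8983/solr", "collection1"))
```

The resulting URL can be requested with any HTTP client after a batch of deletes. Note that such a commit can trigger segment merges, so it may take noticeably longer than a plain commit.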

This of course does not replace working towards finding better
GC parameters.

https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching


On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit  wrote:

> Hello,
>
> We are currently using Solr 4.2.1. Our index is updated on a daily basis.
> After noticing that Solr query time had increased (to twice its initial
> value) without any change in index size or in Solr configuration, we tried
> an optimize on the index but it didn't fix our problem. We checked the
> garbage collector, but everything seemed fine. What did in fact fix our
> problem was to delete all documents and reindex from scratch.
>
> It looks like over time our index gets "corrupted" and optimize doesn't fix
> it. Does anyone have a clue how to investigate this situation further?
>
>
> Elisabeth
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Shawn Heisey

On 3/31/2014 9:03 AM, elisabeth benoit wrote:

> We use JVisualVM. The CPU usage is very high (90%), but the GC activity
> shows less than 0.01% average activity. Plus the heap usage stays low
> (below 4G while the max heap size is 16G).
>
> Do you have a different tool to suggest to check the GC? Do you think there
> is something else we might not see?


You can't get actual usable GC pause information from jvisualvm or 
jconsole, only totals and averages.  Those tools seem to be geared more 
towards seeing problems when your heap is too small.


To see real pause information, you can turn on GC logging and then run 
the log through a tool like GCLogViewer to see a graph of your 
collection pauses.  What I used to initially see the problem was a 
program called jHiccup, which will show *ANY* pause, not just those 
caused by garbage collection.  GC is almost always the reason there is a 
pause, though.
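
For a quick look without a graphing tool, the GC log can also be scanned directly. The sketch below assumes a HotSpot JVM started with GC logging that includes stopped-time lines (for example -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime on Java 7) and picks out every pause at or above a threshold. The function name, the one-second threshold, and the sample log lines are made up for illustration.

```python
import re

# Line written by HotSpot when -XX:+PrintGCApplicationStoppedTime is enabled.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(log_lines, threshold_seconds=1.0):
    """Return every application pause (in seconds) at or above the threshold."""
    pauses = []
    for line in log_lines:
        match = STOPPED.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_seconds:
                pauses.append(seconds)
    return pauses


if __name__ == "__main__":
    # Made-up sample lines in the format described above.
    sample = [
        "2014-03-31T15:03:12.123+0200: 12.345: Total time for which "
        "application threads were stopped: 0.0123456 seconds",
        "2014-03-31T15:04:40.001+0200: 100.223: Total time for which "
        "application threads were stopped: 10.4512340 seconds",
    ]
    print(long_pauses(sample))
```

Averages hide exactly these outliers, which is why a list of individual pauses is more telling than the percentage shown in jvisualvm.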


http://www.azulsystems.com/jHiccup
https://code.google.com/p/gclogviewer/

You can still have long GC pauses even if your max heap isn't reached.

Have you provided any GC-related options to your JVM at all?  With a 
heap size of 4GB and a max heap of 16GB, I can absolutely guarantee that 
you will have pause problems unless you provide the JVM with a number of 
tuning options.  I was frequently having pauses as high as 10 to 12 
seconds on an 8GB heap, even after I switched to CMS.  Further tuning 
was required.  These options made the situation a lot better, but I 
think they can probably be improved:


http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

More expanded info, which references the link above:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems

Thanks,
Shawn



Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread elisabeth benoit
Hello,

Thanks for your answer.

We use JVisualVM. The CPU usage is very high (90%), but the GC activity
shows less than 0.01% average activity. Plus the heap usage stays low
(below 4G while the max heap size is 16G).

Do you have a different tool to suggest to check the GC? Do you think there
is something else we might not see?

Thanks again,
Elisabeth




Re: solr 4.2.1 index gets slower over time

2014-03-31 Thread Shawn Heisey
On 3/31/2014 6:57 AM, elisabeth benoit wrote:
> We are currently using Solr 4.2.1. Our index is updated on a daily basis.
> After noticing that Solr query time had increased (to twice its initial
> value) without any change in index size or in Solr configuration, we tried
> an optimize on the index but it didn't fix our problem. We checked the
> garbage collector, but everything seemed fine. What did in fact fix our
> problem was to delete all documents and reindex from scratch.
>
> It looks like over time our index gets "corrupted" and optimize doesn't fix
> it. Does anyone have a clue how to investigate this situation further?

That seems very odd.  I have one production copy of my index using
4.2.1, and it has been working fine for quite a long time.  We are
transitioning to Solr 4.6.1 now, so the other copy is running that
version.  We do occasionally do a full rebuild, but that is for index
content, not for any problems.

When you say you checked your garbage collector, what tools did you use?
 I was having GC pause problems, but I didn't know it until I started
using different tools.

Thanks,
Shawn