Re: Solr using all available CPU and becoming unresponsive

2021-01-12 Thread Charlie Hull

Hi Jeremy,

You might find our recent blog on Debugging Solr Performance Issues 
useful 
https://opensourceconnections.com/blog/2021/01/05/a-solr-performance-debugging-toolkit/ 
- also check out Savan Das' blog which is linked within.


Best

Charlie

On 12/01/2021 14:53, Michael Gibney wrote:

Ahh ok. If those are your only fieldType definitions, and most of your
config is copied from the default, then SOLR-13336 is unlikely to be the
culprit. Looking at more general options, off the top of my head:
1. make sure you haven't allocated all physical memory to heap (leave a
decent amount for OS page cache)
2. disable swap, if you can (this is esp. important if using network
storage as swap). There are potential downsides to this (so proceed with
caution); but if part of your heap gets swapped out (and it almost
certainly will, with a sufficiently large heap) full GCs lead to a swap
storm that compounds the problem. (fwiw, this is probably the first thing
I'd recommend looking into and trying, because it's so easy, and can in
some cases yield a dramatic improvement. N.b., I'm talking about `swapoff
-a`, not `sysctl -w vm.swappiness=0` -- I find that the latter does *not*
eliminate swapping in the way that's needed to achieve the desired goal in
this case. Again, exercise caution in doing this, discuss, research, etc.).
Related documentation was added in 8.5, but absolutely applies to 7.3.1 as
well:
https://lucene.apache.org/solr/guide/8_7/taking-solr-to-production.html#avoid-swapping-nix-operating-systems
-- the note there about "lowering swappiness" being an acceptable
alternative contradicts my experience, but I suppose ymmv?
3. if you're faceting on fields -- especially high-cardinality fields (many
values) -- make sure that you have `docValues=true, uninvertible=false`
configured (to ensure that you're not building large on-heap data
structures when there's an alternative that doesn't require it).
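
For example, a facet field declared along these lines in the schema (the
field and type names here are made up, not taken from your config):

    <fieldType name="string_dv" class="solr.StrField"
               docValues="true" uninvertible="false"/>
    <field name="brand" type="string_dv" indexed="true" stored="true"
           multiValued="true"/>

keeps faceting on the on-disk docValues structures (served via the OS page
cache) rather than un-inverting the field onto the heap.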

These are all recommendations that are explained in more detail by others
elsewhere; I think they should all apply to 7.3.1; fwiw, I would recommend
upgrading if you have the (human) bandwidth to do so. Good luck!

Michael

On Tue, Jan 12, 2021 at 8:39 AM Jeremy Smith  wrote:


Thanks Michael,
  SOLR-13336 seems intriguing.  I'm not a solr expert, but I believe
these are the relevant sections from our schema definition:

[fieldType definitions stripped by the list archive; partial attributes
survive in the quoted copies later in this thread]

Our other fieldTypes don't have any analyzers attached to them.
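
The attribute values that survive in the quoted copies of this message later
in the thread match Solr's stock text_general analyzer chain, so the type
with the Stop/Synonym filters was presumably something close to the
following (a guess -- the element names themselves did not survive the
archive):

    <fieldType name="text_general" class="solr.TextField"
               positionIncrementGap="100" multiValued="false">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>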


If SOLR-13336 is the cause of the issue, is the best remedy to upgrade to
Solr 8?  It doesn't look like the fix was backported to 7.x.

Our schema has some issues arising from not fully understanding Solr and
just copying existing structures from the defaults.  In this case,
stopwords.txt is completely empty and synonyms.txt is just the default
synonyms.txt, which seems not useful at all for us.  Could I just take out
the StopFilterFactory and SynonymGraphFilterFactory from the query section
(and maybe the StopFilterFactory from the index section as well)?

Thanks again,
Jeremy


From: Michael Gibney 
Sent: Monday, January 11, 2021 8:30 PM
To: solr-user@lucene.apache.org 
Subject: Re: Solr using all available CPU and becoming unresponsive

Hi Jeremy,
Can you share your analysis chain configs? (SOLR-13336 can manifest in a
similar way, and would affect 7.3.1 with a susceptible config, given the
right (wrong?) input ...)
Michael

On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith  wrote:


Hello all,
  We have been struggling with an issue where solr will intermittently
use all available CPU and become unresponsive.  It will remain in this
state until we restart.  Solr will remain stable for some time, usually a
few hours to a few days, before this happens again.  We've tried adjusting
the caches and adding memory to both the VM and JVM, but we haven't been
able to solve the issue yet.

Here is some info about our server:
Solr:
   Solr 7.3.1, running on Java 1.8
   Running in cloud mode, but there's only one core

Host:
   CentOS7
   8 CPU, 56GB RAM
   The only other processes running on this VM are two zookeepers, one for
this Solr instance, one for another Solr instance

Solr Config:
  - One Core
  - 36 Million documents (Max Doc), 28 million (Num Docs)
  - ~15GB
  - 10-20 Requests/second
  - The schema is fairly large (~100 fields) and we allow faceting and
searching on many, but not all, of the fields
  - Data are imported once per minute through the DataImportHandler, with a
hard commit at the end.  We usually index ~100-500 documents per minute,
with many of these being updates to existing documents.

Cache settings:
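
The cache definitions were stripped by the list archive; based on the
attribute values that survive in the quoted copies later in the thread, and
assuming these were the standard solrconfig.xml caches (the element and class
names below are guesses), they looked roughly like:

    <filterCache class="solr.FastLRUCache" size="256" initialSize="256"
                 autowarmCount="8" showItems="64"/>
    <queryResultCache class="solr.LRUCache" size="256" initialSize="256"
                      autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="1024" initialSize="1024"
                   autowarmCount="0"/>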
For the filterCache, we have tried sizes as low as 128, which cause

Re: Solr using all available CPU and becoming unresponsive

2021-01-12 Thread Michael Gibney
Ahh ok. If those are your only fieldType definitions, and most of your
config is copied from the default, then SOLR-13336 is unlikely to be the
culprit. Looking at more general options, off the top of my head:
1. make sure you haven't allocated all physical memory to heap (leave a
decent amount for OS page cache)
2. disable swap, if you can (this is esp. important if using network
storage as swap). There are potential downsides to this (so proceed with
caution); but if part of your heap gets swapped out (and it almost
certainly will, with a sufficiently large heap) full GCs lead to a swap
storm that compounds the problem. (fwiw, this is probably the first thing
I'd recommend looking into and trying, because it's so easy, and can in
some cases yield a dramatic improvement. N.b., I'm talking about `swapoff
-a`, not `sysctl -w vm.swappiness=0` -- I find that the latter does *not*
eliminate swapping in the way that's needed to achieve the desired goal in
this case. Again, exercise caution in doing this, discuss, research, etc.).
Related documentation was added in 8.5, but absolutely applies to 7.3.1 as
well:
https://lucene.apache.org/solr/guide/8_7/taking-solr-to-production.html#avoid-swapping-nix-operating-systems
-- the note there about "lowering swappiness" being an acceptable
alternative contradicts my experience, but I suppose ymmv?
3. if you're faceting on fields -- especially high-cardinality fields (many
values) -- make sure that you have `docValues=true, uninvertible=false`
configured (to ensure that you're not building large on-heap data
structures when there's an alternative that doesn't require it).

These are all recommendations that are explained in more detail by others
elsewhere; I think they should all apply to 7.3.1; fwiw, I would recommend
upgrading if you have the (human) bandwidth to do so. Good luck!

Michael

On Tue, Jan 12, 2021 at 8:39 AM Jeremy Smith  wrote:

> Thanks Michael,
>  SOLR-13336 seems intriguing.  I'm not a solr expert, but I believe
> these are the relevant sections from our schema definition:
>
> [fieldType definitions stripped by the list archive; the surviving
> attributes show two fieldTypes with positionIncrementGap="100" (the second
> also multiValued="false"), each with index and query analyzers; in the
> second type the index analyzer includes a StopFilterFactory
> (words="stopwords.txt") and the query analyzer a StopFilterFactory plus a
> SynonymGraphFilterFactory (synonyms="synonyms.txt" ignoreCase="true"
> expand="true")]
>
> Our other fieldTypes don't have any analyzers attached to them.
>
>
> If SOLR-13336 is the cause of the issue is the best remedy to upgrade to
> solr 8?  It doesn't look like the fix was back patched to 7.x.
>
> Our schema has some issues arising from not fully understanding Solr and
> just copying existing structures from the defaults.  In this case,
> stopwords.txt is completely empty and synonyms.txt is just the default
> synonyms.txt, which seems not useful at all for us.  Could I just take out
> the StopFilterFactory and SynonymGraphFilterFactory from the query section
> (and maybe the StopFilterFactory from the index section as well)?
>
> Thanks again,
> Jeremy
>
> 
> From: Michael Gibney 
> Sent: Monday, January 11, 2021 8:30 PM
> To: solr-user@lucene.apache.org 
> Subject: Re: Solr using all available CPU and becoming unresponsive
>
> Hi Jeremy,
> Can you share your analysis chain configs? (SOLR-13336 can manifest in a
> similar way, and would affect 7.3.1 with a susceptible config, given the
> right (wrong?) input ...)
> Michael
>
> On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith  wrote:
>
> > Hello all,
> >  We have been struggling with an issue where solr will intermittently
> > use all available CPU and become unresponsive.  It will remain in this
> > state until we restart.  Solr will remain stable for some time, usually a
> > few hours to a few days, before this happens again.  We've tried
> adjusting
> > the caches and adding memory to both the VM and JVM, but we haven't been
> > able to solve the issue yet.
> >
> > Here is some info about our server:
> > Solr:
> >   Solr 7.3.1, running on Java 1.8
> >   Running in cloud mode, but there's only one core
> >
> > Host:
> >   CentOS7
> >   8 CPU, 56GB RAM
> >   The only other processes running on this VM are two zookeepers, one for
> > this Solr instance, one for another Solr instance
> >
> > Solr Config:
> >  - One Core
> >  - 36 Million documents (Max Doc), 28 million (Num D

Re: Solr using all available CPU and becoming unresponsive

2021-01-12 Thread Jeremy Smith
Thanks Michael,
 SOLR-13336 seems intriguing.  I'm not a solr expert, but I believe these 
are the relevant sections from our schema definition:

[fieldType definitions stripped by the list archive; partial attributes
survive in the quoted copy earlier in this thread]

Our other fieldTypes don't have any analyzers attached to them.


If SOLR-13336 is the cause of the issue, is the best remedy to upgrade to Solr
8?  It doesn't look like the fix was backported to 7.x.

Our schema has some issues arising from not fully understanding Solr and just 
copying existing structures from the defaults.  In this case, stopwords.txt is 
completely empty and synonyms.txt is just the default synonyms.txt, which seems 
not useful at all for us.  Could I just take out the StopFilterFactory and 
SynonymGraphFilterFactory from the query section (and maybe the 
StopFilterFactory from the index section as well)?

Thanks again,
Jeremy


From: Michael Gibney 
Sent: Monday, January 11, 2021 8:30 PM
To: solr-user@lucene.apache.org 
Subject: Re: Solr using all available CPU and becoming unresponsive

Hi Jeremy,
Can you share your analysis chain configs? (SOLR-13336 can manifest in a
similar way, and would affect 7.3.1 with a susceptible config, given the
right (wrong?) input ...)
Michael

On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith  wrote:

> Hello all,
>  We have been struggling with an issue where solr will intermittently
> use all available CPU and become unresponsive.  It will remain in this
> state until we restart.  Solr will remain stable for some time, usually a
> few hours to a few days, before this happens again.  We've tried adjusting
> the caches and adding memory to both the VM and JVM, but we haven't been
> able to solve the issue yet.
>
> Here is some info about our server:
> Solr:
>   Solr 7.3.1, running on Java 1.8
>   Running in cloud mode, but there's only one core
>
> Host:
>   CentOS7
>   8 CPU, 56GB RAM
>   The only other processes running on this VM are two zookeepers, one for
> this Solr instance, one for another Solr instance
>
> Solr Config:
>  - One Core
>  - 36 Million documents (Max Doc), 28 million (Num Docs)
>  - ~15GB
>  - 10-20 Requests/second
>  - The schema is fairly large (~100 fields) and we allow faceting and
> searching on many, but not all, of the fields
>  - Data are imported once per minute through the DataImportHandler, with a
> hard commit at the end.  We usually index ~100-500 documents per minute,
> with many of these being updates to existing documents.
>
> Cache settings (element names stripped by the list archive):
>   size="256"  initialSize="256"  autowarmCount="8"  showItems="64"
>   size="256"  initialSize="256"  autowarmCount="0"
>   size="1024" initialSize="1024" autowarmCount="0"
>
> For the filterCache, we have tried sizes as low as 128, which caused our
> CPU usage to go up and didn't solve our issue.  autowarmCount used to be
> much higher, but we have reduced it to try to address this issue.
>
>
> The behavior we see:
>
> Solr is normally using ~3-6GB of heap and we usually have ~20GB of free
> memory.  Occasionally, though, solr is not able to free up memory and the
> heap usage climbs.  Analyzing the GC logs shows a sharp incline of usage
> with the GC (the default CMS) working hard to free memory, but not
> accomplishing much.  Eventually, it fills up the heap, maxes out the CPUs,
> and never recovers.  We have tried to analyze the logs to see if there are
> particular queries causing issues or if there are network issues to
> zookeeper, but we haven't been able to find any patterns.  After the issues
> start, we often see session timeouts to zookeeper, but it doesn't appear
> that they are the cause.
>
>
>
> Does anyone have any recommendations on things to try or metrics to look
> into or configuration issues I may be overlooking?
>
> Thanks,
> Jeremy
>
>


Re: Solr using all available CPU and becoming unresponsive

2021-01-11 Thread Michael Gibney
Hi Jeremy,
Can you share your analysis chain configs? (SOLR-13336 can manifest in a
similar way, and would affect 7.3.1 with a susceptible config, given the
right (wrong?) input ...)
Michael

On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith  wrote:

> Hello all,
>  We have been struggling with an issue where solr will intermittently
> use all available CPU and become unresponsive.  It will remain in this
> state until we restart.  Solr will remain stable for some time, usually a
> few hours to a few days, before this happens again.  We've tried adjusting
> the caches and adding memory to both the VM and JVM, but we haven't been
> able to solve the issue yet.
>
> Here is some info about our server:
> Solr:
>   Solr 7.3.1, running on Java 1.8
>   Running in cloud mode, but there's only one core
>
> Host:
>   CentOS7
>   8 CPU, 56GB RAM
>   The only other processes running on this VM are two zookeepers, one for
> this Solr instance, one for another Solr instance
>
> Solr Config:
>  - One Core
>  - 36 Million documents (Max Doc), 28 million (Num Docs)
>  - ~15GB
>  - 10-20 Requests/second
>  - The schema is fairly large (~100 fields) and we allow faceting and
> searching on many, but not all, of the fields
>  - Data are imported once per minute through the DataImportHandler, with a
> hard commit at the end.  We usually index ~100-500 documents per minute,
> with many of these being updates to existing documents.
>
> Cache settings (element names stripped by the list archive):
>   size="256"  initialSize="256"  autowarmCount="8"  showItems="64"
>   size="256"  initialSize="256"  autowarmCount="0"
>   size="1024" initialSize="1024" autowarmCount="0"
>
> For the filterCache, we have tried sizes as low as 128, which caused our
> CPU usage to go up and didn't solve our issue.  autowarmCount used to be
> much higher, but we have reduced it to try to address this issue.
>
>
> The behavior we see:
>
> Solr is normally using ~3-6GB of heap and we usually have ~20GB of free
> memory.  Occasionally, though, solr is not able to free up memory and the
> heap usage climbs.  Analyzing the GC logs shows a sharp incline of usage
> with the GC (the default CMS) working hard to free memory, but not
> accomplishing much.  Eventually, it fills up the heap, maxes out the CPUs,
> and never recovers.  We have tried to analyze the logs to see if there are
> particular queries causing issues or if there are network issues to
> zookeeper, but we haven't been able to find any patterns.  After the issues
> start, we often see session timeouts to zookeeper, but it doesn't appear
> that they are the cause.
>
>
>
> Does anyone have any recommendations on things to try or metrics to look
> into or configuration issues I may be overlooking?
>
> Thanks,
> Jeremy
>
>


Re: SOLR USING 100% percent CPU and not responding after a while

2014-01-28 Thread Otis Gospodnetic
Hi,

Show us more graphs.  Is the GC working hard?  Any of the JVM mem pools at
or near 100%?  SPM for Solr is your friend for long term
monitoring/alerting/trends, jconsole and visualvm for a quick look.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Jan 28, 2014 at 2:11 PM, heaven  wrote:

> I have the same problem, please look at the image:
> 
>
> And this is on idle. Index size is about 90Gb. Solr 4.4.0. Memory is not an
> issue, there's a lot. RAID 10 (15000RPM rapid hdd).
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-USING-100-percent-CPU-and-not-responding-after-a-while-tp4021359p4114026.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: SOLR USING 100% percent CPU and not responding after a while

2014-01-28 Thread heaven
I have the same problem, please look at the image:
 

And this is on idle. Index size is about 90Gb. Solr 4.4.0. Memory is not an
issue, there's a lot. RAID 10 (15000RPM rapid hdd).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-USING-100-percent-CPU-and-not-responding-after-a-while-tp4021359p4114026.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR USING 100% percent CPU and not responding after a while

2013-08-08 Thread nitin4php
Hi Biva,

Any luck on this?

Even we are facing same issue with exactly same configuration and setup.

Any inputs will help a lot.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-USING-100-percent-CPU-and-not-responding-after-a-while-tp4021359p4083234.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr - using fq parameter does not retrieve an answer

2013-08-06 Thread Mysurf Mail
Thanks.


On Mon, Aug 5, 2013 at 4:57 PM, Shawn Heisey  wrote:

> On 8/5/2013 2:35 AM, Mysurf Mail wrote:
> > When I query using
> >
> > http://localhost:8983/solr/vault/select?q=*:*
> >
> > I get reuslts including the following
> >
> > 
> >   ...
> >   ...
> >   7
> >   ...
> > 
> >
> > Now I try to get only that row so I add to my query fq=VersionNumber:7
> >
> > http://localhost:8983/solr/vault/select?q=*:*&fq=VersionNumber:7
> >
> > And I get nothing.
> > Any idea?
>
> Is the VersionNumber field indexed?  If it's not, you won't be able to
> search on it.
>
> If you change your schema so that the field has 'indexed="true", you'll
> have to reindex.
>
> http://wiki.apache.org/solr/HowToReindex
>
> When you are retrieving a single document, it's better to use the q
> parameter rather than the fq parameter.  Querying a single document will
> pollute the cache.  It's a lot better to pollute the queryResultCache
> than the filterCache.  The former is generally much larger than the
> latter and better able to deal with pollution.
>
> Thanks,
> Shawn
>
>


Re: solr - using fq parameter does not retrieve an answer

2013-08-05 Thread Shawn Heisey
On 8/5/2013 2:35 AM, Mysurf Mail wrote:
> When I query using
> 
> http://localhost:8983/solr/vault/select?q=*:*
> 
> I get reuslts including the following
> 
> 
>   ...
>   ...
>   7
>   ...
> 
> 
> Now I try to get only that row so I add to my query fq=VersionNumber:7
> 
> http://localhost:8983/solr/vault/select?q=*:*&fq=VersionNumber:7
> 
> And I get nothing.
> Any idea?

Is the VersionNumber field indexed?  If it's not, you won't be able to
search on it.
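
For example, a numeric VersionNumber would need a definition along these
lines (the exact type here is a guess):

  <field name="VersionNumber" type="int" indexed="true" stored="true"/>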

If you change your schema so that the field has indexed="true", you'll
have to reindex.

http://wiki.apache.org/solr/HowToReindex

When you are retrieving a single document, it's better to use the q
parameter rather than the fq parameter.  Querying a single document will
pollute the cache.  It's a lot better to pollute the queryResultCache
than the filterCache.  The former is generally much larger than the
latter and better able to deal with pollution.
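
In other words, something like

  http://localhost:8983/solr/vault/select?q=VersionNumber:7

rather than putting the single-document restriction in fq as above.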

Thanks,
Shawn



Re: solr - using fq parameter does not retrieve an answer

2013-08-05 Thread Jack Krupansky

Is VersionNumber an "indexed" field, or just "stored"?

-- Jack Krupansky

-Original Message- 
From: Mysurf Mail 
Sent: Monday, August 05, 2013 4:35 AM 
To: solr-user@lucene.apache.org 
Subject: solr - using fq parameter does not retrieve an answer 


When I query using

http://localhost:8983/solr/vault/select?q=*:*

I get results including the following


 ...
 ...
 7
 ...


Now I try to get only that row so I add to my query fq=VersionNumber:7

http://localhost:8983/solr/vault/select?q=*:*&fq=VersionNumber:7

And I get nothing.
Any idea?


Re: Solr using a ridiculous amount of memory

2013-06-16 Thread Jack Krupansky
Yeah, this is yet another "anti-pattern" we need to be discouraging - large 
multivalued fields. They indicate that the data model is not well balanced 
and aligned with the strengths of Solr and Lucene.


-- Jack Krupansky

-Original Message- 
From: adityab

Sent: Sunday, June 16, 2013 9:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr using a ridiculous amount of memory

It was interesting to read this post. I had a similar issue on Solr v4.2.1. The
nature of our documents is that they have huge multiValued fields, and we were
able to knock out our server in about 30 mins.
We then found a bug, "LUCENE-4995", which was causing all the problem.
Applying the patch has helped a lot.
Not sure if it's related, but you might want to check that out.
Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-using-a-ridiculous-amount-of-memory-tp4050840p4070803.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr using a ridiculous amount of memory

2013-06-16 Thread adityab
It was interesting to read this post. I had a similar issue on Solr v4.2.1. The
nature of our documents is that they have huge multiValued fields, and we were
able to knock out our server in about 30 mins.
We then found a bug, "LUCENE-4995", which was causing all the problem.
Applying the patch has helped a lot.
Not sure if it's related, but you might want to check that out.
Thanks. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-using-a-ridiculous-amount-of-memory-tp4050840p4070803.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr using a ridiculous amount of memory

2013-06-16 Thread Erick Erickson
John:

If you'd like to add your experience to the Wiki, create
an ID and let us know what it is and we'll add you to the
contributors list. Unfortunately we had problems with
spam pages so we added this step.

Make sure you include your logon in the request.

Thanks,
Erick

On Fri, Jun 14, 2013 at 8:55 AM, John Nielsen  wrote:
> Sorry for not getting back to the list sooner. It seems like I finally
> solved the memory problems by following Toke's instruction of splitting the
> cores up into smaller chunks.
>
> After some major refactoring, our 15 cores have now turned into ~500 cores
> and our memory consumption has dropped dramaticly. Running 200 webshops now
> actually uses less memory as our 24 test shops did before.
>
> Thank you to everyone who helped, and especially to Toke.
>
> I looked at the wiki, but could not find any reference to this unintuitive
> way of using memory. Did I miss it somewhere?
>
>
>
> On Fri, Apr 19, 2013 at 1:30 PM, Erick Erickson 
> wrote:
>
>> Hmmm. There has been quite a bit of work lately to support a couple of
>> things that might be of interest (4.3, which Simon cut today, probably
>> available to all mid next week at the latest). Basically, you can
>> choose to pre-define all the cores in solr.xml (so-called "old style")
>> _or_ use the new-style solr.xml which uses "auto-discover" mode to
>> walk the indicated directory and find all the cores (indicated by the
>> presence of a 'core.properties' file). Don't know if this would make
>> your particular case easier, and I should warn you that this is
>> relatively new code (although there are some reasonable unit tests).
>>
>> You also have the option to only load the cores when they are
>> referenced, and only keep N cores open at a time (loadOnStartup and
>> transient properties).
>>
>> See: http://wiki.apache.org/solr/CoreAdmin#Configuration and
>> http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond
>>
>> Note, the docs are somewhat sketchy, so if you try to go down this
>> route let us know anything that should be improved (or you can be
>> added to the list of wiki page contributors and help out!)
>>
>> Best
>> Erick
>>
>> On Thu, Apr 18, 2013 at 8:31 AM, John Nielsen  wrote:
>> >> You are missing an essential part: Both the facet and the sort
>> >> structures needs to hold one reference for each document
>> >> _in_the_full_index_, even when the document does not have any values in
>> >> the fields.
>> >>
>> >
>> > Wow, thank you for this awesome explanation! This is where the penny
>> > dropped for me.
>> >
>> > I will definetely move to a multi-core setup. It will take some time and
>> a
>> > lot of re-coding. As soon as I know the result, I will let you know!
>> >
>> >
>> >
>> >
>> >
>> >
>> > --
>> > Med venlig hilsen / Best regards
>> >
>> > *John Nielsen*
>> > Programmer
>> >
>> >
>> >
>> > *MCB A/S*
>> > Enghaven 15
>> > DK-7500 Holstebro
>> >
>> > Kundeservice: +45 9610 2824
>> > p...@mcb.dk
>> > www.mcb.dk
>>
>
>
>
> --
> Med venlig hilsen / Best regards
>
> *John Nielsen*
> Programmer
>
>
>
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
>
> Kundeservice: +45 9610 2824
> p...@mcb.dk
> www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-06-14 Thread Toke Eskildsen
On Fri, 2013-06-14 at 14:55 +0200, John Nielsen wrote:
> Sorry for not getting back to the list sooner.

Time not important, only feedback important (apologies to Fifth
Element).

> After some major refactoring, our 15 cores have now turned into ~500 cores
> and our memory consumption has dropped dramaticly. Running 200 webshops now
> actually uses less memory as our 24 test shops did before.

That's great to hear. One core/shop also sounds like a cleaner setup.

> I looked at the wiki, but could not find any reference to this unintuitive
> way of using memory. Did I miss it somewhere?

I am not aware of a wikified explanation, but a section on "Why does
Solr use so much memory?" with some suggestions for changes to setup
would seem appropriate. You are not the first to have these kinds of
problems.


Thank you for closing the issue,
Toke Eskildsen



Re: Solr using a ridiculous amount of memory

2013-06-14 Thread John Nielsen
Sorry for not getting back to the list sooner. It seems like I finally
solved the memory problems by following Toke's instruction of splitting the
cores up into smaller chunks.

After some major refactoring, our 15 cores have now turned into ~500 cores
and our memory consumption has dropped dramatically. Running 200 webshops now
actually uses less memory than our 24 test shops did before.

Thank you to everyone who helped, and especially to Toke.

I looked at the wiki, but could not find any reference to this unintuitive
way of using memory. Did I miss it somewhere?



On Fri, Apr 19, 2013 at 1:30 PM, Erick Erickson wrote:

> Hmmm. There has been quite a bit of work lately to support a couple of
> things that might be of interest (4.3, which Simon cut today, probably
> available to all mid next week at the latest). Basically, you can
> choose to pre-define all the cores in solr.xml (so-called "old style")
> _or_ use the new-style solr.xml which uses "auto-discover" mode to
> walk the indicated directory and find all the cores (indicated by the
> presence of a 'core.properties' file). Don't know if this would make
> your particular case easier, and I should warn you that this is
> relatively new code (although there are some reasonable unit tests).
>
> You also have the option to only load the cores when they are
> referenced, and only keep N cores open at a time (loadOnStartup and
> transient properties).
>
> See: http://wiki.apache.org/solr/CoreAdmin#Configuration and
> http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond
>
> Note, the docs are somewhat sketchy, so if you try to go down this
> route let us know anything that should be improved (or you can be
> added to the list of wiki page contributors and help out!)
>
> Best
> Erick
>
> On Thu, Apr 18, 2013 at 8:31 AM, John Nielsen  wrote:
> >> You are missing an essential part: Both the facet and the sort
> >> structures needs to hold one reference for each document
> >> _in_the_full_index_, even when the document does not have any values in
> >> the fields.
> >>
> >
> > Wow, thank you for this awesome explanation! This is where the penny
> > dropped for me.
> >
> > I will definetely move to a multi-core setup. It will take some time and
> a
> > lot of re-coding. As soon as I know the result, I will let you know!
> >
> >
> >
> >
> >
> >
> > --
> > Med venlig hilsen / Best regards
> >
> > *John Nielsen*
> > Programmer
> >
> >
> >
> > *MCB A/S*
> > Enghaven 15
> > DK-7500 Holstebro
> >
> > Kundeservice: +45 9610 2824
> > p...@mcb.dk
> > www.mcb.dk
>



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-19 Thread Erick Erickson
Hmmm. There has been quite a bit of work lately to support a couple of
things that might be of interest (4.3, which Simon cut today, probably
available to all mid next week at the latest). Basically, you can
choose to pre-define all the cores in solr.xml (so-called "old style")
_or_ use the new-style solr.xml which uses "auto-discover" mode to
walk the indicated directory and find all the cores (indicated by the
presence of a 'core.properties' file). Don't know if this would make
your particular case easier, and I should warn you that this is
relatively new code (although there are some reasonable unit tests).

You also have the option to only load the cores when they are
referenced, and only keep N cores open at a time (loadOnStartup and
transient properties).
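
For example, with the pre-discovery ("old style") solr.xml that would look
something like this (the transientCacheSize attribute caps how many transient
cores stay loaded; the values here are only illustrative):

  <cores adminPath="/admin/cores" transientCacheSize="50">
    <core name="customer123" instanceDir="customer123"
          loadOnStartup="false" transient="true"/>
  </cores>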

See: http://wiki.apache.org/solr/CoreAdmin#Configuration and
http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond

Note, the docs are somewhat sketchy, so if you try to go down this
route let us know anything that should be improved (or you can be
added to the list of wiki page contributors and help out!)

Best
Erick

On Thu, Apr 18, 2013 at 8:31 AM, John Nielsen  wrote:
>> You are missing an essential part: Both the facet and the sort
>> structures needs to hold one reference for each document
>> _in_the_full_index_, even when the document does not have any values in
>> the fields.
>>
>
> Wow, thank you for this awesome explanation! This is where the penny
> dropped for me.
>
> I will definetely move to a multi-core setup. It will take some time and a
> lot of re-coding. As soon as I know the result, I will let you know!
>
>
>
>
>
>
> --
> Med venlig hilsen / Best regards
>
> *John Nielsen*
> Programmer
>
>
>
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
>
> Kundeservice: +45 9610 2824
> p...@mcb.dk
> www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen
> You are missing an essential part: Both the facet and the sort
> structures needs to hold one reference for each document
> _in_the_full_index_, even when the document does not have any values in
> the fields.
>

Wow, thank you for this awesome explanation! This is where the penny
dropped for me.

I will definitely move to a multi-core setup. It will take some time and a
lot of re-coding. As soon as I know the result, I will let you know!






-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-18 Thread Toke Eskildsen
On Thu, 2013-04-18 at 11:59 +0200, John Nielsen wrote:
> Yes, thats right. No search from any given client ever returns
> anything from another client.

Great. That makes the 1 core/client solution feasible.

[No sort & facet warmup is performed]

[Suggestion 1: Reduce the number of sort fields by mapping]

[Suggestion 3: 1 core/customer]

> If I understand the fieldCache mechanism correctly (which i can see
> that I don't), the data used for faceting and sorting is saved in the
> fieldCache using a key comprised of the fields used for said
> faceting/sorting. That data only contains the data which is actually
> used for the operation. This is what the fq queries are for.
> 
You are missing an essential part: Both the facet and the sort
structures needs to hold one reference for each document
_in_the_full_index_, even when the document does not have any values in
the fields.

It might help to visualize the structures as arrays of values with docID
as index: String[] myValues = new String[1400000] takes up 1.4M * 32 bit
(or more for a 64 bit machine) = 5.6MB, even when it is empty.

Note: Neither String-objects, nor Java references are used for the real
facet- and sort-structures, but the principle is quite the same.

> So if i generate a core for each client, I would have a client
> specific fieldCache containing the data from that client. Wouldn't I
> just split up the same data into several cores?

The same terms, yes, but not the same references.

Let's say your customer has 10K documents in the index and that there
are 100 unique values, each 10 bytes long, in each group.

As each group holds its own separate structure, we use the old formula
to get the memory overhead:

#documents*log2(#unique_terms*average_term_length) +
#unique_terms*average_term_length

1.4M*log2(100*(10*8)) + 100*(10*8) bit = 1.2MB + 1KB.

Note how the values themselves are just 1KB, while the nearly empty
reference list takes 1.2MB.


Compare this to a dedicated core with just the 10K documents:
10K*log2(100*(10*8)) + 100*(10*8) bit = 8.5KB + 1KB.

The terms take up exactly the same space, but the heap requirement for
the references is reduced by 99%.

Now, 25GB for 180 clients means 140MB/client with your current setup.
I do not know the memory overhead of running a core, but since Solr can
run fine with 32MB for small indexes, it should be smaller than that.
You will of course have to experiment and to measure.


- Toke Eskildsen, State and University Library, Denmark




Re: Solr using a ridiculous amount of memory

2013-04-18 Thread John Nielsen
>
> > http://172.22.51.111:8000/solr/default1_Danish/search
>
> [...]
>
> > &fq=site_guid%3a(10217)
>
> This constraints to hits to a specific customer, right? Any search will
> only be in a single customer's data?
>

Yes, that's right. No search from any given client ever returns anything
from another client.


[Toke: Are you warming all the sort- and facet-fields?]
>
> > I'm sorry, I don't know. I have the field value cache commented out in
> > my config, so... Whatever is default?
>
> (a bit shaky here) I would say not warming. You could check simply by
> starting solr and looking at the caches before you issue any searches.
>

The field cache shows 0 entries at startup. On the running server, forcing
a commit (and thus opening a new searcher) does not change the number of
entries.


> > The problem is that each item can have several sort orders. The sort
> > order to use is defined by a group number which is known ahead of
> > time. The group number is included in the sort order field name. To
> > solve it in the same way i solved the facet problem, I would need to
> > be able to sort on a multi-valued field, and unless I'm wrong, I don't
> > think that it's possible.
>
> That is correct.
>
> Three suggestions off the bat:
>
> 1) Reduce the number of sort fields by mapping names.
> Count the maximum number of unique sort fields for any given customer.
> That will be the total number of sort fields in the index. For each
> group number for a customer, map that number to one of the index-wide
> sort fields.
> This only works if the maximum number of unique fields is low (let's say
> a single field takes 50MB, so 20 fields should be okay).
>

I just checked our DB. Our worst case scenario client has over a thousand
groups for sorting. Granted, it may be, probably is, an error with the
data. It is an interesting idea though and I will look into this possibility.


> 3) Switch to a layout where each customer has a dedicated core.
> The basic overhead is a lot larger than for a shared index, but it would
> make your setup largely immune to the adverse effect of many documents
> coupled with many facet- and sort-fields.
>

Now this is where my brain melts down.

If I understand the fieldCache mechanism correctly (which i can see that I
don't), the data used for faceting and sorting is saved in the fieldCache
using a key comprised of the fields used for said faceting/sorting. That
data only contains the data which is actually used for the operation. This
is what the fq queries are for.

So if i generate a core for each client, I would have a client specific
fieldCache containing the data from that client. Wouldn't I just split up
the same data into several cores?

I'm afraid I don't understand how this would help.


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-18 Thread Toke Eskildsen
On Thu, 2013-04-18 at 08:34 +0200, John Nielsen wrote:

> 
[Toke: Can you find the facet fields in any of the other caches?]

> Yes, here it is, in the field cache:

> http://screencast.com/t/mAwEnA21yL
> 
Ah yes, mystery solved, my mistake.

> http://172.22.51.111:8000/solr/default1_Danish/search

[...]

> &fq=site_guid%3a(10217)

This constrains the hits to a specific customer, right? Any search will
only be in a single customer's data?

> 
[Toke: Are you warming all the sort- and facet-fields?]

> I'm sorry, I don't know. I have the field value cache commented out in
> my config, so... Whatever is default?

(a bit shaky here) I would say not warming. You could check simply by
starting solr and looking at the caches before you issue any searches.

This fits the description of your searchers gradually eating memory
until your JVM OOMs. Each time a new field is faceted or sorted upon, it
is added to the cache. As your index is relatively small and the number
of values in the single fields is small, the initialization time for a
field is so short that it is not a performance problem. Memory-wise it
is death by a thousand cuts.

If you did explicit warming of all the possible fields for sorting and
faceting, you would allocate it all up front and would be sure that
there would be enough memory available. But it would take much longer
than your current setup. You might want to try it out (no need to fiddle
with Solr setup, just make a script and fire wgets as this has the same
effect).
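
If you would rather do it inside Solr than with wgets, a newSearcher listener
in solrconfig.xml along these lines does the same thing -- the field names
below are just lifted from the facet request you pasted and are only
illustrative:

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="rows">0</str>
          <str name="facet">true</str>
          <str name="facet.field">itemvariantoptions_int_mv</str>
          <str name="sort">item_group_1522_name_int asc</str>
        </lst>
      </arr>
    </listener>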

> The problem is that each item can have several sort orders. The sort
> order to use is defined by a group number which is known ahead of
> time. The group number is included in the sort order field name. To
> solve it in the same way i solved the facet problem, I would need to
> be able to sort on a multi-valued field, and unless I'm wrong, I don't
> think that it's possible.

That is correct.

Three suggestions off the bat:

1) Reduce the number of sort fields by mapping names.
Count the maximum number of unique sort fields for any given customer.
That will be the total number of sort fields in the index. For each
group number for a customer, map that number to one of the index-wide
sort fields.
This only works if the maximum number of unique fields is low (let's say
a single field takes 50MB, so 20 fields should be okay).

2) Create a custom sorter for Solr.
Create a field with all the sort values, prefixed by group ID. Create a
structure (or reuse the one from Lucene) with a doc->terms map with all
the terms in-memory. When sorting, extract the relevant compare-string
for a document by iterating all the terms for the document and selecting
the one with the right prefix.
Memory wise this scales linear to the number of terms instead of the
number of fields, but it would require quite some coding.

3) Switch to a layout where each customer has a dedicated core.
The basic overhead is a lot larger than for a shared index, but it would
make your setup largely immune to the adverse effect of many documents
coupled with many facet- and sort-fields.

- Toke Eskildsen, State and University Library, Denmark




Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
> That was strange. As you are using a multi-valued field with the new
setup, they should appear there.

Yes, the new field we use for faceting is a multi valued field.

> Can you find the facet fields in any of the other caches?

Yes, here it is, in the field cache:

http://screencast.com/t/mAwEnA21yL

> I hope you are not calling the facets with facet.method=enum? Could you
paste a typical facet-enabled search request?

Here is a typical example (I added newlines for readability):

http://172.22.51.111:8000/solr/default1_Danish/search
?defType=edismax
&q=*%3a*
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_7+key%3ditemvariantoptions_int_mv_7%7ditemvariantoptions_int_mv
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_9+key%3ditemvariantoptions_int_mv_9%7ditemvariantoptions_int_mv
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_8+key%3ditemvariantoptions_int_mv_8%7ditemvariantoptions_int_mv
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_2+key%3ditemvariantoptions_int_mv_2%7ditemvariantoptions_int_mv
&fq=site_guid%3a(10217)
&fq=item_type%3a(PRODUCT)
&fq=language_guid%3a(1)
&fq=item_group_1522_combination%3a(*)
&fq=is_searchable%3a(True)
&sort=item_group_1522_name_int+asc, variant_of_item_guid+asc
&querytype=Technical
&fl=feed_item_serialized
&facet=true
&group=true
&group.facet=true
&group.ngroups=true
&group.field=groupby_variant_of_item_guid
&group.sort=name+asc
&rows=0

> Are you warming all the sort- and facet-fields?

I'm sorry, I don't know. I have the field value cache commented out in my
config, so... Whatever is default?

Removing the custom sort fields is unfortunately quite a bit more difficult
than my other facet modification.

The problem is that each item can have several sort orders. The sort order
to use is defined by a group number which is known ahead of time. The group
number is included in the sort order field name. To solve it in the same
way i solved the facet problem, I would need to be able to sort on a
multi-valued field, and unless I'm wrong, I don't think that it's possible.

I am quite stumped on how to fix this.




On Wed, Apr 17, 2013 at 3:06 PM, Toke Eskildsen wrote:

> John Nielsen [j...@mcb.dk]:
> > I never seriously looked at my fieldValueCache. It never seemed to get
> used:
>
> > http://screencast.com/t/YtKw7UQfU
>
> That was strange. As you are using a multi-valued field with the new
> setup, they should appear there. Can you find the facet fields in any of
> the other caches?
>
> ...I hope you are not calling the facets with facet.method=enum? Could you
> paste a typical facet-enabled search request?
>
> > Yep. We still do a lot of sorting on dynamic field names, so the field
> cache
> > has a lot of entries. (9.411 entries as we speak. This is considerably
> lower
> > than before.). You mentioned in an earlier mail that faceting on a field
> > shared between all facet queries would bring down the memory needed.
> > Does the same thing go for sorting?
>
> More or less. Sorting stores the raw string representations (utf-8) in
> memory so the number of unique values has more to say than it does for
> faceting. Just as with faceting, a list of pointers from documents to
> values (1 value/document as we are sorting) is maintained, so the overhead
> is something like
>
> #documents*log2(#unique_terms*average_term_length) +
> #unique_terms*average_term_length
> (where average_term_length is in bits)
>
> Caveat: This is with the index-wide sorting structure. I am fairly
> confident that this is what Solr uses, but I have not looked at it lately
> so it is possible that some memory-saving segment-based trickery has been
> implemented.
>
> > Does those 9411 entries duplicate data between them?
>
> Sorry, I do not know. SOLR- discusses the problems with the field
> cache and duplication of data, but I cannot infer if it is has been solved
> or not. I am not familiar with the stat breakdown of the fieldCache, but it
> _seems_ to me that there are 2 or 3 entries for each segment for each sort
> field. Guesstimating further, let's say you have 30 segments in your index.
> Going with the guesswork, that would bring the number of sort fields to
> 9411/3/30 ~= 100. Looks like you use a custom sort field for each client?
>
> Extrapolating from 1.4M documents and 180 clients, let's say that there
> are 1.4M/180/5 unique terms for each sort-field and that their average
> length is 10. We thus have
> 1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB
> per sort field or about 4GB for all the 180 fields.
>
> With this few unique values, the doc->value structure is by far the
> biggest, just as with facets. As opposed to the faceting structure, this is
> fairly close to the actual memory usage. Switching to a single sort field
> would reduce the memory usage from 4GB to about 55MB.
>
> > I do commit a bit more often than i should. I get these in my log file
> from
> > time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
> So 1 active searcher and 2 warming searcher

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
Whoops. I made some mistakes in the previous post.

Toke Eskildsen [t...@statsbiblioteket.dk]:

> Extrapolating from 1.4M documents and 180 clients, let's say that
> there are 1.4M/180/5 unique terms for each sort-field and that their
> average length is 10. We thus have
> 1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB
> per sort field or about 4GB for all the 180 fields.

That would be 10 bytes and thus 80 bits. The results were correct though.

> So 1 active searcher and 2 warming searchers. Ignoring that one of
> the warming searchers is highly likely to finish well ahead of the other
> one, that means that your heap must hold 3 times the structures for
> a single searcher.

This should be taken with a grain of salt as it depends on whether or not there 
is any re-use of segments. There might be for sorting.

Apologies for any confusion,
Toke Eskildsen


RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
John Nielsen [j...@mcb.dk]:
> I never seriously looked at my fieldValueCache. It never seemed to get used:

> http://screencast.com/t/YtKw7UQfU

That was strange. As you are using a multi-valued field with the new setup, 
they should appear there. Can you find the facet fields in any of the other 
caches?

...I hope you are not calling the facets with facet.method=enum? Could you 
paste a typical facet-enabled search request?

> Yep. We still do a lot of sorting on dynamic field names, so the field cache
> has a lot of entries. (9.411 entries as we speak. This is considerably lower
> than before.). You mentioned in an earlier mail that faceting on a field
> shared between all facet queries would bring down the memory needed.
> Does the same thing go for sorting?

More or less. Sorting stores the raw string representations (utf-8) in memory 
so the number of unique values has more to say than it does for faceting. Just 
as with faceting, a list of pointers from documents to values (1 value/document 
as we are sorting) is maintained, so the overhead is something like

#documents*log2(#unique_terms*average_term_length) + 
#unique_terms*average_term_length
(where average_term_length is in bits)

Caveat: This is with the index-wide sorting structure. I am fairly confident 
that this is what Solr uses, but I have not looked at it lately so it is 
possible that some memory-saving segment-based trickery has been implemented.

> Does those 9411 entries duplicate data between them?

Sorry, I do not know. SOLR- discusses the problems with the field cache and 
duplication of data, but I cannot infer if it has been solved or not. I am 
not familiar with the stat breakdown of the fieldCache, but it _seems_ to me 
that there are 2 or 3 entries for each segment for each sort field. 
Guesstimating further, let's say you have 30 segments in your index. Going with 
the guesswork, that would bring the number of sort fields to 9411/3/30 ~= 100. 
Looks like you use a custom sort field for each client?

Extrapolating from 1.4M documents and 180 clients, let's say that there are 
1.4M/180/5 unique terms for each sort-field and that their average length is 
10. We thus have
1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB 
per sort field or about 4GB for all the 180 fields.

With this few unique values, the doc->value structure is by far the biggest, 
just as with facets. As opposed to the faceting structure, this is fairly close 
to the actual memory usage. Switching to a single sort field would reduce the 
memory usage from 4GB to about 55MB.

> I do commit a bit more often than i should. I get these in my log file from
> time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

So 1 active searcher and 2 warming searchers. Ignoring that one of the warming 
searchers is highly likely to finish well ahead of the other one, that means 
that your heap must hold 3 times the structures for a single searcher. With the 
old heap size of 25GB that left "only" 8GB for a full dataset. Subtract the 4GB 
for sorting and a similar amount for faceting and you have your OOM.

Tweaking your ingest to avoid 3 overlapping searchers will lower your memory 
requirements by 1/3. Fixing the facet & sorting logic will bring it down to 
laptop size.
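
One common way to avoid the overlap is to stop the hard autoCommit from
opening searchers and only open new searchers on a slower soft-commit
cadence -- the solrconfig.xml values below are purely illustrative:

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>600000</maxTime>
  </autoSoftCommit>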

> The control panel says that the warm up time of the last searcher is 5574. Is 
> that seconds or milliseconds?
> http://screencast.com/t/d9oIbGLCFQwl

milliseconds, I am fairly sure. It is much faster than I anticipated. Are you 
warming all the sort- and facet-fields?

> Waiting for a full GC would take a long time.

Until you have fixed the core memory issue, you might consider doing an 
explicit GC every night to clean up and hope that it does not occur 
automatically at daytime (or whenever your clients uses it).

> Unfortunately I don't know of a way to provoke a full GC on command.

VisualVM, which is delivered with the Oracle JDK (look somewhere in the bin 
folder), is your friend. Just start it on the server and click on the relevant 
process.

Regards,
Toke Eskildsen

Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
> I am surprised about the lack of "UnInverted" from your logs as it is
logged on INFO level.

Nope, no trace of it. No mention either in Logging -> Level from the admin
interface.

> It should also be available from the admin interface under
collection/Plugin / Stats/CACHE/fieldValueCache.

I never seriously looked at my fieldValueCache. It never seemed to get used:

http://screencast.com/t/YtKw7UQfU

> You stated that you were unable to make a 4GB JVM OOM when you just
performed faceting (I guesstimate that it will also run fine with just ½GB
or at least with 1GB, based on the
> numbers above) and you have observed that the field cache eats the
memory.

Yep. We still do a lot of sorting on dynamic field names, so the field
cache has a lot of entries. (9.411 entries as we speak. This is
considerably lower than before.). You mentioned in an earlier mail that
faceting on a field shared between all facet queries would bring down the
memory needed. Does the same thing go for sorting? Does those 9411 entries
duplicate data between them? If this is where all the memory is going, I
have a lot of coding to do.

> Guessing wildly: Do you issue a high frequency small updates with
frequent commits? If you pause the indexing, does memory use fall back to
the single GB level

I do commit a bit more often than I should. I get these in my log file from
time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2. The way I
understand this is that two searchers are being warmed at the same time and
that one will be discarded when it finishes its auto warming procedure. If
the math above is correct, I would need tens of searchers auto
warming in parallel to cause my problem. If I misunderstand how this works,
do let me know.

My indexer has a cleanup routine that deletes replay logs and other things
when it has nothing to do. This includes running a commit on the solr
server to make sure nothing is ever in a state where something is not
written to disk anywhere. In theory it can commit once every 60 seconds,
though I doubt that ever happens. The less work the indexer has, the more
often it commits. (yes I know, it's on my todo list)

Other than that, my autocommit settings look like this (the element names were
stripped by the list archive; the surviving values are 6, 6000 and false):

The control panel says that the warm up time of the last searcher is 5574.
Is that seconds or milliseconds?
http://screencast.com/t/d9oIbGLCFQwl

I would prefer to not turn off the indexer unless the numbers above
suggests that I really should try this. Waiting for a full GC would take a
long time. Unfortunately I don't know of a way to provoke a full GC on
command.


On Wed, Apr 17, 2013 at 11:48 AM, Toke Eskildsen 
wrote:

> John Nielsen [j...@mcb.dk] wrote:
> > I managed to get this done. The facet queries now facets on a multivalue
> field as opposed to the dynamic field names.
>
> > Unfortunately it doesn't seem to have done much difference, if any at
> all.
>
> I am sorry to hear that.
>
> > documents = ~1.400.000
> > references 11.200.000  (we facet on two multivalue fields with each 4
> values
> > on average, so 1.400.000 * 2 * 4 = 11.200.000
> > unique values = 1.132.344 (total number of variant options across all
> clients.
> > This is what we facet on)
>
> > 1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per
> field (we have 4 fields)?
>
> > I must be calculating this wrong.
>
> No, that sounds about right. In reality you need to multiply with 3 or 4,
> so let's round to 50MB/field: 1.4M documents with 2 fields with 5M
> references/field each is not very much and should not take a lot of memory.
> In comparison, we facet on 12M documents with 166M references and do some
> other stuff (in Lucene with a different faceting implementation, but at
> this level it is equivalent to Solr's in terms of memory). Our heap is 3GB.
>
> I am surprised about the lack of "UnInverted" from your logs as it is
> logged on INFO level. It should also be available from the admin interface
> under collection/Plugin / Stats/CACHE/fieldValueCache. But I am guessing
> you got your numbers from that and that the list only contains the few
> facets you mentioned previously? It might be wise to sanity check by
> summing the memSizes though; they ought to take up far below 1GB.
>
> From your description, your index is small and your faceting requirements
> modest. A SSD-equipped laptop should be adequate as server. So we are back
> to "math does not check out".
>
>
> You stated that you were unable to make a 4GB JVM OOM when you just
> performed faceting (I guesstimate that it will also run fine with just ½GB
> or at least with 1GB, based on the numbers above) and you have observed
> that the field cache eats the memory. This does indicate that the old
> caches are somehow not freed when the index is updated. That is strange as
> Solr should take care of that automatically.
>
> Guessing wildly: Do you issue a high frequency small updates with frequent
> commits? If you pause the indexing, does memory use fall back

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
John Nielsen [j...@mcb.dk] wrote:
> I managed to get this done. The facet queries now facets on a multivalue 
> field as opposed to the dynamic field names.

> Unfortunately it doesn't seem to have done much difference, if any at all.

I am sorry to hear that.

> documents = ~1.400.000
> references 11.200.000  (we facet on two multivalue fields with each 4 values 
> on average, so 1.400.000 * 2 * 4 = 11.200.000
> unique values = 1.132.344 (total number of variant options across all clients.
> This is what we facet on)

> 1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per field 
> (we have 4 fields)?

> I must be calculating this wrong.

No, that sounds about right. In reality you need to multiply with 3 or 4, so 
let's round to 50MB/field: 1.4M documents with 2 fields with 5M 
references/field each is not very much and should not take a lot of memory. In 
comparison, we facet on 12M documents with 166M references and do some other 
stuff (in Lucene with a different faceting implementation, but at this level it 
is equivalent to Solr's in terms of memory). Our heap is 3GB.
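
As a sanity check on the arithmetic, the following is a minimal Java sketch of that
lower-bound estimate, plugging in the per-field numbers restated in this reply
(roughly 1.4M documents, 4 values per document per field, ~1.13M unique values)
together with the 3-4x real-world factor mentioned above. The result lands in the
same tens-of-MB-per-field ballpark as the ~14MB and 50MB/field figures being
discussed; it is an illustration, not an exact prediction.

// Minimal sketch of the lower-bound rule of thumb discussed in this thread:
//   #documents*log2(#references) + #references*log2(#unique_values) bits.
// Numbers are the per-field figures from this exchange; treat the output as a
// ballpark (tens of MB per field), not an exact prediction.
public class FieldCacheEstimate {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        double docs = 1_400_000;
        double refsPerField = 4 * docs;      // ~4 values per doc per field
        double uniques = 1_132_344;

        double bits = docs * log2(refsPerField) + refsPerField * log2(uniques);
        double mb = bits / 8 / 1e6;
        System.out.printf("theoretical lower bound: ~%.0f MB/field%n", mb);
        System.out.printf("with the 3-4x factor:    ~%.0f-%.0f MB/field%n", 3 * mb, 4 * mb);
    }
}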

I am surprised about the lack of "UnInverted" from your logs as it is logged on 
INFO level. It should also be available from the admin interface under 
collection/Plugin / Stats/CACHE/fieldValueCache. But I am guessing you got your 
numbers from that and that the list only contains the few facets you mentioned 
previously? It might be wise to sanity check by summing the memSizes though; 
they ought to take up far below 1GB.

From your description, your index is small and your faceting requirements 
modest. An SSD-equipped laptop should be adequate as server. So we are back to 
"math does not check out".


You stated that you were unable to make a 4GB JVM OOM when you just performed 
faceting (I guesstimate that it will also run fine with just ½GB or at least 
with 1GB, based on the numbers above) and you have observed that the field 
cache eats the memory. This does indicate that the old caches are somehow not 
freed when the index is updated. That is strange as Solr should take care of 
that automatically.

Guessing wildly: Do you issue a high frequency of small updates with frequent 
commits? If you pause the indexing, does memory use fall back to the single GB 
level (You probably need to trigger a full GC to check that)? If that is the 
case, it might be a warmup problem with old warmups still running when new 
commits are triggered.

Regards,
Toke Eskildsen, State and University Library, Denmark

Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
I managed to get this done. The facet queries now facet on a multivalue
field as opposed to the dynamic field names.

Unfortunately it doesn't seem to have made much difference, if any at all.

Some more information that might help:

The JVM memory seems to be eaten up slowly. I don't think that there is one
single query that causes the problem. My test case (dumping 180 clients on
top of solr) takes hours before it causes an OOM. Often a full day. The
memory usage wobbles up and down, so the GC is at least partially doing its
job. It still works its way up to 100% eventually. When that happens it
either OOMs or it stops the world and brings the memory consumption to
10-15 gigs.

I did try to facet on all products across all clients (about 1.4 mil docs)
and I could not make it OOM on a server with a 4 gig JVM. This was on a
dedicated test server with my test being the only traffic.

I am beginning to think that this may be related to traffic volume and not
just to the type of query that I do.

I tried to calculate the memory requirement example you gave me above based
on the change that got rid of the dynamic fields.

documents = ~1.400.000
references = 11.200.000 (we facet on two multivalue fields with 4 values
each on average, so 1.400.000 * 2 * 4 = 11.200.000)
unique values = 1.132.344 (total number of variant options across all
clients. This is what we facet on)

1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per field
(we have 4 fields)?

I must be calculating this wrong.






On Mon, Apr 15, 2013 at 2:10 PM, John Nielsen  wrote:

> I did a search. I have no occurrence of "UnInverted" in the solr logs.
>
> > Another explanation for the large amount of memory presents itself if
> > you use a single index: If each of your clients facet on at least one
> > fields specific to the client ("client123_persons" or something like
> > that), then your memory usage goes through the roof.
>
> This is exactly how we facet right now! I will definetely rewrite the
> relevant parts of our product to test this out before moving further down
> the docValues path.
>
> I will let you know as soon as I know one way or the other.
>
>
> On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen 
> wrote:
>
>> On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:
>>
>> > The FieldCache is the big culprit. We do a huge amount of faceting so
>> > it seems right.
>>
>> Yes, you wrote that earlier. The mystery is that the math does not check
>> out with the description you have given us.
>>
>> > Unfortunately I am super swamped at work so I have precious little
>> > time to work on this, which is what explains my silence.
>>
>> No problem, we've all been there.
>> >
>> [Band aid: More memory]
>>
>> > The extra memory helped a lot, but it still OOM with about 180 clients
>> > using it.
>>
>> You stated earlier that you has a "solr cluster" and your total(?) index
>> size was 35GB, with each "register" being between "15k" and "30k". I am
>> using the quotes to signify that it is unclear what you mean. Is your
>> cluster multiple machines (I'm guessing no), multiple Solr's, cores,
>> shards or maybe just a single instance prepared for later distribution?
>> Is a register a core, shard or a simply logical part (one client's data)
>> of the index?
>>
>> If each client has their own core or shard, that would mean that each
>> client uses more than 25GB/180 bytes ~= 142MB of heap to access 35GB/180
>> ~= 200MB of index. That sounds quite high and you would need a very
>> heavy facet to reach that.
>>
>> If you could grep "UnInverted" from the Solr log file and paste the
>> entries here, that would help to clarify things.
>>
>>
>> Another explanation for the large amount of memory presents itself if
>> you use a single index: If each of your clients facet on at least one
>> fields specific to the client ("client123_persons" or something like
>> that), then your memory usage goes through the roof.
>>
>> Assuming an index with 10M documents, each with 5 references to a modest
>> 10K unique values in a facet field, the simplified formula
>>   #documents*log2(#references) + #references*log2(#unique_values) bit
>> tells us that this takes at least 110MB with field cache based faceting.
>>
>> 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
>> least double that. This fits neatly with your new heap of 64GB.
>>
>>
>> If my guessing is correct, you can solve your memory problems very
>> easily by sharing _all_ the facet fields between your clients.
>> This should bring your memory usage down to a few GB.
>>
>> You are probably already restricting their searches to their own data by
>> filtering, so this should not influence the returned facet values and
>> counts, as compared to separate fields.
>>
>> This is very similar to the thread "Facets with 5000 facet fields" BTW.
>>
>> > Today I finally managed to set up a test core so I can begin to play
>> > around with docValues.
>>
>> If you are using a single index with t

Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Upayavira
Might be obvious, but just in case - remember that you'll need to
re-index your content once you've added docValues to your schema, in
order to get the on-disk files to be created.

Upayavira

On Mon, Mar 25, 2013, at 03:16 PM, John Nielsen wrote:
> I apologize for the slow reply. Today has been killer. I will reply to
> everyone as soon as I get the time.
> 
> I am having difficulties understanding how docValues work.
> 
> Should I only add docValues to the fields that I actually use for sorting
> and faceting or on all fields?
> 
> Will the docValues magic apply to the fields i activate docValues on or
> on
> the entire document when sorting/faceting on a field that has docValues
> activated?
> 
> I'm not even sure which question to ask. I am struggling to understand
> this
> on a conceptual level.
> 
> 
> On Sun, Mar 24, 2013 at 7:11 PM, Robert Muir  wrote:
> 
> > On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen  wrote:
> >
> > > Schema with DocValues attempt at solving problem:
> > > http://pastebin.com/Ne23NnW4
> > > Config: http://pastebin.com/x1qykyXW
> > >
> >
> > This schema isn't using docvalues, due to a typo in your config.
> > it should not be DocValues="true" but docValues="true".
> >
> > Are you not getting an error? Solr needs to throw exception if you
> > provide invalid attributes to the field. Nothing is more frustrating
> > than having a typo or something in your configuration and solr just
> > ignores this, reports no error, and "doesnt work the way you want".
> > I'll look into this (I already intend to add these checks to analysis
> > factories for the same reason).
> >
> > Separately, if you really want the terms data and so on to remain on
> > disk, it is not enough to "just enable docvalues" for the field. The
> > default implementation uses the heap. So if you want that, you need to
> > set docValuesFormat="Disk" on the fieldtype. This will keep the
> > majority of the data on disk, and only some key datastructures in heap
> > memory. This might have significant performance impact depending upon
> > what you are doing so you need to test that.
> >
> 
> 
> 
> -- 
> Med venlig hilsen / Best regards
> 
> *John Nielsen*
> Programmer
> 
> 
> 
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
> 
> Kundeservice: +45 9610 2824
> p...@mcb.dk
> www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread John Nielsen
I did a search. I have no occurrence of "UnInverted" in the solr logs.

> Another explanation for the large amount of memory presents itself if
> you use a single index: If each of your clients facet on at least one
> fields specific to the client ("client123_persons" or something like
> that), then your memory usage goes through the roof.

This is exactly how we facet right now! I will definitely rewrite the
relevant parts of our product to test this out before moving further down
the docValues path.

I will let you know as soon as I know one way or the other.


On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen wrote:

> On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:
>
> > The FieldCache is the big culprit. We do a huge amount of faceting so
> > it seems right.
>
> Yes, you wrote that earlier. The mystery is that the math does not check
> out with the description you have given us.
>
> > Unfortunately I am super swamped at work so I have precious little
> > time to work on this, which is what explains my silence.
>
> No problem, we've all been there.
> >
> [Band aid: More memory]
>
> > The extra memory helped a lot, but it still OOM with about 180 clients
> > using it.
>
> You stated earlier that you has a "solr cluster" and your total(?) index
> size was 35GB, with each "register" being between "15k" and "30k". I am
> using the quotes to signify that it is unclear what you mean. Is your
> cluster multiple machines (I'm guessing no), multiple Solr's, cores,
> shards or maybe just a single instance prepared for later distribution?
> Is a register a core, shard or a simply logical part (one client's data)
> of the index?
>
> If each client has their own core or shard, that would mean that each
> client uses more than 25GB/180 bytes ~= 142MB of heap to access 35GB/180
> ~= 200MB of index. That sounds quite high and you would need a very
> heavy facet to reach that.
>
> If you could grep "UnInverted" from the Solr log file and paste the
> entries here, that would help to clarify things.
>
>
> Another explanation for the large amount of memory presents itself if
> you use a single index: If each of your clients facet on at least one
> fields specific to the client ("client123_persons" or something like
> that), then your memory usage goes through the roof.
>
> Assuming an index with 10M documents, each with 5 references to a modest
> 10K unique values in a facet field, the simplified formula
>   #documents*log2(#references) + #references*log2(#unique_values) bit
> tells us that this takes at least 110MB with field cache based faceting.
>
> 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
> least double that. This fits neatly with your new heap of 64GB.
>
>
> If my guessing is correct, you can solve your memory problems very
> easily by sharing _all_ the facet fields between your clients.
> This should bring your memory usage down to a few GB.
>
> You are probably already restricting their searches to their own data by
> filtering, so this should not influence the returned facet values and
> counts, as compared to separate fields.
>
> This is very similar to the thread "Facets with 5000 facet fields" BTW.
>
> > Today I finally managed to set up a test core so I can begin to play
> > around with docValues.
>
> If you are using a single index with the individual-facet-fields for
> each client approach, the DocValues will also have scaling issues, as
> the amount of values (of which the majority will be null) will be
>   #clients*#documents*#facet_fields
> This means that the adding a new client will be progressively more
> expensive.
>
> On the other hand, if you use a lot of small shards, DocValues should
> work for you.
>
> Regards,
> Toke Eskildsen
>
>
>


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Toke Eskildsen
On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:

> The FieldCache is the big culprit. We do a huge amount of faceting so
> it seems right.

Yes, you wrote that earlier. The mystery is that the math does not check
out with the description you have given us.

> Unfortunately I am super swamped at work so I have precious little
> time to work on this, which is what explains my silence.

No problem, we've all been there.
> 
[Band aid: More memory]

> The extra memory helped a lot, but it still OOM with about 180 clients
> using it.

You stated earlier that you have a "solr cluster" and your total(?) index
size was 35GB, with each "register" being between "15k" and "30k". I am
using the quotes to signify that it is unclear what you mean. Is your
cluster multiple machines (I'm guessing no), multiple Solr's, cores,
shards or maybe just a single instance prepared for later distribution?
Is a register a core, a shard or simply a logical part (one client's data)
of the index?

If each client has their own core or shard, that would mean that each
client uses more than 25GB/180 bytes ~= 142MB of heap to access 35GB/180
~= 200MB of index. That sounds quite high and you would need a very
heavy facet to reach that.

If you could grep "UnInverted" from the Solr log file and paste the
entries here, that would help to clarify things.


Another explanation for the large amount of memory presents itself if
you use a single index: If each of your clients facet on at least one
fields specific to the client ("client123_persons" or something like
that), then your memory usage goes through the roof.

Assuming an index with 10M documents, each with 5 references to a modest
10K unique values in a facet field, the simplified formula
  #documents*log2(#references) + #references*log2(#unique_values) bit
tells us that this takes at least 110MB with field cache based faceting.

180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
least double that. This fits neatly with your new heap of 64GB.


If my guessing is correct, you can solve your memory problems very
easily by sharing _all_ the facet fields between your clients.
This should bring your memory usage down to a few GB.

You are probably already restricting their searches to their own data by
filtering, so this should not influence the returned facet values and
counts, as compared to separate fields.
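
As a rough illustration of that shared-field-plus-filtering approach, here is a
minimal SolrJ sketch. The core URL and the field names (client_id, variant_options)
are made-up placeholders, and the exact SolrJ client class depends on the Solr
version in use; the point is simply that every client facets on the same field
while a filter query restricts the counts to that client's documents.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

// Facet on one field shared by all clients and restrict each client to its own
// documents with a filter query, instead of one facet field per client.
// Field names and URL are illustrative only.
public class SharedFacetFieldExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/products").build()) {

            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("client_id:123");        // per-client restriction
            q.setFacet(true);
            q.addFacetField("variant_options");       // one field shared by all clients
            q.setFacetMinCount(1);
            q.setRows(0);                             // only the facet counts are needed

            QueryResponse rsp = solr.query(q);
            rsp.getFacetField("variant_options").getValues()
               .forEach(c -> System.out.println(c.getName() + ": " + c.getCount()));
        }
    }
}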

This is very similar to the thread "Facets with 5000 facet fields" BTW.

> Today I finally managed to set up a test core so I can begin to play
> around with docValues.

If you are using a single index with the individual-facet-fields for
each client approach, the DocValues will also have scaling issues, as
the amount of values (of which the majority will be null) will be
  #clients*#documents*#facet_fields
This means that adding a new client will be progressively more
expensive.

On the other hand, if you use a lot of small shards, DocValues should
work for you.

Regards,
Toke Eskildsen




Re: Solr using a ridiculous amount of memory

2013-04-15 Thread John Nielsen
Yes and no,

The FieldCache is the big culprit. We do a huge amount of faceting so it
seems right. Unfortunately I am super swamped at work so I have precious
little time to work on this, which is what explains my silence.

Out of desperation, I added another 32G of memory to each server and
increased the JVM size to 64G from 25G. The servers are running with 96G
memory right now (this is the max amount supported by the hardware) which
leaves solr somewhat starved for memory. I am aware of the performance
implications of doing this but I have little choice.

The extra memory helped a lot, but it still OOMs with about 180 clients
using it. Unfortunately I need to support at least double that. After
upgrading the RAM, I ran for almost two weeks with the same workload that
used to OOM a couple of times a day, so it doesn't look like a leak.

Today I finally managed to set up a test core so I can begin to play around
with docValues.

I actually have a couple of questions regarding docValues:
1) If I facet on multiple fields and only some of those fields are using
docValues, will I still get the memory saving benefit of docValues? (one of
the facet fields uses null values and will require a lot of work in our
product to fix)
2) If I just use docValues on one small core with very limited traffic at
first for testing purposes, how can I test that it is actually using the
disk for caching?

I really appreciate all the help I have received on this list so far. I do
feel confident that I will be able to solve this issue eventually.



On Mon, Apr 15, 2013 at 9:00 AM, Toke Eskildsen wrote:

> On Sun, 2013-03-24 at 09:19 +0100, John Nielsen wrote:
> > Our memory requirements are running amok. We have less than a quarter of
> > our customers running now and even though we have allocated 25GB to the
> JVM
> > already, we are still seeing daily OOM crashes.
>
> Out of curiosity: Did you manage to pinpoint the memory eater in your
> setup?
>
> - Toke Eskildsen
>
>


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Toke Eskildsen
On Sun, 2013-03-24 at 09:19 +0100, John Nielsen wrote:
> Our memory requirements are running amok. We have less than a quarter of
> our customers running now and even though we have allocated 25GB to the JVM
> already, we are still seeing daily OOM crashes.

Out of curiosity: Did you manage to pinpoint the memory eater in your
setup?

- Toke Eskildsen



Re: Solr using a ridiculous amount of memory

2013-03-25 Thread John Nielsen
I apologize for the slow reply. Today has been killer. I will reply to
everyone as soon as I get the time.

I am having difficulties understanding how docValues work.

Should I only add docValues to the fields that I actually use for sorting
and faceting, or to all fields?

Will the docValues magic apply to the fields I activate docValues on, or to
the entire document when sorting/faceting on a field that has docValues
activated?

I'm not even sure which question to ask. I am struggling to understand this
on a conceptual level.


On Sun, Mar 24, 2013 at 7:11 PM, Robert Muir  wrote:

> On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen  wrote:
>
> > Schema with DocValues attempt at solving problem:
> > http://pastebin.com/Ne23NnW4
> > Config: http://pastebin.com/x1qykyXW
> >
>
> This schema isn't using docvalues, due to a typo in your config.
> it should not be DocValues="true" but docValues="true".
>
> Are you not getting an error? Solr needs to throw exception if you
> provide invalid attributes to the field. Nothing is more frustrating
> than having a typo or something in your configuration and solr just
> ignores this, reports no error, and "doesnt work the way you want".
> I'll look into this (I already intend to add these checks to analysis
> factories for the same reason).
>
> Separately, if you really want the terms data and so on to remain on
> disk, it is not enough to "just enable docvalues" for the field. The
> default implementation uses the heap. So if you want that, you need to
> set docValuesFormat="Disk" on the fieldtype. This will keep the
> majority of the data on disk, and only some key datastructures in heap
> memory. This might have significant performance impact depending upon
> what you are doing so you need to test that.
>



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Jack Krupansky
A step I meant to include was that after you "warm" Solr with a 
representative collection of queries that reference all of the fields, 
facets, sorting, etc. that your daily load will reference, check the Java 
heap size at that point, and then set your Java heap limit to a moderate 
level higher, like 256M, restart, and then see what happens.


The theory is that if you have too much available heap, Java will gradually 
fill it all with garbage (no leaks implied, but maybe some leaks as well), 
and then a Java GC will be an expensive hit, and sometimes a rapid flow of 
incoming requests at that point can cause Java to freak out and even hit OOM 
even though a more graceful garbage collection would eventually free up tons 
of garbage.


So, by only allowing for a moderate amount of garbage, more frequent GCs 
will be less intensive and less likely to cause weird situations.


The other part of the theory is that it is usually better to leave tons of 
memory to the OS for efficiently caching files, rather than force Java to 
manage large amounts of memory, which it typically does not do so well.


-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Sunday, March 24, 2013 2:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr using a ridiculous amount of memory

Just to get started, do you hit OOM quickly with a few expensive queries, or
is it after a number of hours and lots of queries?

Does Java heap usage seem to be growing linearly as queries come in, or are
there big spikes?

How complex/rich are your queries (e.g., how many terms, wildcards, faceted
fields, sorting, etc.)?

As a baseline experiment, start a Solr server, see how much Java heap is
used/available. Then do a couple of typical queries, and check the heap size
again. Then do a couple more similar but different (to avoid query cache
matches), and check the heap again. Maybe do that a few times to get a
handle on the baseline memory required and whether there might be a leak of
some sort. Do enough queries to hits all of the fields, facets, sorting,
etc. that are likely to be encountered in one of your typical days that hits
OOM - just not the volume of queries. The goal is to determine if there is
something inherently memory intensive in your index/queries, or something
relating to a leak based on total query volume.

-- Jack Krupansky

-Original Message- 
From: John Nielsen

Sent: Sunday, March 24, 2013 4:19 AM
To: solr-user@lucene.apache.org
Subject: Solr using a ridiculous amount of memory

Hello all,

We are running a solr cluster which is now running solr-4.2.

The index is about 35GB on disk with each register between 15k and 30k.
(This is simply the size of a full xml reply of one register. I'm not sure
how to measure it otherwise.)

Our memory requirements are running amok. We have less than a quarter of
our customers running now and even though we have allocated 25GB to the JVM
already, we are still seeing daily OOM crashes. We used to just allocate
more memory to the JVM, but with the way solr is scaling, we would need
well over 100GB of memory on each node to finish the project, and thats
just not going to happen. I need to lower the memory requirements somehow.

I can see from the memory dumps we've done that the field cache is by far
the biggest sinner. Of special interest to me is the recent introduction of
DocValues which supposedly mitigates this issue by using memory outside the
JVM. I just can't, because of lack of documentation, seem to make it work.

We do a lot of facetting. One client facets on about 50.000 docs of approx
30k each on 5 fields. I understand that this is VERY memory intensive.

Schema with DocValues attempt at solving problem:
http://pastebin.com/Ne23NnW4
Config: http://pastebin.com/x1qykyXW

The cache is pretty well tuned. Any lower and i get evictions.

Come hell or high water, my JVM memory requirements must come down. Simply
moving some memory load outside of the JVM would be awesome! Making it not
use the field cache for anything would also (probably) work for me. I
thought about killing off my other caches, but from the dumps, they just
don't seem to use that much memory.

I am at my wits end. Any help would be sorely appreciated.

--
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk 



RE: Solr using a ridiculous amount of memory

2013-03-24 Thread Toke Eskildsen
Toke Eskildsen [t...@statsbiblioteket.dk]:
> If your whole index has 10M documents, which each has 100 values
> for each field, with each field having 50M unique values, then the 
> memory requirement would be more than 
> 10M*log2(100*10M) + 100*10M*log2(50M) bit ~= 340MB/field ~=
> 1.6GB for faceting on all fields.

Whoops. Missed a 0 when calculating. The case above would actually take more 
than 15GB, probably also more than the 25GB you have allocated.
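
For readers following the numbers, here is a small Java sketch of the lower-bound
rule of thumb from this thread, using the hypothetical figures quoted above (10M
documents, 100 values per document per field, 50M unique values, 5 facet fields).
It reproduces both results: the full reference count gives roughly 3.2GB per field,
i.e. more than 15GB across five fields, while dropping a zero on the reference
count yields the ~340MB/field figure from the earlier calculation. This is only a
sanity check of the quoted arithmetic.

// Sanity check of the lower-bound rule of thumb from this thread:
//   #documents*log2(#references) + #references*log2(#unique_values) bits.
// Hypothetical numbers from the example: 10M docs, 100 values/doc/field,
// 50M unique values, 5 facet fields.
public class MissedZeroCheck {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    static double gigabytes(double docs, double refs, double uniques) {
        return (docs * log2(refs) + refs * log2(uniques)) / 8 / 1e9;
    }

    public static void main(String[] args) {
        double docs = 10e6;
        double refs = 100 * docs;               // 1,000M references per field
        double uniques = 50e6;

        double perField = gigabytes(docs, refs, uniques);
        System.out.printf("per field: %.1f GB, 5 fields: %.1f GB%n", perField, 5 * perField);

        // Using 100M references (a dropped zero) roughly reproduces the
        // ~340MB/field figure from the earlier calculation.
        System.out.printf("dropped zero: %.2f GB/field%n", gigabytes(docs, refs / 10, uniques));
    }
}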


Anyway, I see now in your solrconfig that your main facet fields are "cat", 
"manu_exact", "content_type" and "author_s", with the 5th being maybe "price", 
"popularity" or "manufacturedate_dt"?

cat seems like category (relatively few references, few uniques), content_type 
probably has a single value/item and again few uniques. No memory problem 
there, unless you have a lot of documents (100M-range). That leaves manu_exact 
and author_s. If those are freetext fields with item descriptions or similar, 
that might explain the OOM.

Could you describe the facet fields in more detail and provide us with the 
total document count?


Quick sanity check: If you are using a Linux server, could you please verify 
that your virtual memory is set to unlimited with 'ulimit -v'?

Regards,
Toke Eskildsen


RE: Solr using a ridiculous amount of memory

2013-03-24 Thread Toke Eskildsen
From: John Nielsen [j...@mcb.dk]:
> The index is about 35GB on disk with each register between 15k and 30k.
> (This is simply the size of a full xml reply of one register. I'm not sure
> how to measure it otherwise.)

> Our memory requirements are running amok. We have less than a quarter of
> our customers running now and even though we have allocated 25GB to the JVM
> already, we are still seeing daily OOM crashes.

That does sound a bit peculiar. I do not understand what you mean by "register" 
though. How many documents does your index hold?

> I can see from the memory dumps we've done that the field cache is by far
> the biggest sinner.

Do you sort on a lot of different fields?

> We do a lot of facetting. One client facets on about 50.000 docs of approx
> 30k each on 5 fields. I understand that this is VERY memory intensive.

To get a rough approximation of memory usage, we need the total number of 
documents, the average number of values for each of the 5 fields for a document 
and the number of unique values in each of the 5 fields. The rule of thumb I 
use for a lower ceiling is

#documents*log2(#references) + #references*log2(#unique_values) bit

If your whole index has 10M documents, each of which has 100 values for each 
field, with each field having 50M unique values, then the memory requirement 
would be more than 10M*log2(100*10M) + 100*10M*log2(50M) bit ~= 340MB/field ~= 
1.6GB for faceting on all fields. Even when we multiply that by 4 to get a 
more real-world memory requirement, it is far from the 25GB that you are 
allocating. Either you have an interestingly high number somewhere in the 
equation or something's off.

Regards,
Toke Eskildsen

Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Robert Muir
On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen  wrote:

> Schema with DocValues attempt at solving problem:
> http://pastebin.com/Ne23NnW4
> Config: http://pastebin.com/x1qykyXW
>

This schema isn't using docvalues, due to a typo in your config.
It should not be DocValues="true" but docValues="true".

Are you not getting an error? Solr needs to throw an exception if you
provide invalid attributes to the field. Nothing is more frustrating
than having a typo or something in your configuration and Solr just
ignores this, reports no error, and "doesn't work the way you want".
I'll look into this (I already intend to add these checks to analysis
factories for the same reason).

Separately, if you really want the terms data and so on to remain on
disk, it is not enough to "just enable docvalues" for the field. The
default implementation uses the heap. So if you want that, you need to
set docValuesFormat="Disk" on the fieldtype. This will keep the
majority of the data on disk, and only some key datastructures in heap
memory. This might have significant performance impact depending upon
what you are doing so you need to test that.


Re: Solr using a ridiculous amount of memory

2013-03-24 Thread Jack Krupansky
Just to get started, do you hit OOM quickly with a few expensive queries, or 
is it after a number of hours and lots of queries?


Does Java heap usage seem to be growing linearly as queries come in, or are 
there big spikes?


How complex/rich are your queries (e.g., how many terms, wildcards, faceted 
fields, sorting, etc.)?


As a baseline experiment, start a Solr server, see how much Java heap is 
used/available. Then do a couple of typical queries, and check the heap size 
again. Then do a couple more similar but different (to avoid query cache 
matches), and check the heap again. Maybe do that a few times to get a 
handle on the baseline memory required and whether there might be a leak of 
some sort. Do enough queries to hit all of the fields, facets, sorting, 
etc. that are likely to be encountered in one of your typical days that hits 
OOM - just not the volume of queries. The goal is to determine if there is 
something inherently memory intensive in your index/queries, or something 
relating to a leak based on total query volume.


-- Jack Krupansky

-Original Message- 
From: John Nielsen

Sent: Sunday, March 24, 2013 4:19 AM
To: solr-user@lucene.apache.org
Subject: Solr using a ridiculous amount of memory

Hello all,

We are running a solr cluster which is now running solr-4.2.

The index is about 35GB on disk with each register between 15k and 30k.
(This is simply the size of a full xml reply of one register. I'm not sure
how to measure it otherwise.)

Our memory requirements are running amok. We have less than a quarter of
our customers running now and even though we have allocated 25GB to the JVM
already, we are still seeing daily OOM crashes. We used to just allocate
more memory to the JVM, but with the way solr is scaling, we would need
well over 100GB of memory on each node to finish the project, and that's
just not going to happen. I need to lower the memory requirements somehow.

I can see from the memory dumps we've done that the field cache is by far
the biggest sinner. Of special interest to me is the recent introduction of
DocValues which supposedly mitigates this issue by using memory outside the
JVM. I just can't, because of lack of documentation, seem to make it work.

We do a lot of facetting. One client facets on about 50.000 docs of approx
30k each on 5 fields. I understand that this is VERY memory intensive.

Schema with DocValues attempt at solving problem:
http://pastebin.com/Ne23NnW4
Config: http://pastebin.com/x1qykyXW

The cache is pretty well tuned. Any lower and I get evictions.

Come hell or high water, my JVM memory requirements must come down. Simply
moving some memory load outside of the JVM would be awesome! Making it not
use the field cache for anything would also (probably) work for me. I
thought about killing off my other caches, but from the dumps, they just
don't seem to use that much memory.

I am at my wits end. Any help would be sorely appreciated.

--
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk 



Re: SOLR USING 100% percent CPU and not responding after a while

2012-11-20 Thread Otis Gospodnetic
Hi,

I looked at the spreadsheet and the graph and it looks like that's memory
for the whole OS.  What you want to look at is the JVM heap and GC counts
and timings there, esp. when you say performance sinks.  At that time also
look at your query, filter, and document caches and see if evictions go up
and cache hit rate down.  Or maybe queries are so random that there is lots
of disk IO happening?  Have a look at the URL in my signature, you may find
that helpful with this sort of troubleshooting.

Otis
--
Solr Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Tue, Nov 20, 2012 at 11:49 AM, Shrestha, Biva  wrote:

> Hello,
> We were doing fine with memory, comparitively. This was in a one hour span
> of load testing we did. And no out of memory exceptions seen in the log. I
> have copied the memory chart here. It shows that most of the memory was
> being used for cache.
> Thank you
>
>
> -Original Message-
> From: Rafał Kuć [mailto:r@solr.pl]
> Sent: Tuesday, November 20, 2012 11:25 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR USING 100% percent CPU and not responding after a while
>
> Hello!
>
> The first thing one should ask is what is the cause of the 100% CPU
> utilization. It can be an issue with the memory or your queries may be
> quite resource intensive. Can you provide some more information about your
> issues ? Do you see any OutOfMemory exceptions in your logs ? How are JVM
> garbage collector behaving when you are experiencing Solr being
> unresponsive ?
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> > Hello all,
> > We have solr 3.6 in a tomcat server with 16 GB memory allocated to it
> > in a Linux server. It is a multicore solr and one of the core has over
> > 100 million records. We have a web application that is fed by this
> > solr. We don't use sharding, different cores are for different
> > purposes. There are all kind of queries involved: simple queries,
> > queries with filters, queries with facets, queries with pagination (
> > deep pagination as well) and a combination of them . When there are
> > 100 concurrent users and quite some queries thrown to the solr the CPU
> > utilization by the solr exceeds 100% and it becomes unresponsive.
> > Has someone from this group have had  this problem and found a
> > solution. We did look at the queries from the logs and there is some
> > optimization to be done there but is there anything we can do in the
> > configuration. I know that there is a parameter called MergeFactor
> > which can be set low in the solrconfig.xml for enabling faster
> > searching but I don't see anything about CPU utilization related to
> > that settings. So my question in a jest is that is there any CPU
> > utilization related configuration settings for SOLR.
> > Any help will be appreciated a lot
> > Thank you a bunch
> > Biva
>
>
>
>


Re: SOLR USING 100% percent CPU and not responding after a while

2012-11-20 Thread Rafał Kuć
Hello!

How about some information on how the Garbage Collector works during the time
when you have Solr unresponsive?

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hello,
> We were doing fine with memory, comparitively. This was in a one
> hour span of load testing we did. And no out of memory exceptions
> seen in the log. I have copied the memory chart here. It shows that
> most of the memory was being used for cache. 
> Thank you


> -Original Message-
> From: Rafał Kuć [mailto:r@solr.pl] 
> Sent: Tuesday, November 20, 2012 11:25 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR USING 100% percent CPU and not responding after a while

> Hello!

> The first thing one should ask is what is the cause of the 100% CPU
> utilization. It can be an issue with the memory or your queries may
> be quite resource intensive. Can you provide some more information
> about your issues ? Do you see any OutOfMemory exceptions in your
> logs ? How are JVM garbage collector behaving when you are
> experiencing Solr being unresponsive ?

> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

>> Hello all,
>> We have solr 3.6 in a tomcat server with 16 GB memory allocated to it 
>> in a Linux server. It is a multicore solr and one of the core has over 
>> 100 million records. We have a web application that is fed by this 
>> solr. We don't use sharding, different cores are for different 
>> purposes. There are all kind of queries involved: simple queries, 
>> queries with filters, queries with facets, queries with pagination ( 
>> deep pagination as well) and a combination of them . When there are
>> 100 concurrent users and quite some queries thrown to the solr the CPU 
>> utilization by the solr exceeds 100% and it becomes unresponsive.
>> Has someone from this group have had  this problem and found a 
>> solution. We did look at the queries from the logs and there is some 
>> optimization to be done there but is there anything we can do in the 
>> configuration. I know that there is a parameter called MergeFactor 
>> which can be set low in the solrconfig.xml for enabling faster 
>> searching but I don't see anything about CPU utilization related to 
>> that settings. So my question in a jest is that is there any CPU 
>> utilization related configuration settings for SOLR.
>> Any help will be appreciated a lot
>> Thank you a bunch
>> Biva





Re: Solr using very high I/O

2011-12-14 Thread Martin Koch
Do you commit often? If so, try committing less often :)

/Martin

On Wed, Dec 7, 2011 at 12:16 PM, Adrian Fita  wrote:

> Hi. I experience an issue where Solr is using huge ammounts of I/O.
> Basically it uses the whole HDD continously, leaving nothing to the
> other processes. Solr is called by a script which continously indexes
> some files.
>
> The index has around 800MB and I can't understand why it could trash
> the HDD so much.
>
> I could use some help on how to optimize Solr so it doesn't use so much
> I/O.
>
> Thank you.
> --
> Fita Adrian
>


Re: Solr using very high I/O

2011-12-08 Thread pravesh
Can you share more info, like what is your H/W infra: CPU, RAM, HDD?
From where do you pick the records/documents to index: RDBMS, files, network?

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-using-very-high-I-O-tp3567076p3569903.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-17 Thread Lance Norskog
Solr itself does all three things. There is no need for Nutch - that is
needed for crawling web sites, not file systems (as the original
question specifies).

Solr operates as a web service, running in any Java servlet container.

Detecting changes to files is more tricky: there is no implementation
for the real-time update system available for Windows. You would have
to implement that. Otherwise you can poll a file system and re-index
altered files.
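
As a rough sketch of that polling approach, the following SolrJ example walks a
folder tree and re-posts any file whose modification time has changed, using the
extracting request handler so that Tika parses the different document types on
the Solr side. The folder path, core name, id field and poll interval are
placeholders, and the client class shown is from a later SolrJ version than the
one current when this thread was written.

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

// Polls a folder tree and re-posts any file whose modification time has changed
// since the last pass. Files are sent to the extracting request handler so that
// PDF, Office, HTML etc. content is parsed (by Tika) on the Solr side.
// Paths, core name and poll interval are placeholders.
public class FolderPoller {
    private final Map<Path, Long> lastSeen = new HashMap<>();

    public static void main(String[] args) throws Exception {
        try (SolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/intranet").build()) {
            FolderPoller poller = new FolderPoller();
            while (true) {
                poller.scan(Paths.get("/data/intranet"), solr);
                solr.commit();
                Thread.sleep(60_000);               // poll once a minute
            }
        }
    }

    void scan(Path root, SolrClient solr) throws Exception {
        try (Stream<Path> walk = Files.walk(root)) {
            for (Path p : walk.filter(Files::isRegularFile).collect(Collectors.toList())) {
                long modified = Files.getLastModifiedTime(p).toMillis();
                if (!Long.valueOf(modified).equals(lastSeen.get(p))) {
                    index(p.toFile(), solr);
                    lastSeen.put(p, modified);
                }
            }
        }
    }

    void index(File f, SolrClient solr) throws Exception {
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(f, "application/octet-stream"); // let Tika detect the actual type
        req.setParam("literal.id", f.getAbsolutePath());
        req.process(solr);
    }
}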

On Fri, Jan 14, 2011 at 4:54 AM, Markus Jelsma
 wrote:
> Nutch can crawl the file system as well. Nutch 1.x can also provide search but
> this is delegated to Solr in Nutch 2.x. Solr can provide the search and Nutch
> can provide Solr with content from your intranet.
>
> On Friday 14 January 2011 13:17:52 Cathy Hemsley wrote:
>> Hi,
>> Thanks for suggesting this.
>> However, I'm not sure a 'crawler' will work:  as the various pages are not
>> necessarily linked (it's complicated:  basically our intranet is a dynamic
>> and managed collection of independantly published web sites, and users
>> found information using categorisation and/or text searching), so we need
>> something that will index all the files in a given folder, rather than
>> follow links like a crawler. Can Nutch do this? As well as the other
>> requirements below?
>> Regards
>> Cathy
>>
>> On 14 January 2011 12:09, Markus Jelsma  wrote:
>> > Please visit the Nutch project. It is a powerful crawler and can
>> > integrate with Solr.
>> >
>> > http://nutch.apache.org/
>> >
>> > > Hi Solr users,
>> > >
>> > > I hope you can help.  We are migrating our intranet web site management
>> > > system to Windows 2008 and need a replacement for Index Server to do
>> > > the text searching.  I am trying to establish if Lucene and Solr is a
>> >
>> > feasible
>> >
>> > > replacement, but I cannot find the answers to these questions:
>> > >
>> > > 1. Can Solr be set up to recursively index a folder containing an
>> > > indeterminate and variable large number of subfolders, containing files
>> >
>> > of
>> >
>> > > all types:  XML, HTML, PDF, DOC, spreadsheets, powerpoint
>> > > presentations, text files etc.  If so, how?
>> > > 2. Can Solr be queried over the web and return a list of files that
>> > > match
>> >
>> > a
>> >
>> > > search query entered by a user, and also return the abstracts for these
>> > > files, as well as 'hit highlighting'.  If so, how?
>> > > 3. Can Solr be run as a service (like Index Server) that automatically
>> > > detects changes to the files within the indexed folder and updates the
>> > > index? If so, how?
>> > >
>> > > Thanks for your help
>> > >
>> > > Cathy Hemsley
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>



-- 
Lance Norskog
goks...@gmail.com


Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Markus Jelsma
Nutch can crawl the file system as well. Nutch 1.x can also provide search but 
this is delegated to Solr in Nutch 2.x. Solr can provide the search and Nutch 
can provide Solr with content from your intranet.

On Friday 14 January 2011 13:17:52 Cathy Hemsley wrote:
> Hi,
> Thanks for suggesting this.
> However, I'm not sure a 'crawler' will work:  as the various pages are not
> necessarily linked (it's complicated:  basically our intranet is a dynamic
> and managed collection of independantly published web sites, and users
> found information using categorisation and/or text searching), so we need
> something that will index all the files in a given folder, rather than
> follow links like a crawler. Can Nutch do this? As well as the other
> requirements below?
> Regards
> Cathy
> 
> On 14 January 2011 12:09, Markus Jelsma  wrote:
> > Please visit the Nutch project. It is a powerful crawler and can
> > integrate with Solr.
> > 
> > http://nutch.apache.org/
> > 
> > > Hi Solr users,
> > > 
> > > I hope you can help.  We are migrating our intranet web site management
> > > system to Windows 2008 and need a replacement for Index Server to do
> > > the text searching.  I am trying to establish if Lucene and Solr is a
> > 
> > feasible
> > 
> > > replacement, but I cannot find the answers to these questions:
> > > 
> > > 1. Can Solr be set up to recursively index a folder containing an
> > > indeterminate and variable large number of subfolders, containing files
> > 
> > of
> > 
> > > all types:  XML, HTML, PDF, DOC, spreadsheets, powerpoint
> > > presentations, text files etc.  If so, how?
> > > 2. Can Solr be queried over the web and return a list of files that
> > > match
> > 
> > a
> > 
> > > search query entered by a user, and also return the abstracts for these
> > > files, as well as 'hit highlighting'.  If so, how?
> > > 3. Can Solr be run as a service (like Index Server) that automatically
> > > detects changes to the files within the indexed folder and updates the
> > > index? If so, how?
> > > 
> > > Thanks for your help
> > > 
> > > Cathy Hemsley

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Toke Eskildsen
On Fri, 2011-01-14 at 13:05 +0100, Cathy Hemsley wrote:
> I hope you can help.  We are migrating our intranet web site management
> system to Windows 2008 and need a replacement for Index Server to do the
> text searching.  I am trying to establish if Lucene and Solr is a feasible
> replacement, but I cannot find the answers to these questions:

The answers to your questions are yes and no to all of them. Solr does
not do what you ask out of the box, but it can certainly be done by
extending Solr or using it as the core of another system.

Some time ago I stumbled upon http://www.constellio.com/ which seems to
be exactly what you're looking for.



Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Markus Jelsma
Please visit the Nutch project. It is a powerful crawler and can integrate 
with Solr.

http://nutch.apache.org/

> Hi Solr users,
> 
> I hope you can help.  We are migrating our intranet web site management
> system to Windows 2008 and need a replacement for Index Server to do the
> text searching.  I am trying to establish if Lucene and Solr is a feasible
> replacement, but I cannot find the answers to these questions:
> 
> 1. Can Solr be set up to recursively index a folder containing an
> indeterminate and variable large number of subfolders, containing files of
> all types:  XML, HTML, PDF, DOC, spreadsheets, powerpoint presentations,
> text files etc.  If so, how?
> 2. Can Solr be queried over the web and return a list of files that match a
> search query entered by a user, and also return the abstracts for these
> files, as well as 'hit highlighting'.  If so, how?
> 3. Can Solr be run as a service (like Index Server) that automatically
> detects changes to the files within the indexed folder and updates the
> index? If so, how?
> 
> Thanks for your help
> 
> Cathy Hemsley


Re: Solr using 1500 threads - is that normal?

2010-07-30 Thread Erick Erickson
Glad to help. Do be aware that there are several config values that
influence the commit frequency; they might also be relevant.

Best
Erick

On Thu, Jul 29, 2010 at 5:11 AM, Christos Constantinou <
ch...@simpleweb.co.uk> wrote:

> Eric,
>
> Thank you very much for the indicators! I had a closer look at the commit
> intervals and it seems that the application is gradually increasing the
> commits to almost once per second after some time - something that was
> hidden in the massive amount of queries in the log file. I have changed the
> code to use commitWithin rather than commit and everything looks much better
> now. I believe that might have solved the problem so thanks again.
>
> Christos
>
> On 29 Jul 2010, at 01:44, Erick Erickson wrote:
>
> > Your commits are very suspect. How often are you making changes to your
> > index?
> > Do you have autocommit on? Do you commit when updating each document?
> > Committing
> > too often and consequently firing off warmup queries is the first place
> I'd
> > look. But I
> > agree with dc tech, 1,500 is way more than I would expect.
> >
> > Best
> > Erick
> >
> >
> >
> > On Wed, Jul 28, 2010 at 6:53 AM, Christos Constantinou <
> > ch...@simpleweb.co.uk> wrote:
> >
> >> Hi,
> >>
> >> Solr seems to be crashing after a JVM exception that new threads cannot
> be
> >> created. I am writing in hope of advice from someone that has
> experienced
> >> this before. The exception that is causing the problem is:
> >>
> >> Exception in thread "btpool0-5" java.lang.OutOfMemoryError: unable to
> >> create new native thread
> >>
> >> The memory that is allocated to Solr is 3072MB, which should be enough
> >> memory for a ~6GB data set. The documents are not big either, they have
> >> around 10 fields of which only one stores large text ranging between
> 1k-50k.
> >>
> >> The top command at the time of the crash shows Solr using around 1500
> >> threads, which I assume it is not normal. Could it be that the threads
> are
> >> crashing one by one and new ones are created to cope with the queries?
> >>
> >> In the log file, right after the the exception, there are several
> thousand
> >> commits before the server stalls completely. Normally, the log file
> would
> >> report 20-30 document existence queries per second, then 1 commit per
> 5-30
> >> seconds, and some more infrequent faceted document searches on the data.
> >> However after the exception, there are only commits until the end of the
> log
> >> file.
> >>
> >> I am wondering if anyone has experienced this before or if it is some
> sort
> >> of known bug from Solr 1.4? Is there a way to increase the details of
> the
> >> exception in the logfile?
> >>
> >> I am attaching the output of a grep Exception command on the logfile.
> >>
> >> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apache.solr.common.SolrException: Error opening new
> searcher.
> >> exceeded limit of maxWarmingSearchers=2, try again later.
> >> Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
> >> SEVERE: org.apac

Re: Solr using 1500 threads - is that normal?

2010-07-29 Thread Christos Constantinou
Erick,

Thank you very much for the indicators! I had a closer look at the commit 
intervals and it seems that the application is gradually increasing the commits 
to almost once per second after some time - something that was hidden in the 
massive amount of queries in the log file. I have changed the code to use 
commitWithin rather than commit and everything looks much better now. I believe 
that might have solved the problem, so thanks again.

Christos
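
For reference, the change described above corresponds to something like the
following SolrJ sketch: rather than calling commit() after every add (which
opens a new searcher and fires its warmup each time), documents are added with
a commitWithin window and Solr batches the commits itself. The core URL, field
names and the 10 second window are illustrative only.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Instead of an explicit commit() after every update (which can trigger a new
// searcher and its warmup queries each time), ask Solr to commit within a
// time window. Many adds then share one commit.
public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title", "example");

            // Commit within 10 seconds instead of immediately.
            solr.add(doc, 10_000);

            // Avoid this pattern on every update; it is what overlaps warming
            // searchers and produces the maxWarmingSearchers errors in the logs:
            // solr.add(doc);
            // solr.commit();
        }
    }
}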

On 29 Jul 2010, at 01:44, Erick Erickson wrote:

> Your commits are very suspect. How often are you making changes to your
> index?
> Do you have autocommit on? Do you commit when updating each document?
> Committing
> too often and consequently firing off warmup queries is the first place I'd
> look. But I
> agree with dc tech, 1,500 is way more than I would expect.
> 
> Best
> Erick
> 
> 
> 
> On Wed, Jul 28, 2010 at 6:53 AM, Christos Constantinou <
> ch...@simpleweb.co.uk> wrote:
> 
>> Hi,
>> 
>> Solr seems to be crashing after a JVM exception that new threads cannot be
>> created. I am writing in hope of advice from someone that has experienced
>> this before. The exception that is causing the problem is:
>> 
>> Exception in thread "btpool0-5" java.lang.OutOfMemoryError: unable to
>> create new native thread
>> 
>> The memory that is allocated to Solr is 3072MB, which should be enough
>> memory for a ~6GB data set. The documents are not big either, they have
>> around 10 fields of which only one stores large text ranging between 1k-50k.
>> 
>> The top command at the time of the crash shows Solr using around 1500
>> threads, which I assume it is not normal. Could it be that the threads are
>> crashing one by one and new ones are created to cope with the queries?
>> 
>> In the log file, right after the the exception, there are several thousand
>> commits before the server stalls completely. Normally, the log file would
>> report 20-30 document existence queries per second, then 1 commit per 5-30
>> seconds, and some more infrequent faceted document searches on the data.
>> However after the exception, there are only commits until the end of the log
>> file.
>> 
>> I am wondering if anyone has experienced this before or if it is some sort
>> of known bug from Solr 1.4? Is there a way to increase the details of the
>> exception in the logfile?
>> 
>> I am attaching the output of a grep Exception command on the logfile.
>> 
>> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
>> exceeded limit of maxWarmingSearchers=2, try again later.
>> Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrE

Re: Solr using 1500 threads - is that normal?

2010-07-28 Thread Erick Erickson
Your commits are very suspect. How often are you making changes to your
index? Do you have autocommit on? Do you commit when updating each document?
Committing too often, and consequently firing off warmup queries, is the first
place I'd look. But I agree with dc tech, 1,500 is way more than I would
expect.

Best
Erick
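
A minimal SolrJ sketch of committing less aggressively, along the lines Erick
describes: buffer documents and issue a single commit per run instead of one
per update, so Solr is not asked to open (and warm) a new searcher all the
time. The server URL, field names and batch size here are illustrative
assumptions, not taken from the thread.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Buffer documents and commit once per run instead of per document, so Solr
// does not open a new warming searcher after every update.
public class BatchedIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", String.valueOf(i));
            doc.addField("text", "document body " + i);
            batch.add(doc);
            if (batch.size() == 1000) {   // flush in chunks to bound memory
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();                  // one commit for the whole run
    }
}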



On Wed, Jul 28, 2010 at 6:53 AM, Christos Constantinou <
ch...@simpleweb.co.uk> wrote:

> Hi,
>
> Solr seems to be crashing after a JVM exception that new threads cannot be
> created. I am writing in hope of advice from someone that has experienced
> this before. The exception that is causing the problem is:
>
> Exception in thread "btpool0-5" java.lang.OutOfMemoryError: unable to
> create new native thread
>
> The memory that is allocated to Solr is 3072MB, which should be enough
> memory for a ~6GB data set. The documents are not big either, they have
> around 10 fields of which only one stores large text ranging between 1k-50k.
>
> The top command at the time of the crash shows Solr using around 1500
> threads, which I assume it is not normal. Could it be that the threads are
> crashing one by one and new ones are created to cope with the queries?
>
> In the log file, right after the the exception, there are several thousand
> commits before the server stalls completely. Normally, the log file would
> report 20-30 document existence queries per second, then 1 commit per 5-30
> seconds, and some more infrequent faceted document searches on the data.
> However after the exception, there are only commits until the end of the log
> file.
>
> I am wondering if anyone has experienced this before or if it is some sort
> of known bug from Solr 1.4? Is there a way to increase the details of the
> exception in the logfile?
>
> I am attaching the output of a grep Exception command on the logfile.
>
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException

Re: Solr using 1500 threads - is that normal?

2010-07-28 Thread dc tech
1,500 threads seems extreme by any standard, so there is something unusual
happening in your install. Even with app servers for web apps, 100 would
typically be a fair number of threads.
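
One way to see where those threads come from is a small diagnostic such as the
sketch below, which uses only standard java.lang.management and Thread calls;
the grouping by name prefix (e.g. "btpool0-") is an illustrative assumption.
It only sees threads in its own JVM, so it would have to run inside the Solr
webapp, or you can read the same information off a jstack dump.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Count live threads and group them by name prefix to see which pool is
// responsible for a runaway thread count.
public class ThreadCounter {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("Live threads: " + mx.getThreadCount()
                + " (peak " + mx.getPeakThreadCount() + ")");

        Map<String, Integer> byPrefix = new TreeMap<String, Integer>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String name = t.getName();
            int dash = name.lastIndexOf('-');
            String prefix = dash > 0 ? name.substring(0, dash) : name;
            Integer n = byPrefix.get(prefix);
            byPrefix.put(prefix, n == null ? 1 : n + 1);
        }
        for (Map.Entry<String, Integer> e : byPrefix.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}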


On 7/28/10, Christos Constantinou  wrote:
> Hi,
>
> Solr seems to be crashing after a JVM exception that new threads cannot be
> created. I am writing in hope of advice from someone that has experienced
> this before. The exception that is causing the problem is:
>
> Exception in thread "btpool0-5" java.lang.OutOfMemoryError: unable to create
> new native thread
>
> The memory that is allocated to Solr is 3072MB, which should be enough
> memory for a ~6GB data set. The documents are not big either, they have
> around 10 fields of which only one stores large text ranging between 1k-50k.
>
> The top command at the time of the crash shows Solr using around 1500
> threads, which I assume it is not normal. Could it be that the threads are
> crashing one by one and new ones are created to cope with the queries?
>
> In the log file, right after the the exception, there are several thousand
> commits before the server stalls completely. Normally, the log file would
> report 20-30 document existence queries per second, then 1 commit per 5-30
> seconds, and some more infrequent faceted document searches on the data.
> However after the exception, there are only commits until the end of the log
> file.
>
> I am wondering if anyone has experienced this before or if it is some sort
> of known bug from Solr 1.4? Is there a way to increase the details of the
> exception in the logfile?
>
> I am attaching the output of a grep Exception command on the logfile.
>
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:19:32 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:20:18 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:20:48 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:22:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:27:53 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:28:50 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:33:19 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:08 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:58 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:35:59 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:44:31 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> Jul 28, 2010 8:51:49 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrEx

RE: Solr Using

2008-09-24 Thread Lance Norskog
Do these JSP pages compile under another servlet container?

If the JSP pages use Java 1.5 or Java 1.6 syntax features, they will not
compile under JBoss 4.0.2. The JBoss 4.0.2 JSP compiler only understands the
Java 1.4 language. I ran into this problem moving from a newer Tomcat to an
older JBoss.

-Original Message-
From: Dinesh Gupta [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 23, 2008 11:54 PM
To: solr-user@lucene.apache.org
Subject: Solr Using


Which version of Tomcat is required?

I installed JBoss 4.0.2, which includes Tomcat 5.5.9.

The JSP pages do not compile; I get a syntax error.

I can't move away from JBoss 4.0.2.

Please help.

Regards,
Dinesh Gupta

> Date: Tue, 23 Sep 2008 19:36:22 +0530
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene index
> 
> Hi Dinesh,
> 
> This seems straightforward for Solr. You can use the embedded jetty 
> server for a start. Look at the tutorial on how to get started.
> 
> You'll need to modify the schema.xml to define all the fields that you 
> want to index. The wiki page at http://wiki.apache.org/solr/SchemaXml 
> is a good start on how to do that. Each field in your code will have a 
> counterpart in the schema.xml with appropriate flags 
> (indexed/stored/tokenized etc.)
> 
> Once that is complete, try to modify the DataImportHandler's hsqldb 
> example for your mysql database.
> 
> On Tue, Sep 23, 2008 at 7:01 PM, Dinesh Gupta
<[EMAIL PROTECTED]>wrote:
> 
> >
> > Hi Shalin Shekhar,
> >
> > Let me explain my issue.
> >
> > I have some tables in my database like
> >
> > Product
> > Category
> > Catalogue
> > Keywords
> > Seller
> > Brand
> > Country_city_group
> > etc.
> > I have a class that represent  product document as
> >
> > Document doc = new Document();
> >// Keywords which can be used directly for search
> >doc.add(new Field("id",(String) 
> > data.get("PRN"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> >
> >// Sorting fields]
> >String priceString = (String) data.get("Price");
> >if (priceString == null)
> >priceString = "0";
> >long price = 0;
> >try {
> >price = (long) Double.parseDouble(priceString);
> >} catch (Exception e) {
> >
> >}
> >
> >doc.add(new
> >
Field("prc",NumberUtils.pad(price),Field.Store.YES,Field.Index.UN_TOKENIZED)
);
> >Date createDate = (Date) data.get("CreateDate");
> >if (createDate == null) createDate = new Date();
> >
> >doc.add(new Field("cdt",String.valueOf(createDate.getTime()),
> > Field.Store.NO,Field.Index.UN_TOKENIZED));
> >
> >Date modiDate = (Date) data.get("ModiDate");
> >if (modiDate == null) modiDate = new Date();
> >
> >doc.add(new Field("mdt",String.valueOf(modiDate.getTime()),
> > Field.Store.NO,Field.Index.UN_TOKENIZED));
> >//doc.add(Field.UnStored("cdt", 
> > String.valueOf(createDate.getTime(;
> >
> >// Additional fields for search
> >doc.add(new Field("bnm",(String) 
> > data.get("Brand"),Field.Store.YES,Field.Index.TOKENIZED));
> >doc.add(new Field("bnm1",(String) 
> > data.get("Brand1"),Field.Store.NO ,Field.Index.UN_TOKENIZED));
> >//doc.add(Field.Text("bnm", (String) data.get("Brand"))); 
> > //Tokenized and Unstored
> >doc.add(new Field("bid",(String) 
> > data.get("BrandId"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> >//doc.add(Field.Keyword("bid", (String) 
> > data.get("BrandId"))); // untokenized &
> >doc.add(new Field("grp",(String) 
> > data.get("Group"),Field.Store.NO ,Field.Index.TOKENIZED));
> >//doc.add(Field.Text("grp", (String) data.get("Group")));
> >doc.add(new Field("gid",(String) 
> > data.get("GroupId"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> >//doc.add(Field.Keyword("gid", (String) data.get("GroupId")));
//New
> >doc.add(new Field("snm",(String) 
> > data.get("Seller"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> >//doc.add(Field.Text("snm", (String) data.get("Seller")));
> >doc.add(new Field("sid",(String) 
> > data.get("SellerId"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> >//doc.add(Field.Keyword("sid", (String) 
> > data.get("SellerId"))); // New
> >doc.add(new Field("ttl",(String) 
> > data.get("Title"),Field.Store.YES,Field.Index.TOKENIZED));
> >//doc.add(Field.UnStored("ttl", (String) data.get("Title"), 
> > true));
> >
> >String title1 = (String) data.get("Title");
> >title1 = removeSpaces(title1);
> >doc.add(new Field("ttl1",title1,Field.Store.NO
> > ,Field.Index.UN_TOKENIZED));
> >
> >doc.add(new Field("ttl2",title1,Field.Store.NO
> > ,Field.Index.TOKENIZED));
> >//doc.add(Field.UnStored("ttl", (String) data.get("Title"), 
> > true));
> >
> >// ColumnC - Product Sequence
> >String productSeq = (String) data.get("ProductSeq");
> >if (productSeq == null) productSe

Re: Solr Using

2008-09-24 Thread Shalin Shekhar Mangar
What is the syntax error? Which JSP? Please give the stack trace too.

On Wed, Sep 24, 2008 at 12:24 PM, Dinesh Gupta
<[EMAIL PROTECTED]>wrote:

>
> Which version of tomcat required.
>
> I installed  jboss4.0.2 which have tomcat5.5.9.
>
> JSP pages are not going to compile.
>
> Its giving syntax error.
>
> Please help.
>
> I can't move from jboss4.0.2.
>
> Please help.
>
> Regards,
> Dinesh Gupta
>
> > Date: Tue, 23 Sep 2008 19:36:22 +0530
> > From: [EMAIL PROTECTED]
> > To: solr-user@lucene.apache.org
> > Subject: Re: Lucene index
> >
> > Hi Dinesh,
> >
> > This seems straightforward for Solr. You can use the embedded jetty
> server
> > for a start. Look at the tutorial on how to get started.
> >
> > You'll need to modify the schema.xml to define all the fields that you
> want
> > to index. The wiki page at http://wiki.apache.org/solr/SchemaXml is a
> good
> > start on how to do that. Each field in your code will have a counterpart
> in
> > the schema.xml with appropriate flags (indexed/stored/tokenized etc.)
> >
> > Once that is complete, try to modify the DataImportHandler's hsqldb
> example
> > for your mysql database.
> >
> > On Tue, Sep 23, 2008 at 7:01 PM, Dinesh Gupta <
> [EMAIL PROTECTED]>wrote:
> >
> > >
> > > Hi Shalin Shekhar,
> > >
> > > Let me explain my issue.
> > >
> > > I have some tables in my database like
> > >
> > > Product
> > > Category
> > > Catalogue
> > > Keywords
> > > Seller
> > > Brand
> > > Country_city_group
> > > etc.
> > > I have a class that represent  product document as
> > >
> > > Document doc = new Document();
> > >// Keywords which can be used directly for search
> > >doc.add(new Field("id",(String)
> > > data.get("PRN"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> > >
> > >// Sorting fields]
> > >String priceString = (String) data.get("Price");
> > >if (priceString == null)
> > >priceString = "0";
> > >long price = 0;
> > >try {
> > >price = (long) Double.parseDouble(priceString);
> > >} catch (Exception e) {
> > >
> > >}
> > >
> > >doc.add(new
> > >
> Field("prc",NumberUtils.pad(price),Field.Store.YES,Field.Index.UN_TOKENIZED));
> > >Date createDate = (Date) data.get("CreateDate");
> > >if (createDate == null) createDate = new Date();
> > >
> > >doc.add(new Field("cdt",String.valueOf(createDate.getTime()),
> > > Field.Store.NO,Field.Index.UN_TOKENIZED));
> > >
> > >Date modiDate = (Date) data.get("ModiDate");
> > >if (modiDate == null) modiDate = new Date();
> > >
> > >doc.add(new Field("mdt",String.valueOf(modiDate.getTime()),
> > > Field.Store.NO,Field.Index.UN_TOKENIZED));
> > >//doc.add(Field.UnStored("cdt",
> > > String.valueOf(createDate.getTime(;
> > >
> > >// Additional fields for search
> > >doc.add(new Field("bnm",(String)
> > > data.get("Brand"),Field.Store.YES,Field.Index.TOKENIZED));
> > >doc.add(new Field("bnm1",(String) data.get("Brand1"),
> Field.Store.NO
> > > ,Field.Index.UN_TOKENIZED));
> > >//doc.add(Field.Text("bnm", (String) data.get("Brand")));
> > > //Tokenized and Unstored
> > >doc.add(new Field("bid",(String)
> > > data.get("BrandId"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> > >//doc.add(Field.Keyword("bid", (String) data.get("BrandId")));
> //
> > > untokenized &
> > >doc.add(new Field("grp",(String) data.get("Group"),
> Field.Store.NO
> > > ,Field.Index.TOKENIZED));
> > >//doc.add(Field.Text("grp", (String) data.get("Group")));
> > >doc.add(new Field("gid",(String)
> > > data.get("GroupId"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> > >//doc.add(Field.Keyword("gid", (String) data.get("GroupId")));
> //New
> > >doc.add(new Field("snm",(String)
> > > data.get("Seller"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> > >//doc.add(Field.Text("snm", (String) data.get("Seller")));
> > >doc.add(new Field("sid",(String)
> > > data.get("SellerId"),Field.Store.YES,Field.Index.UN_TOKENIZED));
> > >//doc.add(Field.Keyword("sid", (String) data.get("SellerId")));
> //
> > > New
> > >doc.add(new Field("ttl",(String)
> > > data.get("Title"),Field.Store.YES,Field.Index.TOKENIZED));
> > >//doc.add(Field.UnStored("ttl", (String) data.get("Title"),
> true));
> > >
> > >String title1 = (String) data.get("Title");
> > >title1 = removeSpaces(title1);
> > >doc.add(new Field("ttl1",title1,Field.Store.NO
> > > ,Field.Index.UN_TOKENIZED));
> > >
> > >doc.add(new Field("ttl2",title1,Field.Store.NO
> > > ,Field.Index.TOKENIZED));
> > >//doc.add(Field.UnStored("ttl", (String) data.get("Title"),
> true));
> > >
> > >// ColumnC - Product Sequence
> > >String productSeq = (String) data.get("ProductSeq");
> > >if (productSeq == null) productSeq = "";
> > >doc.add(new Field("seq",productSeq,Field.Store.NO
> >

Re: Solr Using

2008-09-23 Thread Shalin Shekhar Mangar
Hi Dinesh,

Your code is hardly useful to us since we don't know what you are trying to
achieve or what all those Dao classes do.

Look at the Solr tutorial first -- http://lucene.apache.org/solr/
Use the SolrJ client for communicating with the Solr server --
http://wiki.apache.org/solr/Solrj
Also take a look at DataImportHandler which can help avoid all this code --
http://wiki.apache.org/solr/DataImportHandler

If you face any problem, first search this mailing list through markmail.org or
nabble.com to find previous posts related to your issue. If you don't find
anything helpful, post specific questions here and we will help answer them.
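
As a rough sketch of the SolrJ route, posting one product might look something
like this. The server URL is illustrative, the field names simply echo the
ones used elsewhere in this thread (id, prc, bnm, bid, ttl), and it assumes
matching fields have been declared in schema.xml.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Build a SolrInputDocument instead of a raw Lucene Document and let Solr
// apply the indexed/stored/tokenized flags declared in schema.xml.
public class ProductIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "12345");                  // unique key (the PRN)
        doc.addField("prc", 1999L);                   // price; padding/sorting handled by the field type
        doc.addField("bnm", "SomeBrand");             // brand name, tokenized per schema.xml
        doc.addField("bid", "42");                    // brand id
        doc.addField("ttl", "Example product title");

        server.add(doc);
        server.commit();
    }
}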

On Tue, Sep 23, 2008 at 3:56 PM, Dinesh Gupta <[EMAIL PROTECTED]>wrote:

>
>
>
>
>
> Hi Otis,
>
> Currently I am creating indexes from Java standalone program.
>
> I am preparing data by using query & have made data to index.
>
> Function as blow can we write.
>
> I have large number of product & we want to user it at production level.
>
> Please provide me sample or tutorials.
>
>
> /**
> *
> *
> * @param pbi
> * @throws DAOException
> */
>protected Document prepareLuceneDocument(Ismpbi pbi) throws DAOException
> {
>long start = System.currentTimeMillis();
>Long prn = pbi.getPbirfnum();
>if (!isValidProduct(pbi)) {
>if(logger.isDebugEnabled())
>logger.debug("Product Discarded" + prn+ " not a valid
> product. ");
>discarded++;
>return null;
>}
>
>IsmpptDAO pptDao = new IsmpptDAO();
>Set categoryList = new HashSet(pptDao.findByProductCategories(prn));
>
>Iterator iter = categoryList.iterator();
>Set directCategories = new HashSet();
>while (iter.hasNext()) {
>Object[] obj = (Object[]) iter.next();
>Long categoryId = (Long) obj[0];
>String categoryName = (String) obj[1];
>directCategories.add(new CategoryRecord(categoryId,
> categoryName));
>}
>
>if (directCategories.size() == 0) {
>if(logger.isDebugEnabled())
>logger.debug("Product Discarded" + prn
>+ " not placed in any category directly [ismppt].");
>discarded++;
>return null;
>}
>
>// Get all the categories for the direct categories - contains
>// CategoryRecord objects
>Set categories = getCategories(directCategories, prn);
>Set categoryIds = new HashSet(); // All category ids
>
>Iterator it = categories.iterator();
>while (it.hasNext()) {
>CategoryRecord rec = (CategoryRecord) it.next();
>categoryIds.add(rec.getId());
>}
>
>//All categories so far TOTAL (direct+parent categories)
>if (categoryIds.size() == 0) {
>if(logger.isDebugEnabled())
>logger.debug("Product Discarded" + prn+ " direct categories
> are not placed under other categories.");
>discarded++;
>return null;
>}
>
>Set catalogues = getCatalogues(prn);
>if (catalogues.size()!=0){
>if(logger.isDebugEnabled())
>logger.debug("[" + prn + "]-> Total Direct PCC Catalogues ["
> + collectionToStringNew(catalogues) +"]");
>}
>
>getCatalogueWithAllChildInCCR(prn, categoryIds, catalogues);
>if (catalogues.size() == 0) {
>if(logger.isDebugEnabled())
>logger.debug("Product Discarded " + prn+ " not attached with
> any catalogue");
>discarded++;
>return null;
>}
>
>String productDirectCategories =
> collectionToString(directCategories);
>String productAllCategories = collectionToString(categories);
>String productAllCatalogues = collectionToStringNew(catalogues);
>
>String categoryNames = getCategoryNames(categories);
>
>if(logger.isInfoEnabled())
>logger.info("TO Document Product " + pbi.getPbirfnum() + " Dir
> Categories " +
>  productDirectCategories + " All Categories "
>+ productAllCategories + " And Catalogues "
>+ productAllCatalogues);
>
>directCategories = null;
>categories=null;
>catalogues=null;
>
>
>Document document = new ProductDocument().toDocument(pbi,
>productAllCategories, productAllCatalogues,
>    productDirectC

RE: Solr Using

2008-09-23 Thread Dinesh Gupta





Hi Otis,

Currently I am creating the indexes from a standalone Java program.

I prepare the data with a query and then build the index from that data.

The function we have written for this is shown below.

I have a large number of products and we want to use this at production level.

Please provide me with a sample or tutorials.


/**
 * @param pbi
 * @throws DAOException
 */
protected Document prepareLuceneDocument(Ismpbi pbi) throws DAOException {
    long start = System.currentTimeMillis();
    Long prn = pbi.getPbirfnum();
    if (!isValidProduct(pbi)) {
        if (logger.isDebugEnabled())
            logger.debug("Product Discarded" + prn + " not a valid product. ");
        discarded++;
        return null;
    }

    IsmpptDAO pptDao = new IsmpptDAO();
    Set categoryList = new HashSet(pptDao.findByProductCategories(prn));

    Iterator iter = categoryList.iterator();
    Set directCategories = new HashSet();
    while (iter.hasNext()) {
        Object[] obj = (Object[]) iter.next();
        Long categoryId = (Long) obj[0];
        String categoryName = (String) obj[1];
        directCategories.add(new CategoryRecord(categoryId, categoryName));
    }

    if (directCategories.size() == 0) {
        if (logger.isDebugEnabled())
            logger.debug("Product Discarded" + prn
                    + " not placed in any category directly [ismppt].");
        discarded++;
        return null;
    }

    // Get all the categories for the direct categories - contains
    // CategoryRecord objects
    Set categories = getCategories(directCategories, prn);
    Set categoryIds = new HashSet(); // All category ids

    Iterator it = categories.iterator();
    while (it.hasNext()) {
        CategoryRecord rec = (CategoryRecord) it.next();
        categoryIds.add(rec.getId());
    }

    // All categories so far TOTAL (direct + parent categories)
    if (categoryIds.size() == 0) {
        if (logger.isDebugEnabled())
            logger.debug("Product Discarded" + prn
                    + " direct categories are not placed under other categories.");
        discarded++;
        return null;
    }

    Set catalogues = getCatalogues(prn);
    if (catalogues.size() != 0) {
        if (logger.isDebugEnabled())
            logger.debug("[" + prn + "]-> Total Direct PCC Catalogues ["
                    + collectionToStringNew(catalogues) + "]");
    }

    getCatalogueWithAllChildInCCR(prn, categoryIds, catalogues);
    if (catalogues.size() == 0) {
        if (logger.isDebugEnabled())
            logger.debug("Product Discarded " + prn
                    + " not attached with any catalogue");
        discarded++;
        return null;
    }

    String productDirectCategories = collectionToString(directCategories);
    String productAllCategories = collectionToString(categories);
    String productAllCatalogues = collectionToStringNew(catalogues);

    String categoryNames = getCategoryNames(categories);

    if (logger.isInfoEnabled())
        logger.info("TO Document Product " + pbi.getPbirfnum() + " Dir Categories "
                + productDirectCategories + " All Categories "
                + productAllCategories + " And Catalogues "
                + productAllCatalogues);

    directCategories = null;
    categories = null;
    catalogues = null;

    Document document = new ProductDocument().toDocument(pbi,
            productAllCategories, productAllCatalogues,
            productDirectCategories, categoryNames);

    categoryNames = null;
    pbi = null;
    productAllCatalogues = null;
    productAllCategories = null;
    productDirectCategories = null;
    categoryNames = null;

    long time = System.currentTimeMillis() - start;
    if (time > longestIndexTime) {
        longestIndexTime = time;
    }
    return document;
}



> Date: Mon, 22 Sep 2008 22:10:16 -0700
> From: [EMAIL PROTECTED]
> Subject: Re: Solr Using
> To: solr-user@lucene.apache.org
> 
> Dinesh,
> 
> Please have a look at the Solr tutorial first.
> Then have a look at the new DataImportHandler - there is a very detailed page 
> about it on the Wiki.
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
> > From: Dinesh Gupta <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Tuesday, September 23, 2008 1:02:34 AM
> > Subject: Solr Using
> > 
> > 
> > 
> > Hi All,
> > 
>

Re: Solr Using

2008-09-22 Thread Otis Gospodnetic
Dinesh,

Please have a look at the Solr tutorial first.
Then have a look at the new DataImportHandler - there is a very detailed page 
about it on the Wiki.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Dinesh Gupta <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 23, 2008 1:02:34 AM
> Subject: Solr Using
> 
> 
> 
> Hi All,
> 
> I am new to Solr. I am using Lucene last 2 years.
> 
> We create Lucene indexes for database.
> 
> Please help to migrate to Solr.
> 
> How can achieve this.
> 
> If any one have idea, please help.
> 
> Thanks In Advance.
> 
> 
> Regards,
> Dinesh Gupta
> 



Re: SOLR using sort but not sorting

2007-07-10 Thread Chris Hostetter

: 

: now I add sort=last-name asc
:
: this DOES change the ordering of the docs but it's not exactly
: alphabetically.

Lucene sorting can't work on a field with more than one indexed term per
document. This was briefly covered in the "sort" param docs, but I have
beefed up the info:

http://wiki.apache.org/solr/CommonQueryParameters#sort

...if you have any ideas for improving the docs, please feel free to edit
the wiki.


-Hoss
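
A small self-contained sketch of the same point, in the Lucene 2.x-era API
that appears elsewhere in this archive (the field names and values are
illustrative): the tokenized field is used for searching, while an untokenized
copy, which has exactly one indexed term per document, is the one that gets
sorted on. The Solr equivalent is to sort on a non-tokenized (e.g. string)
field, typically filled via copyField.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.RAMDirectory;

// Sorting needs exactly one indexed term per document, so keep a tokenized
// field for searching and an untokenized copy for sorting.
public class SortFieldDemo {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        String[] names = { "van der Berg", "Adams", "Miller" };
        for (String name : names) {
            Document doc = new Document();
            doc.add(new Field("last_name", name,
                    Field.Store.YES, Field.Index.TOKENIZED));      // searchable
            doc.add(new Field("last_name_sort", name,
                    Field.Store.NO, Field.Index.UN_TOKENIZED));    // one term per doc
            writer.addDocument(doc);
        }
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        Sort sort = new Sort(new SortField("last_name_sort", SortField.STRING));
        Hits hits = searcher.search(new MatchAllDocsQuery(), sort);
        for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.doc(i).get("last_name"));
        }
        searcher.close();
    }
}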