Re: Availability Issues

2008-10-07 Thread sunnyfr

Hi Matthew,

Can you tell me what you mean by posting updates to every machine?
Do you mean snapshot files or an index directory copy?

Thanks a lot Matthew,
Wish you a nice day,


Matthew Runo wrote:
> 
> The way I'd do it would be to buy more servers, set up Tomcat on  
> each, and get SOLR replicating from your current machine to the  
> others. Then, throw them all behind a load balancer, and there you go.
> 
> You could also post your updates to every machine. Then you don't  
> need to worry about getting replication running.
> 
> ++
>   | Matthew Runo
>   | Zappos Development
>   | [EMAIL PROTECTED]
>   | 702-943-7833
> ++
> 
> 
> On Oct 9, 2007, at 7:12 AM, David Whalen wrote:
> 
>> All:
>>
>> How can I break up my install onto more than one box?  We've
>> hit a learning curve here and we don't understand how best to
>> proceed.  Right now we have everything crammed onto one box
>> because we don't know any better.
>>
>> So, how would you build it if you could?  Here are the specs:
>>
>> a) the index needs to hold at least 25 million articles
>> b) the index is constantly updated at a rate of 10,000 articles
>> per minute
>> c) we need to have faceted queries
>>
>> Again, real-world experience is preferred here over book knowledge.
>> We've tried to read the docs and it's only made us more confused.
>>
>> TIA
>>
>> Dave W
>>
>>
>>> -Original Message-
>>> From: Yonik Seeley [mailto:[EMAIL PROTECTED]
>>> Sent: Monday, October 08, 2007 3:42 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Availability Issues
>>>
>>> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
>>>>> Do you see any requests that took a really long time to finish?
>>>>
>>>> The requests that take a long time to finish are just
>>> simple queries.
>>>> And the same queries run at a later time come back much faster.
>>>>
>>>> Our logs contain 99% inserts and 1% queries.  We are
>>> constantly adding
>>>> documents to the index at a rate of 10,000 per minute, so the logs
>>>> show mostly that.
>>>
>>> Oh, so you are using the same boxes for updating and querying?
>>> When you insert, are you using multiple threads?  If so, how many?
>>>
>>> What is the full URL of those slow query requests?
>>> Do the slow requests start after a commit?
>>>
>>>>> Start with the thread dump.
>>>>> I bet it's multiple queries piling up around some synchronization
>>>>> points in lucene (sometimes caused by multiple threads generating
>>>>> the same big filter that isn't yet cached).
>>>>
>>>> What would be my next steps after that?  I'm not sure I'd
>>> understand
>>>> enough from the dump to make heads-or-tails of it.  Can I
>>> share that
>>>> here?
>>>
>>> Yes, post it here.  Most likely a majority of the threads
>>> will be blocked somewhere deep in lucene code, and you will
>>> probably need help from people here to figure it out.
>>>
>>> -Yonik
>>>
>>>
>>
> 
> 
> 




Re: Availability Issues

2008-10-06 Thread sunnyfr

Hi Matthew,

What do you mean by "post your updates"?
Does that mean that you just scp or copy the data directory via a cron job,
without using automatic replication?
Because really, since I started to turn on autoCommit and the snapshooter, it
slows everything down and messes things up a bit.

Did you have the same problem?
Thanks a lot,


Matthew Runo wrote:
> 
> The way I'd do it would be to buy more servers, set up Tomcat on  
> each, and get SOLR replicating from your current machine to the  
> others. Then, throw them all behind a load balancer, and there you go.
> 
> You could also post your updates to every machine. Then you don't  
> need to worry about getting replication running.
> 
> ++
>   | Matthew Runo
>   | Zappos Development
>   | [EMAIL PROTECTED]
>   | 702-943-7833
> ++
> 
> 
> On Oct 9, 2007, at 7:12 AM, David Whalen wrote:
> 
>> All:
>>
>> How can I break up my install onto more than one box?  We've
>> hit a learning curve here and we don't understand how best to
>> proceed.  Right now we have everything crammed onto one box
>> because we don't know any better.
>>
>> So, how would you build it if you could?  Here are the specs:
>>
>> a) the index needs to hold at least 25 million articles
>> b) the index is constantly updated at a rate of 10,000 articles
>> per minute
>> c) we need to have faceted queries
>>
>> Again, real-world experience is preferred here over book knowledge.
>> We've tried to read the docs and it's only made us more confused.
>>
>> TIA
>>
>> Dave W
>>
>>
>>> -Original Message-
>>> From: Yonik Seeley [mailto:[EMAIL PROTECTED]
>>> Sent: Monday, October 08, 2007 3:42 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Availability Issues
>>>
>>> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
>>>>> Do you see any requests that took a really long time to finish?
>>>>
>>>> The requests that take a long time to finish are just
>>> simple queries.
>>>> And the same queries run at a later time come back much faster.
>>>>
>>>> Our logs contain 99% inserts and 1% queries.  We are
>>> constantly adding
>>>> documents to the index at a rate of 10,000 per minute, so the logs
>>>> show mostly that.
>>>
>>> Oh, so you are using the same boxes for updating and querying?
>>> When you insert, are you using multiple threads?  If so, how many?
>>>
>>> What is the full URL of those slow query requests?
>>> Do the slow requests start after a commit?
>>>
>>>>> Start with the thread dump.
>>>>> I bet it's multiple queries piling up around some synchronization
>>>>> points in lucene (sometimes caused by multiple threads generating
>>>>> the same big filter that isn't yet cached).
>>>>
>>>> What would be my next steps after that?  I'm not sure I'd
>>> understand
>>>> enough from the dump to make heads-or-tails of it.  Can I
>>> share that
>>>> here?
>>>
>>> Yes, post it here.  Most likely a majority of the threads
>>> will be blocked somewhere deep in lucene code, and you will
>>> probably need help from people here to figure it out.
>>>
>>> -Yonik
>>>
>>>
>>
> 
> 
> 




Re: Availability Issues

2007-10-11 Thread Norberto Meijome
On Tue, 9 Oct 2007 10:12:51 -0400
"David Whalen" <[EMAIL PROTECTED]> wrote:

> So, how would you build it if you could?  Here are the specs:
> 
> a) the index needs to hold at least 25 million articles
> b) the index is constantly updated at a rate of 10,000 articles
> per minute
> c) we need to have faceted queries

Hi David,
Others with more experience than I have already given you good answers, so I
won't go there.

One thing you want to consider when you have lots of ongoing updates is how
fast you want your latest changes to show up in your results.

Yes, everyone wants the latest to be live the second it hits the index, but
balancing that against having a responsive search within certain budget (and
architectural, maybe?) constraints isn't always that easy.

In all seriousness, not everyone is in a situation where every one of their
users would really need (or benefit hugely from) having each of the 200 docs
posted in the last second come up the millisecond they hit "Search". Can they
tell whether it was posted within the last 3, 5 or 10 minutes?

I think that tuning the cache warming values should yield some good results.
You probably don't want to have all your searches held until your cache fully
warms... or to have to warm too often.

I was thinking that you could even split your indexes: keep the latest entries
in a smaller, faster index, and the rest of your 25M in another index that
gets updated, say, hourly. But if your 10K per minute are updates (not new
docs, but changes), then maybe the idea of splitting the index is not that
useful...

Anyway, there are many ways to skin a cat :)

good luck,
B
_
{Beto|Norberto|Numard} Meijome

"Everything is interesting if you go into it deeply enough"
  Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Availability Issues

2007-10-10 Thread Otis Gospodnetic
Hi,

- Original Message 
From: David Whalen <[EMAIL PROTECTED]>

On that note -- I've read that Jetty isn't the best servlet
container to use in these situations, is that your experience?

OG: In which situations?  Jetty is great, actually! (The pretty high-traffic 
site in my sig runs Jetty.)

Otis 

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share



> -Original Message-
> From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 11:20 PM
> To: solr-user
> Subject: RE: Availability Issues
> 
> 
> : My logs don't look anything like that.  They look like HTTP
> : requests.  Am I looking in the wrong place?
> 
> what servlet container are you using?  
> 
> every servlet container handles application logs differently 
> -- it's especially tricky because even the format can be 
> changed, the examples i gave before are in the default format 
> you get if you use the jetty setup in the solr example (which 
> logs to stdout), but many servlet containers won't include 
> that much detail by default (they typically leave out the 
> classname and method name).  there's also typically a setting 
> that controls the verbosity -- so in some configurations only 
> the SEVERE messages are logged and in others the INFO 
> messages are logged ... you're going to want at least the 
> INFO level to debug stuff.
> 
> grep all the log files you can find for "Solr home set to" 
> ... that's one of the first messages Solr logs.  if you can 
> find that, you'll find the other messages i was talking about.
> 
> 
> -Hoss
> 
> 
> 





RE: Availability Issues

2007-10-09 Thread Chris Hostetter

: We're using Jetty also, so I get the sense I'm looking at the
: wrong log file.

if you are using the jetty configs that come in the solr downloads, it 
writes all of the solr log messages to stdout (ie: when you run it on the 
commandline, the messages come to your terminal).  i don't know off the 
top of my head how to configure Jetty to log application log messages to a 
specific file ... there may be jetty-specific config options for 
controlling this, or jetty may expect you to explicitly set the system 
properties that tell the JVM default log manager what you want it to do...

http://java.sun.com/j2se/1.5.0/docs/guide/logging/overview.html

: On that note -- I've read that Jetty isn't the best servlet
: container to use in these situations, is that your experience?

i can't make any specific recommendations ... i use Resin because someone 
else at my work did some research and decided it's worth paying for.  From 
what i've seen tomcat seems easier to configure than jetty and i had an 
easier time understanding its docs, but i've never done any performance 
tests.



-Hoss



Re: Availability Issues

2007-10-09 Thread Matthew Runo
When we are doing a reindex (1x a day), we post around 150-200  
documents per second, on average. Our index is not as large though,  
about 200k docs. During this import, the search service (with faceted  
page navigation) remains available for front-end searches and  
performance does not noticeably change. You can see this install  
running at http://www.6pm.com, where SOLR is in use for every part of  
the navigation and search.


I believe that a sustained load of 150+ posts per second is very  
possible. At that load though, it does make sense to consider  
multiple machines.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 9, 2007, at 10:16 AM, Charles Hornberger wrote:


I'm about to do a prototype deployment of Solr for a pretty
high-volume site, and I've been following this thread with some
interest.

One thing I want to confirm: It's really possible for Solr to handle a
constant stream of 10K updates/min (>150 updates/sec) to a
25M-document index? I knew Solr and Lucene were good, but that seems
like a pretty tall order. From the responses I'm seeing to David
Whalen's inquiries, it seems like people think that's possible.

Thanks,
Charlie

On 10/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote:

The way I'd do it would be to buy more servers, set up Tomcat on
each, and get SOLR replicating from your current machine to the
others. Then, throw them all behind a load balancer, and there you  
go.


You could also post your updates to every machine. Then you don't
need to worry about getting replication running.

++
  | Matthew Runo
  | Zappos Development
  | [EMAIL PROTECTED]
  | 702-943-7833
++


On Oct 9, 2007, at 7:12 AM, David Whalen wrote:


All:

How can I break up my install onto more than one box?  We've
hit a learning curve here and we don't understand how best to
proceed.  Right now we have everything crammed onto one box
because we don't know any better.

So, how would you build it if you could?  Here are the specs:

a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles
per minute
c) we need to have faceted queries

Again, real-world experience is preferred here over book knowledge.
We've tried to read the docs and it's only made us more confused.

TIA

Dave W



-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:

Do you see any requests that took a really long time to finish?


The requests that take a long time to finish are just

simple queries.

And the same queries run at a later time come back much faster.

Our logs contain 99% inserts and 1% queries.  We are

constantly adding

documents to the index at a rate of 10,000 per minute, so the logs
show mostly that.


Oh, so you are using the same boxes for updating and querying?
When you insert, are you using multiple threads?  If so, how many?

What is the full URL of those slow query requests?
Do the slow requests start after a commit?


Start with the thread dump.
I bet it's multiple queries piling up around some synchronization
points in lucene (sometimes caused by multiple threads generating
the same big filter that isn't yet cached).


What would be my next steps after that?  I'm not sure I'd

understand

enough from the dump to make heads-or-tails of it.  Can I

share that

here?


Yes, post it here.  Most likely a majority of the threads
will be blocked somewhere deep in lucene code, and you will
probably need help from people here to figure it out.

-Yonik













Re: Availability Issues

2007-10-09 Thread Charles Hornberger
I'm about to do a prototype deployment of Solr for a pretty
high-volume site, and I've been following this thread with some
interest.

One thing I want to confirm: It's really possible for Solr to handle a
constant stream of 10K updates/min (>150 updates/sec) to a
25M-document index? I knew Solr and Lucene were good, but that seems
like a pretty tall order. From the responses I'm seeing to David
Whalen's inquiries, it seems like people think that's possible.

Thanks,
Charlie

On 10/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote:
> The way I'd do it would be to buy more servers, set up Tomcat on
> each, and get SOLR replicating from your current machine to the
> others. Then, throw them all behind a load balancer, and there you go.
>
> You could also post your updates to every machine. Then you don't
> need to worry about getting replication running.
>
> ++
>   | Matthew Runo
>   | Zappos Development
>   | [EMAIL PROTECTED]
>   | 702-943-7833
> ++
>
>
> On Oct 9, 2007, at 7:12 AM, David Whalen wrote:
>
> > All:
> >
> > How can I break up my install onto more than one box?  We've
> > hit a learning curve here and we don't understand how best to
> > proceed.  Right now we have everything crammed onto one box
> > because we don't know any better.
> >
> > So, how would you build it if you could?  Here are the specs:
> >
> > a) the index needs to hold at least 25 million articles
> > b) the index is constantly updated at a rate of 10,000 articles
> > per minute
> > c) we need to have faceted queries
> >
> > Again, real-world experience is preferred here over book knowledge.
> > We've tried to read the docs and it's only made us more confused.
> >
> > TIA
> >
> > Dave W
> >
> >
> >> -Original Message-
> >> From: Yonik Seeley [mailto:[EMAIL PROTECTED]
> >> Sent: Monday, October 08, 2007 3:42 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Availability Issues
> >>
> >> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> >>>> Do you see any requests that took a really long time to finish?
> >>>
> >>> The requests that take a long time to finish are just
> >> simple queries.
> >>> And the same queries run at a later time come back much faster.
> >>>
> >>> Our logs contain 99% inserts and 1% queries.  We are
> >> constantly adding
> >>> documents to the index at a rate of 10,000 per minute, so the logs
> >>> show mostly that.
> >>
> >> Oh, so you are using the same boxes for updating and querying?
> >> When you insert, are you using multiple threads?  If so, how many?
> >>
> >> What is the full URL of those slow query requests?
> >> Do the slow requests start after a commit?
> >>
> >>>> Start with the thread dump.
> >>>> I bet it's multiple queries piling up around some synchronization
> >>>> points in lucene (sometimes caused by multiple threads generating
> >>>> the same big filter that isn't yet cached).
> >>>
> >>> What would be my next steps after that?  I'm not sure I'd
> >> understand
> >>> enough from the dump to make heads-or-tails of it.  Can I
> >> share that
> >>> here?
> >>
> >> Yes, post it here.  Most likely a majority of the threads
> >> will be blocked somewhere deep in lucene code, and you will
> >> probably need help from people here to figure it out.
> >>
> >> -Yonik
> >>
> >>
> >
>
>


Re: Availability Issues

2007-10-09 Thread Matthew Runo
The way I'd do it would be to buy more servers, set up Tomcat on  
each, and get SOLR replicating from your current machine to the  
others. Then, throw them all behind a load balancer, and there you go.


You could also post your updates to every machine. Then you don't  
need to worry about getting replication running.
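
Very roughly, that second option looks something like this -- a Python
sketch, purely illustrative; the hostnames, port and field names here are
made up:

    import urllib.request

    # Hypothetical list of Solr instances that should all get the same update.
    SOLR_HOSTS = ["http://solr1:8983", "http://solr2:8983", "http://solr3:8983"]

    def post_to_all(xml_body):
        """POST the same update XML to /solr/update on every instance."""
        for host in SOLR_HOSTS:
            req = urllib.request.Request(
                host + "/solr/update",
                data=xml_body.encode("utf-8"),
                headers={"Content-Type": "text/xml; charset=utf-8"},
            )
            urllib.request.urlopen(req).read()  # raises on HTTP errors

    # Send one add document everywhere, then commit everywhere.
    post_to_all('<add><doc><field name="id">42</field>'
                '<field name="text">hello world</field></doc></add>')
    post_to_all('<commit/>')

The trade-off is that every box does its own indexing work, and a box that
is down misses updates, so you need some way to re-send them later.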


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Oct 9, 2007, at 7:12 AM, David Whalen wrote:


All:

How can I break up my install onto more than one box?  We've
hit a learning curve here and we don't understand how best to
proceed.  Right now we have everything crammed onto one box
because we don't know any better.

So, how would you build it if you could?  Here are the specs:

a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles
per minute
c) we need to have faceted queries

Again, real-world experience is preferred here over book knowledge.
We've tried to read the docs and it's only made us more confused.

TIA

Dave W



-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 3:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Availability Issues

On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:

Do you see any requests that took a really long time to finish?


The requests that take a long time to finish are just

simple queries.

And the same queries run at a later time come back much faster.

Our logs contain 99% inserts and 1% queries.  We are

constantly adding

documents to the index at a rate of 10,000 per minute, so the logs
show mostly that.


Oh, so you are using the same boxes for updating and querying?
When you insert, are you using multiple threads?  If so, how many?

What is the full URL of those slow query requests?
Do the slow requests start after a commit?


Start with the thread dump.
I bet it's multiple queries piling up around some synchronization
points in lucene (sometimes caused by multiple threads generating
the same big filter that isn't yet cached).


What would be my next steps after that?  I'm not sure I'd

understand

enough from the dump to make heads-or-tails of it.  Can I

share that

here?


Yes, post it here.  Most likely a majority of the threads
will be blocked somewhere deep in lucene code, and you will
probably need help from people here to figure it out.

-Yonik








RE: Availability Issues

2007-10-09 Thread David Whalen
All:

How can I break up my install onto more than one box?  We've
hit a learning curve here and we don't understand how best to
proceed.  Right now we have everything crammed onto one box
because we don't know any better.

So, how would you build it if you could?  Here are the specs:

a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles
per minute
c) we need to have faceted queries
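
(For context, by "faceted queries" I mean requests along these lines -- a
rough, illustrative Python sketch; the field names are made up:)

    import urllib.parse, urllib.request

    # Hypothetical facet fields for our article metadata.
    params = urllib.parse.urlencode([
        ("q", "some keywords"),
        ("rows", "10"),
        ("facet", "true"),
        ("facet.field", "source"),
        ("facet.field", "author"),
        ("facet.limit", "10"),
    ])
    url = "http://localhost:8983/solr/select?" + params
    print(urllib.request.urlopen(url).read()[:500])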

Again, real-world experience is preferred here over book knowledge.
We've tried to read the docs and it's only made us more confused.

TIA

Dave W
  

> -Original Message-
> From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 3:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Availability Issues
> 
> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> > > Do you see any requests that took a really long time to finish?
> >
> > The requests that take a long time to finish are just 
> simple queries.  
> > And the same queries run at a later time come back much faster.
> >
> > Our logs contain 99% inserts and 1% queries.  We are 
> constantly adding 
> > documents to the index at a rate of 10,000 per minute, so the logs 
> > show mostly that.
> 
> Oh, so you are using the same boxes for updating and querying?
> When you insert, are you using multiple threads?  If so, how many?
> 
> What is the full URL of those slow query requests?
> Do the slow requests start after a commit?
> 
> > > Start with the thread dump.
> > > I bet it's multiple queries piling up around some synchronization 
> > > points in lucene (sometimes caused by multiple threads generating 
> > > the same big filter that isn't yet cached).
> >
> > What would be my next steps after that?  I'm not sure I'd 
> understand 
> > enough from the dump to make heads-or-tails of it.  Can I 
> share that 
> > here?
> 
> Yes, post it here.  Most likely a majority of the threads 
> will be blocked somewhere deep in lucene code, and you will 
> probably need help from people here to figure it out.
> 
> -Yonik
> 
> 


RE: Availability Issues

2007-10-09 Thread David Whalen
Chris:

We're using Jetty also, so I get the sense I'm looking at the
wrong log file.

On that note -- I've read that Jetty isn't the best servlet
container to use in these situations, is that your experience?

Dave


> -Original Message-
> From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 11:20 PM
> To: solr-user
> Subject: RE: Availability Issues
> 
> 
> : My logs don't look anything like that.  They look like HTTP
> : requests.  Am I looking in the wrong place?
> 
> what servlet container are you using?  
> 
> every servlet container handles application logs differently 
> -- it's especially tricky because even the format can be 
> changed, the examples i gave before are in the default format 
> you get if you use the jetty setup in the solr example (which 
> logs to stdout), but many servlet containers won't include 
> that much detail by default (they typically leave out the 
> classname and method name).  there's also typically a setting 
> that controls the verbosity -- so in some configurations only 
> the SEVERE messages are logged and in others the INFO 
> messages are logged ... you're going to want at least the 
> INFO level to debug stuff.
> 
> grep all the log files you can find for "Solr home set to" 
> ... that's one of the first messages Solr logs.  if you can 
> find that, you'll find the other messages i was talking about.
> 
> 
> -Hoss
> 
> 
> 


RE: Availability Issues

2007-10-08 Thread Chris Hostetter

: My logs don't look anything like that.  They look like HTTP
: requests.  Am I looking in the wrong place?

what servlet container are you using?  

every servlet container handles application logs differently -- it's 
especially tricky because even the format can be changed, the examples i 
gave before are in the default format you get if you use the jetty setup 
in the solr example (which logs to stdout), but many servlet containers 
won't include that much detail by default (they typically leave out the 
classname and method name).  there's also typically a setting that 
controls the verbosity -- so in some configurations only the SEVERE 
messages are logged and in others the INFO messages are logged ... you're 
going to want at least the INFO level to debug stuff.

grep all the log files you can find for "Solr home set to" ... that's one 
of the first messages Solr logs.  if you can find that, you'll find the 
other messages i was talking about.


-Hoss



Re: Availability Issues

2007-10-08 Thread James liu
I think the text field does not need stored="true" unless you will show it
(it will help you decrease the index size and will not affect search).

Do index and search use the same box? If so, you should monitor search
response time while indexing (including CPU and RAM changes).

I have had a similar problem, and increasing the JVM heap size fixed it (you
can try it and let me know your results).

2007/10/9, David Whalen <[EMAIL PROTECTED]>:
>
> Thanks for letting me know that.  Okay, here they are:
>
>
>  BEGIN SCHEMA.XML===
>
> [ schema.xml was pasted inline here, but the list archive stripped out the
> XML markup; only scattered attribute fragments survive (field definitions
> with omitNorms/sortMissingLast/multiValued, WordDelimiterFilter and synonym
> analyzer settings, uniqueKey "id", defaultSearchField "text"). ]
>
>  END SCHEMA.XML===
>
>
>
>
>  BEGIN CONFIG.XML===
>
> [ solrconfig.xml was pasted inline here, but the list archive stripped out
> the XML markup; the surviving fragments (solr.LRUCache entries sized 512
> with autowarmCount 256 and 0, the example dismax boosts text^0.5
> features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4, and the cat /
> manu_exact / price facet defaults) match the stock example configuration. ]
>
>  END CONFIG.XML===
>
>
>
>
>
> > -Original Message-
> > From: Chris Hostetter [mailto:[EMAIL PROTECTED]
> > Sent: Monday, October 08, 2007 4:56 PM
> > To: solr-user
> > Subject: RE: Availability Issues
> >
> > : I've attached our schema/config files.  They are pretty much
> > : out-of-the-box values, except for our index.
> >
> > FYI: the mailing list strips most attachments ... the best
> > thing to do is just inline them in your mail.
> >
> > Quick question: do you have autoCommit turned on in your
> > solrconfig.xml?
> >
> > Second question: do you have autowarming on your caches?
> >
> >
> >
> > -Hoss
> >
> >
> >
>



-- 
regards
jl


RE: Availability Issues

2007-10-08 Thread David Whalen
Thanks for letting me know that.  Okay, here they are:


 BEGIN SCHEMA.XML ===

[ schema.xml was pasted inline here, but the list archive stripped out the
XML markup; only scattered attribute fragments survive (field definitions
with omitNorms/sortMissingLast/multiValued, WordDelimiterFilter and synonym
analyzer settings, uniqueKey "id", defaultSearchField "text"). ]

 END SCHEMA.XML ===




 BEGIN CONFIG.XML ===

[ solrconfig.xml was pasted inline here, but the list archive stripped out
the XML markup; the surviving fragments (solr.LRUCache entries sized 512
with autowarmCount 256 and 0, the example dismax boosts text^0.5
features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4, and the cat /
manu_exact / price facet defaults) match the stock example configuration. ]

 END CONFIG.XML ===



 

> -Original Message-
> From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 4:56 PM
> To: solr-user
> Subject: RE: Availability Issues
> 
> : I've attached our schema/config files.  They are pretty much
> : out-of-the-box values, except for our index.
> 
> FYI: the mailing list strips most attachments ... the best 
> thing to do is just inline them in your mail.
> 
> Quick question: do you have autoCommit turned on in your 
> solrconfig.xml?
> 
> Second question: do you have autowarming on your caches?
> 
> 
> 
> -Hoss
> 
> 
> 


RE: Availability Issues

2007-10-08 Thread David Whalen
Hi Chris.

My logs don't look anything like that.  They look like HTTP
requests.  Am I looking in the wrong place?

Dave


> -Original Message-
> From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 5:02 PM
> To: solr-user
> Subject: RE: Availability Issues
> 
> 
> : > Do the slow requests start after a commit?
> : 
> : Based on the way the logs read, you could argue that point.
> : The stream of POSTs end in the logs and then subsequent queries
> : take longer to run, but it's hard to be sure there's a direct
> : correlation.
> 
> you would know based on the INFO level messages related to a 
> commit ... 
> you'll see messages that look like this when the commit starts...
> 
> Oct 8, 2007 1:56:48 PM 
> org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
> 
> ...then you'll see a message like this...
> 
> Oct 8, 2007 1:56:48 PM 
> org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> 
> ...if you have autowarming you'll see a bunch of logs about 
> that, and then eventually you'll see a message like this...
> 
> Oct 8, 2007 1:56:48 PM 
> org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {commit=} 0 299
> 
> ...the important question is how many of these hangs or 
> really long queries happen in the midst of all that ... how 
> many happen very quickly after it (which may indicate not 
> enough warming)
> 
> (NOTE: some of those log messages may look different in your 
> nightly snapshot version, but the main gist should be the 
> same .. i don't remember when exactly the LogUpdateProcessor 
> was added).
> 
> 
> 
> 
> -Hoss
> 
> 
> 


RE: Availability Issues

2007-10-08 Thread Chris Hostetter

: > Do the slow requests start after a commit?
: 
: Based on the way the logs read, you could argue that point.
: The stream of POSTs end in the logs and then subsequent queries
: take longer to run, but it's hard to be sure there's a direct
: correlation.

you would know based on the INFO level messages related to a commit ... 
you'll see messages that look like this when the commit starts...

Oct 8, 2007 1:56:48 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)

...then you'll see a message like this...

Oct 8, 2007 1:56:48 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush

...if you have autowarming you'll see a bunch of logs about that, and then 
eventually you'll see a message like this...

Oct 8, 2007 1:56:48 PM org.apache.solr.update.processor.LogUpdateProcessor 
finish
INFO: {commit=} 0 299

...the important question is how many of these hangs or really long 
queries happen in the midst of all that ... how many happen very quickly 
after it (which may indicate not enough warming)

(NOTE: some of those log messages may look different in your nightly 
snapshot version, but the main gist should be the same .. i don't remember 
when exactly the LogUpdateProcessor was added).
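
If it helps to quantify that, here is a rough, illustrative Python sketch 
(assuming the default two-line JDK log format shown above) that pulls the 
commit start/end times out of a log file so you can line them up against the 
slow queries:

    import datetime, re, sys

    TS_FORMAT = "%b %d, %Y %I:%M:%S %p"   # e.g. "Oct 8, 2007 1:56:48 PM"
    header_ts = None

    for line in open(sys.argv[1]):
        m = re.match(r"([A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ", line)
        if m:
            # remember the timestamp from the most recent log-record header line
            header_ts = datetime.datetime.strptime(m.group(1), TS_FORMAT)
        elif header_ts and line.startswith("INFO: start commit"):
            print("commit started", header_ts)
        elif header_ts and line.startswith("INFO: end_commit_flush"):
            print("commit flushed", header_ts)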




-Hoss



RE: Availability Issues

2007-10-08 Thread Chris Hostetter
: I've attached our schema/config files.  They are pretty much
: out-of-the-box values, except for our index.

FYI: the mailing list strips most attachments ... the best thing to do is 
just inline them in your mail.

Quick question: do you have autoCommit turned on in your solrconfig.xml?

Second question: do you have autowarming on your caches?



-Hoss



RE: Availability Issues

2007-10-08 Thread David Whalen
> Oh, so you are using the same boxes for updating and querying?

Yep.  We have a MySQL database on the box and we query it and
POST directly into SOLR via wget from Perl.  We then also hit the
box for queries.

[We'd be very interested in hearing about best practices on
how to separate out the data from the index and how to balance
them when the inserts outweigh the selects by a factor of 50,000:1]

> When you insert, are you using multiple threads?  If so, how many?

We're not threading at all.  We have a Perl script that does a
select statement out of a MySQL database and runs POSTs sequentially
into SOLR, one per document.  After a batch of 10,000 POSTs, we run a
background commit (using waitFlush and waitSearcher).
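
Roughly, the script does the equivalent of this Python sketch (illustrative
only -- the URL and field names are placeholders, and the MySQL select is
assumed to hand back (id, text) rows):

    import urllib.request
    from xml.sax.saxutils import escape

    SOLR_UPDATE = "http://localhost:8983/solr/update"   # placeholder URL
    BATCH_SIZE = 10000

    def post_xml(xml_body):
        req = urllib.request.Request(
            SOLR_UPDATE,
            data=xml_body.encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        urllib.request.urlopen(req).read()

    def index_rows(rows):
        """rows: an iterable of (doc_id, text) tuples selected from MySQL."""
        for n, (doc_id, text) in enumerate(rows, start=1):
            # one document per POST, sequentially, no threading
            post_xml('<add><doc><field name="id">%s</field>'
                     '<field name="text">%s</field></doc></add>'
                     % (escape(str(doc_id)), escape(text)))
            if n % BATCH_SIZE == 0:
                # commit after every 10,000 adds; waitFlush/waitSearcher can be
                # set to "true" to block until the new searcher is registered
                post_xml('<commit waitFlush="false" waitSearcher="false"/>')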

Again, I'd be very grateful for success stories from people in terms
of good server architecture.  We are ready and willing to change versions
of linux, of the Java container, etc.  And we're ready to add more
boxes if that'll help.  We just need some guidance.

> What is the full URL of those slow query requests?

They can be anything.  For example:

[08/10/2007:18:51:55 +] "GET 
/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on HTTP/1.1" 200 45799

> Do the slow requests start after a commit?

Based on the way the logs read, you could argue that point.
The stream of POSTs end in the logs and then subsequent queries
take longer to run, but it's hard to be sure there's a direct
correlation.

> Yes, post it here.  Most likely a majority of the threads 
> will be blocked somewhere deep in lucene code, and you will 
> probably need help from people here to figure it out.

Next time it happens I'll shoot it over.
  
--Dave


> -Original Message-
> From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 3:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Availability Issues
> 
> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> > > Do you see any requests that took a really long time to finish?
> >
> > The requests that take a long time to finish are just 
> simple queries.  
> > And the same queries run at a later time come back much faster.
> >
> > Our logs contain 99% inserts and 1% queries.  We are 
> constantly adding 
> > documents to the index at a rate of 10,000 per minute, so the logs 
> > show mostly that.
> 
> Oh, so you are using the same boxes for updating and querying?
> When you insert, are you using multiple threads?  If so, how many?
> 
> What is the full URL of those slow query requests?
> Do the slow requests start after a commit?
> 
> > > Start with the thread dump.
> > > I bet it's multiple queries piling up around some synchronization 
> > > points in lucene (sometimes caused by multiple threads generating 
> > > the same big filter that isn't yet cached).
> >
> > What would be my next steps after that?  I'm not sure I'd 
> understand 
> > enough from the dump to make heads-or-tails of it.  Can I 
> share that 
> > here?
> 
> Yes, post it here.  Most likely a majority of the threads 
> will be blocked somewhere deep in lucene code, and you will 
> probably need help from people here to figure it out.
> 
> -Yonik
> 
> 


Re: Availability Issues

2007-10-08 Thread Yonik Seeley
On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> > Do you see any requests that took a really long time to finish?
>
> The requests that take a long time to finish are just simple
> queries.  And the same queries run at a later time come back
> much faster.
>
> Our logs contain 99% inserts and 1% queries.  We are constantly
> adding documents to the index at a rate of 10,000 per minute,
> so the logs show mostly that.

Oh, so you are using the same boxes for updating and querying?
When you insert, are you using multiple threads?  If so, how many?

What is the full URL of those slow query requests?
Do the slow requests start after a commit?

> > Start with the thread dump.
> > I bet it's multiple queries piling up around some
> > synchronization points in lucene (sometimes caused by
> > multiple threads generating the same big filter that isn't
> > yet cached).
>
> What would be my next steps after that?  I'm not sure I'd
> understand enough from the dump to make heads-or-tails of
> it.  Can I share that here?

Yes, post it here.  Most likely a majority of the threads will be
blocked somewhere deep in lucene code, and you will probably need help
from people here to figure it out.

-Yonik


RE: Availability Issues

2007-10-08 Thread David Whalen
Hi Yonik.

> Do you see any requests that took a really long time to finish?

The requests that take a long time to finish are just simple
queries.  And the same queries run at a later time come back
much faster.

Our logs contain 99% inserts and 1% queries.  We are constantly
adding documents to the index at a rate of 10,000 per minute,
so the logs show mostly that.


> Start with the thread dump.
> I bet it's multiple queries piling up around some 
> synchronization points in lucene (sometimes caused by 
> multiple threads generating the same big filter that isn't 
> yet cached).

What would be my next steps after that?  I'm not sure I'd
understand enough from the dump to make heads-or-tails of
it.  Can I share that here?

Dave


> -Original Message-
> From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 3:01 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Availability Issues
> 
> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> > The logs show nothing but regular activity.  We do a "tail -f"
> > on the logfile and we can read it during the unresponsive 
> period and 
> > we don't see any errors.
> 
> You don't see log entries for requests until after they complete.
> When a server becomes unresponsive, try shutting off further 
> traffic to it, and let it finish whatever requests it's 
> working on (assuming that's the issue) so you can see them in 
> the log.  Do you see any requests that took a really long 
> time to finish?
> 
> -Yonik
> 
> 


Re: Availability Issues

2007-10-08 Thread Yonik Seeley
On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> The logs show nothing but regular activity.  We do a "tail -f"
> on the logfile and we can read it during the unresponsive period
> and we don't see any errors.

You don't see log entries for requests until after they complete.
When a server becomes unresponsive, try shutting off further traffic
to it, and let it finish whatever requests it's working on (assuming
that's the issue) so you can see them in the log.  Do you see any
requests that took a really long time to finish?

-Yonik


Re: Availability Issues

2007-10-08 Thread Yonik Seeley
On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> > Have you taken a thread dump to see what is going on?
>
> We can't do it b/c during the unresponsive time we can't access
> the admin site (/solr/admin) at all.  I don't know how to do a
> thread dump via the command line

kill -3 <pid>

Start with the thread dump.
I bet it's multiple queries piling up around some synchronization
points in lucene (sometimes caused by multiple threads generating the
same big filter that isn't yet cached).

-Yonik


RE: Availability Issues

2007-10-08 Thread David Whalen
Hi Yonik.

> What version of Solr are you running?

We're running:
Solr Specification Version: 1.2.2007.08.24.08.06.00 
Solr Implementation Version: nightly ${svnversion} - yonik - 2007-08-24 
08:06:00 
Lucene Specification Version: 2.2.0 
Lucene Implementation Version: 2.2.0 548010 - buschmi - 2007-06-16 23:15:56 

> Is the CPU pegged at 100% when it's unresponsive?

It's a little difficult to be sure.  We have an HT (hyper-threading) box and
the CPU % we get back is misleading.  I think it's safe to say we
may spike up to 100% but we don't necessarily stay pegged there.

> Have you taken a thread dump to see what is going on?

We can't do it b/c during the unresponsive time we can't access
the admin site (/solr/admin) at all.  I don't know how to do a
thread dump via the command line.

> Do you get into a situation where more than one searcher is 
> warming at a time? (there is configuration that can prevent 
> this one from happening).

Forgive me when I say I'm not totally clear on what this 
question means.  The index is constantly getting hit with
a myriad of queries, if that's what you meant.

Thanks,

Dave


  

> -Original Message-
> From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 2:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Availability Issues
> 
> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> > We're running SOLR 1.2 with a 2.5G heap size.  On any given 
> day, the 
> > system becomes completely unresponsive.
> > We can't even get /solr/admin/ to come up, much less any select 
> > queries.
> 
> What version of Solr are you running?
> The first step to diagnose something like this is to figure 
> out what is going on...
> Is the CPU pegged at 100% when it's unresponsive?
> Have you taken a thread dump to see what is going on?
> Do you get into a situation where more than one searcher is 
> warming at a time? (there is configuration that can prevent 
> this one from happening).
> 
> -Yonik
> 
> 


RE: Availability Issues

2007-10-08 Thread David Whalen
Hi Tom.

The logs show nothing but regular activity.  We do a "tail -f"
on the logfile and we can read it during the unresponsive period
and we don't see any errors.

I've attached our schema/config files.  They are pretty much
out-of-the-box values, except for our index.

Dave


> -Original Message-
> From: Tom Hill [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 08, 2007 2:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Availability Issues
> 
> Hi -
> 
> We're definitely not seeing that. What do your logs show? 
> What do your schema/solrconfig look like?
> 
> Tom
> 
> 
> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> >
> > Hi All.
> >
> > I'm seeing all these threads about availability and I'm 
> wondering why 
> > my situation is so different than others'.
> >
> > We're running SOLR 1.2 with a 2.5G heap size.  On any given 
> day, the 
> > system becomes completely unresponsive.
> > We can't even get /solr/admin/ to come up, much less any select 
> > queries.
> >
> > The only thing we can do is kill the SOLR process and re-start it.
> >
> > We are indexing over 25 million documents and we add about 
> as much as 
> > we remove daily, so the number remains fairly constant.
> >
> > Again, it seems like other folks are having a much easier time with 
> > SOLR than we are.  Can anyone help by sharing how you've got it 
> > configured?  Does anyone have a similar experience?
> >
> > TIA.
> >
> > DW
> >
> >
> 
> 


Re: Availability Issues

2007-10-08 Thread Yonik Seeley
On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> We're running SOLR 1.2 with a 2.5G heap size.  On any
> given day, the system becomes completely unresponsive.
> We can't even get /solr/admin/ to come up, much less
> any select queries.

What version of Solr are you running?
The first step to diagnose something like this is to figure out what
is going on...
Is the CPU pegged at 100% when it's unresponsive?
Have you taken a thread dump to see what is going on?
Do you get into a situation where more than one searcher is warming at
a time? (there is configuration that can prevent this one from
happening).

-Yonik


Re: Availability Issues

2007-10-08 Thread Tom Hill
Hi -

We're definitely not seeing that. What do your logs show? What do your
schema/solrconfig look like?

Tom


On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
>
> Hi All.
>
> I'm seeing all these threads about availability and I'm
> wondering why my situation is so different than others'.
>
> We're running SOLR 1.2 with a 2.5G heap size.  On any
> given day, the system becomes completely unresponsive.
> We can't even get /solr/admin/ to come up, much less
> any select queries.
>
> The only thing we can do is kill the SOLR process and
> re-start it.
>
> We are indexing over 25 million documents and we add
> about as much as we remove daily, so the number remains
> fairly constant.
>
> Again, it seems like other folks are having a much
> easier time with SOLR than we are.  Can anyone help
> by sharing how you've got it configured?  Does anyone
> have a similar experience?
>
> TIA.
>
> DW
>
>