Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Yonik Seeley
On Sat, Oct 4, 2008 at 11:55 AM, Uwe Klosa <[EMAIL PROTECTED]> wrote:
> An "Opening Searcher" always appears directly after "start commit" with no
> delay.

Ah, so it doesn't look like it's the close of the IndexWriter then!
When do you see the "end_commit_flush"?
Could you post everything in your log between when the commit begins
and when it ends?
Is this a live server (is query traffic continuing to come in while
the commit is happening?)  If so, it would be interesting to see (and
easier to debug) if it happened on a server with no query traffic.

> But I can see many {commit=} entries with QTime around 280,000 ms (four
> and a half minutes)

> One difference I could see compared to your logging is that I have
> waitFlush=true. Could that have this impact?

These parameters (waitFlush/waitSearcher) won't affect how long it
takes to get the new searcher registered, but they do affect at what
point control is returned to the caller (and hence when you see the
response).  If waitSearcher==false, then you see the response before
searcher warming; otherwise it blocks until after.  waitFlush==false
is not currently supported (it will always act as true), so your
change of that doesn't matter.

-Yonik
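The blocking behavior Yonik describes can be sketched with a toy model (the class and field names below are invented for illustration; this is not Solr code): warming always runs in a background thread, and waitSearcher only controls whether the commit call blocks until the new searcher is registered.

```java
import java.util.concurrent.CountDownLatch;

// Toy model (invented names, not Solr code) of commit's waitSearcher flag:
// warming always happens in a background thread; the flag only decides
// whether commit() blocks until the new searcher has been registered.
class CommitModel {
    volatile boolean searcherRegistered = false;

    void commit(boolean waitSearcher) {
        CountDownLatch registered = new CountDownLatch(1);
        Thread warmer = new Thread(() -> {
            // (simulated) autowarm the new searcher, then register it
            searcherRegistered = true;
            registered.countDown();
        });
        warmer.start();
        if (waitSearcher) {
            try {
                registered.await(); // block until the warmed searcher is live
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // with waitSearcher=false, control can return before warming finishes
    }
}
```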


Re: RequestHandler that passes along the query

2008-10-04 Thread Brian Whitman
The issue, I think, is that process() is never called in my component, just
distributedProcess().
The server that hosts the component is a separate Solr instance from the
shards, so my guess is process() is only called when that particular Solr
instance has something to do with the index. distributedProcess() is called
for each of those stages, but the last stage it is called for is
GET_FIELDS.

But the WritingDistributedSearchComponents page did tip me off to a new
function, finishStage, that is called *after* each stage is done and does
exactly what I want:

  @Override
  public void finishStage(ResponseBuilder rb) {
    if (rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {
      SolrDocumentList sd = (SolrDocumentList) rb.rsp.getValues().get("response");
      for (SolrDocument d : sd) {
        rb.rsp.add("second-id-list", d.getFieldValue("id").toString());
      }
    }
  }
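For readers without a Solr checkout, the same control flow can be exercised with stub stand-ins (ResponseBuilderStub, mergedIds, and the stage constant below are invented for illustration, not the real Solr API): finishStage fires after each stage completes, so STAGE_GET_FIELDS is the first point where the merged documents can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

// Stub stand-ins (not the real Solr classes) for the snippet above.
class ResponseBuilderStub {
    static final int STAGE_GET_FIELDS = 3000; // illustrative stage constant
    int stage;
    List<String> mergedIds = new ArrayList<>(); // stands in for rb.rsp's "response" docs
}

class SecondIdComponent {
    final List<String> secondIdList = new ArrayList<>();

    // Called after every stage; act only once the merged result docs exist.
    void finishStage(ResponseBuilderStub rb) {
        if (rb.stage == ResponseBuilderStub.STAGE_GET_FIELDS) {
            secondIdList.addAll(rb.mergedIds);
        }
    }
}
```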






On Sat, Oct 4, 2008 at 1:37 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:

> I'm not totally on top of how distributed components work, but check:
> http://wiki.apache.org/solr/WritingDistributedSearchComponents
>
> and:
>  https://issues.apache.org/jira/browse/SOLR-680
>
> Do you want each of the shards to append values?  or just the final result?
>  If appending the values is not a big resource hog, it may make sense to
> only do that in the main "process" block.  If that is the case, I *think*
> you just implement: process(ResponseBuilder rb)
>
> ryan
>
>
>
> On Oct 4, 2008, at 1:06 PM, Brian Whitman wrote:
>
>  Sorry for the extended question, but I am having trouble making a
>> SearchComponent that can actually get at the returned response in a
>> distributed setup.
>> In my distributedProcess:
>>
>>   public int distributedProcess(ResponseBuilder rb) throws IOException {
>>
>> How can I get at the returned results from all shards? I want to get at
>> really the rendered response right before it goes back to the client so I
>> can add some information based on what came back.
>>
>> The TermVector example seems to get at rb.resultIds (which is not public
>> and
>> I can't use in my plugin) and then sends a request back to the shards to
>> get
>> the stored fields (using ShardDoc.id, another field I don't have access
>> to.)
>> Instead of doing all of that I'd like to just "peek" into the response
>> that
>> is about to be written to the client.
>>
>> I tried getting at rb.rsp but the data is not filled in during the last
>> stage (GET_FIELDS) that distributedProcess gets called for.
>>
>>
>>
>> On Sat, Oct 4, 2008 at 10:12 AM, Brian Whitman <[EMAIL PROTECTED]>
>> wrote:
>>
>>  Thanks grant and ryan, so far so good. But I am confused about one thing
>>> -
>>> when I set this up like:
>>>
>>>  public void process(ResponseBuilder rb) throws IOException {
>>>
>>> And put it as the last-component on a distributed search (a defaults
>>> shard
>>> is defined in the solrconfig for the handler), the component never does
>>> its
>>> thing. I looked at the TermVectorComponent implementation and it instead
>>> defines
>>>
>>>   public int distributedProcess(ResponseBuilder rb) throws IOException {
>>>
>>> And when I implemented that method it works. Is there a way to define
>>> just
>>> one method that will work with both distributed and normal searches?
>>>
>>>
>>>
>>> On Fri, Oct 3, 2008 at 4:41 PM, Grant Ingersoll <[EMAIL PROTECTED]
>>> >wrote:
>>>
>>>  No need to even write a new ReqHandler if you're using 1.3:
 http://wiki.apache.org/solr/SearchComponent


>>>
>


Re: RequestHandler that passes along the query

2008-10-04 Thread Ryan McKinley

I'm not totally on top of how distributed components work, but check:
http://wiki.apache.org/solr/WritingDistributedSearchComponents

and:
 https://issues.apache.org/jira/browse/SOLR-680

Do you want each of the shards to append values?  or just the final
result?  If appending the values is not a big resource hog, it may
make sense to only do that in the main "process" block.  If that is
the case, I *think* you just implement: process(ResponseBuilder rb)


ryan


On Oct 4, 2008, at 1:06 PM, Brian Whitman wrote:


Sorry for the extended question, but I am having trouble making a
SearchComponent that can actually get at the returned response in a
distributed setup.
In my distributedProcess:

   public int distributedProcess(ResponseBuilder rb) throws IOException {

How can I get at the returned results from all shards? I want to get at
really the rendered response right before it goes back to the client so I
can add some information based on what came back.

The TermVector example seems to get at rb.resultIds (which is not public and
I can't use in my plugin) and then sends a request back to the shards to get
the stored fields (using ShardDoc.id, another field I don't have access to.)
Instead of doing all of that I'd like to just "peek" into the response that
is about to be written to the client.

I tried getting at rb.rsp but the data is not filled in during the last
stage (GET_FIELDS) that distributedProcess gets called for.


On Sat, Oct 4, 2008 at 10:12 AM, Brian Whitman <[EMAIL PROTECTED]> wrote:

Thanks grant and ryan, so far so good. But I am confused about one thing -
when I set this up like:

 public void process(ResponseBuilder rb) throws IOException {

And put it as the last-component on a distributed search (a defaults shard
is defined in the solrconfig for the handler), the component never does its
thing. I looked at the TermVectorComponent implementation and it instead
defines

   public int distributedProcess(ResponseBuilder rb) throws IOException {

And when I implemented that method it works. Is there a way to define just
one method that will work with both distributed and normal searches?


On Fri, Oct 3, 2008 at 4:41 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

No need to even write a new ReqHandler if you're using 1.3:
http://wiki.apache.org/solr/SearchComponent







Re: RequestHandler that passes along the query

2008-10-04 Thread Brian Whitman
Sorry for the extended question, but I am having trouble making a
SearchComponent that can actually get at the returned response in a
distributed setup.
In my distributedProcess:

public int distributedProcess(ResponseBuilder rb) throws IOException {

How can I get at the returned results from all shards? I want to get at
really the rendered response right before it goes back to the client so I
can add some information based on what came back.

The TermVector example seems to get at rb.resultIds (which is not public and
I can't use in my plugin) and then sends a request back to the shards to get
the stored fields (using ShardDoc.id, another field I don't have access to.)
Instead of doing all of that I'd like to just "peek" into the response that
is about to be written to the client.

I tried getting at rb.rsp but the data is not filled in during the last
stage (GET_FIELDS) that distributedProcess gets called for.



On Sat, Oct 4, 2008 at 10:12 AM, Brian Whitman <[EMAIL PROTECTED]> wrote:

> Thanks grant and ryan, so far so good. But I am confused about one thing -
> when I set this up like:
>
>   public void process(ResponseBuilder rb) throws IOException {
>
> And put it as the last-component on a distributed search (a defaults shard
> is defined in the solrconfig for the handler), the component never does its
> thing. I looked at the TermVectorComponent implementation and it instead
> defines
>
> public int distributedProcess(ResponseBuilder rb) throws IOException {
>
> And when I implemented that method it works. Is there a way to define just
> one method that will work with both distributed and normal searches?
>
>
>
> On Fri, Oct 3, 2008 at 4:41 PM, Grant Ingersoll <[EMAIL PROTECTED]>wrote:
>
>> No need to even write a new ReqHandler if you're using 1.3:
>> http://wiki.apache.org/solr/SearchComponent
>>
>


Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Uwe Klosa
An "Opening Searcher" always appears directly after "start commit" with no
delay. But I can see many {commit=} entries with QTime around 280,000 ms
(four and a half minutes).

One difference I could see compared to your logging is that I have
waitFlush=true. Could that have this impact?

Uwe

On Sat, Oct 4, 2008 at 4:36 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Fri, Oct 3, 2008 at 2:28 PM, Michael McCandless
> <[EMAIL PROTECTED]> wrote:
> > Yonik, when Solr commits what does it actually do?
>
> Less than it used to (Solr now uses Lucene to handle deletes).
> A solr-level commit closes the IndexWriter, calls some configured
> callbacks, opens a new IndexSearcher, warms it, and registers it.
>
> We can tell where the time is taken by looking at the timestamps in
> the log entries.  Here is what the log output should look like for a
> commit:
>
> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>  // close the index writer
>  // call any configured post-commit callbacks (to take a snapshot if
> the index, etc).
>  // open a new IndexSearcher (uses IndexReader.reopen() of the last
> opened reader)
> INFO: Opening Searcher@... main
> INFO: end_commit_flush
>  // in a different thread, warming of the new IndexSearcher will be done.
>  // by default, the solr-level commit will wait for warming to be done and
>  // the new searcher to be registered (i.e. any new searches will see the
>  // committed changes)
> INFO: autowarming Searcher@... main from Searcher@... main [...]
>  // there will be multiple autowarming statements, and some could appear
>  // before the end_commit_flush log entry because it's being done in
>  // another thread.
> INFO: [] Registered new searcher Searcher@... main
> INFO: Closing Searcher@... main
> INFO: {commit=} 0 547
> INFO: [] webapp=/solr path=/update params={} status=0 QTime=547
>
> Uwe, can you verify that the bulk of the time is between "start
> commit" and "Opening Searcher"?
>
> -Yonik
>
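A quick way to act on Yonik's suggestion is to diff the timestamps mechanically. A sketch (assuming you have already extracted each entry into an "epochMillis message" pair; real Solr 1.3 java.util.logging output is two lines per entry, so it would need a small parser first):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: given pre-extracted "epochMillis message" pairs, report how long
// after "start commit" each later event appeared, to see which phase of the
// commit (flush, searcher open, warming) is eating the time.
class CommitTimer {
    static Map<String, Long> millisAfterStartCommit(List<String> lines) {
        Map<String, Long> gaps = new LinkedHashMap<>();
        long startCommit = -1;
        for (String line : lines) {
            int sp = line.indexOf(' ');
            long t = Long.parseLong(line.substring(0, sp));
            String msg = line.substring(sp + 1);
            if (msg.startsWith("start commit")) {
                startCommit = t;              // anchor everything to this entry
            } else if (startCommit >= 0) {
                gaps.put(msg, t - startCommit);
            }
        }
        return gaps;
    }
}
```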


Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Yonik Seeley
On Sat, Oct 4, 2008 at 9:35 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> So it seems like fsync with ZFS can be very slow?

The other user that appears to have a commit issue is on Win64.

http://www.nabble.com/*Very*-slow-Commit-after-upgrading-to-solr-1.3-td19720792.html#a19720792

-Yonik


Re: *Very* slow Commit after upgrading to solr 1.3

2008-10-04 Thread Yonik Seeley
Ben, see also

http://www.nabble.com/Commit-in-solr-1.3-can-take-up-to-5-minutes-td19802781.html#a19802781

What type of physical drive is this and what interface is used (SATA, etc)?
What is the filesystem (NTFS)?

Did you add to an existing index from an older version of Solr, or
start from scratch?

If you add a single document to the index and commit, does it take a long time?

I notice your merge factor is 1000... this will create many files that
need to be sync'd.
It may help to try the IndexWriter settings from the 1.3 example
setup... the important changes being:

<mergeFactor>10</mergeFactor>

<ramBufferSizeMB>32</ramBufferSizeMB>

-Yonik

On Mon, Sep 29, 2008 at 5:33 AM, Ben Shlomo, Yatir
<[EMAIL PROTECTED]> wrote:
> Hi!
>
>
>
> I am running on Windows 64-bit ...
> I have upgraded to solr 1.3 in order to use the distributed search.
>
> I haven't changed the solrConfig and the schema xml files during the
> upgrade.
>
> I am indexing ~ 350K documents (each one is about 0.5 KB in size)
>
> The indexing takes a reasonable amount of time (350 seconds)
>
> See tomcat log:
>
> INFO: {add=[8x-wbTscWftuu1sVWpdnGw==, VOu1eSv0obBl1xkj2jGjIA==,
> YkOm-nKPrTVVVyeCZM4-4A==, rvaq_TyYsqt3aBc0KKDVbQ==,
> 9NdzWXsErbF_5btyT1JUjw==, ...(398728 more)]} 0 349875
>
>
>
> But when I commit it takes more than an hour (5000 seconds!); the
> optimize after the commit took 14 seconds.
>
> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>
>
>
> P.S. It's not a machine problem; I moved to another machine and the same
> thing happened.
>
>
> I noticed something very strange while waiting for the commit:
>
> While the Solr index is 210MB in size, in the Windows task manager I
> noticed that the java process is making a HUGE amount of IO reads:
> it reads more than 350 GB, which takes a lot of time.
>
> The process is constantly taking 25% of the CPU resources.
>
> All my autowarmCount settings in the solrconfig file do not exceed 256...
>
>
>
> Any more ideas to check?
>
> Thanks.
>
>
>
>
>
>
>
> Here is part of my solrConfig file (the element tags were stripped by the
> mail archiver; only the values survive):
>
>  false
>  1000
>  1000
>  2147483647
>  1
>  1000
>  1
>
>  false
>  1000
>  1000
>  2147483647
>  1
>
>  true
>
> Yatir Ben-shlomo | eBay, Inc. | Classification Track, Shopping.com
> (Israel) | w: +972-9-892-1373 |  email: [EMAIL PROTECTED] |
>
>
>
>


Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Yonik Seeley
On Fri, Oct 3, 2008 at 2:28 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik, when Solr commits what does it actually do?

Less than it used to (Solr now uses Lucene to handle deletes).
A solr-level commit closes the IndexWriter, calls some configured
callbacks, opens a new IndexSearcher, warms it, and registers it.

We can tell where the time is taken by looking at the timestamps in
the log entries.  Here is what the log output should look like for a
commit:

INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
  // close the index writer
  // call any configured post-commit callbacks (to take a snapshot if
the index, etc).
  // open a new IndexSearcher (uses IndexReader.reopen() of the last
opened reader)
INFO: Opening Searcher@... main
INFO: end_commit_flush
  // in a different thread, warming of the new IndexSearcher will be done.
  // by default, the solr-level commit will wait for warming to be done and
  // the new searcher to be registered (i.e. any new searches will see the
  // committed changes)
INFO: autowarming Searcher@... main from Searcher@... main [...]
  // there will be multiple autowarming statements, and some could appear
  // before the end_commit_flush log entry because it's being done in
  // another thread.
INFO: [] Registered new searcher Searcher@... main
INFO: Closing Searcher@... main
INFO: {commit=} 0 547
INFO: [] webapp=/solr path=/update params={} status=0 QTime=547

Uwe, can you verify that the bulk of the time is between "start
commit" and "Opening Searcher"?

-Yonik


Re: RequestHandler that passes along the query

2008-10-04 Thread Brian Whitman
Thanks grant and ryan, so far so good. But I am confused about one thing -
when I set this up like:

  public void process(ResponseBuilder rb) throws IOException {

And put it as the last-component on a distributed search (a defaults shard
is defined in the solrconfig for the handler), the component never does its
thing. I looked at the TermVectorComponent implementation and it instead
defines

public int distributedProcess(ResponseBuilder rb) throws IOException {

And when I implemented that method it works. Is there a way to define just
one method that will work with both distributed and normal searches?



On Fri, Oct 3, 2008 at 4:41 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> No need to even write a new ReqHandler if you're using 1.3:
> http://wiki.apache.org/solr/SearchComponent
>


Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Michael McCandless


Oh OK, phew.  I misunderstood your answer too!

So it seems like fsync with ZFS can be very slow?

Mike

Uwe Klosa wrote:

Oh, you meant index files. I misunderstood your question. Sorry, now that I
read it again I see what you meant. There are only 136 index files. So no
problem there.

Uwe

On Sat, Oct 4, 2008 at 1:59 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

Yikes!  That's way too many files.  Have you changed mergeFactor?  Or
implemented a custom DeletionPolicy or MergePolicy?

Or... does anyone know of something else in Solr's configuration that could
lead to such an insane number of files?

Mike

Uwe Klosa wrote:

There are around 35.000 files in the index. When I started indexing 5 weeks
ago with only 2000 documents I did not see this issue. I have seen it the
first time with around 10.000 documents.

Before that I have been using the same instance on a Linux machine with up
to 17.000 documents and I haven't seen this issue at all. The original plan
has always been to use Solr on Linux, but I'm still waiting for the new
server.

Uwe

On Sat, Oct 4, 2008 at 12:06 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

Hmm OK that seems like a possible explanation then.  Still it's spooky that
it's taking 5 minutes.  How many files are in the index at the time you
call commit?

I wonder if you were to simply pause for say 30 seconds, before issuing the
commit, whether you'd then see the commit go faster?  On Windows at least
such a silly trick does seem to improve performance, I think because it
allows the OS to move the bytes from its write cache onto stable storage
"on its own schedule" whereas when we commit we are demanding the OS move
the bytes on our [arbitrary] schedule.

I really wish OSs would add an API that would just block & return once the
file has made it to stable storage (letting the OS sync on its own optimal
schedule), rather than demanding the file be fsync'd immediately.

I really haven't explored the performance of fsync on different
filesystems.  I think I've read that ReiserFS may have issues, though it
could have been addressed by now.  I *believe* ext3 is OK (at least, it
didn't show the strange "sleep to get better performance" issue above, in
my limited testing).

Mike

Uwe Klosa wrote:

Thanks Mike

The use of fsync() might be the answer to my problem, because I have
installed Solr for lack of other possibilities in a zone on Solaris with
ZFS, which slows down when many fsync() calls are made. This will be fixed
in an upcoming release of Solaris, but I will move the Solr instances to
another server with a different file system as soon as possible. Would the
use of a different file system than ext3 boost the performance?

Uwe

On Fri, Oct 3, 2008 at 8:28 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

Yonik Seeley wrote:

On Fri, Oct 3, 2008 at 1:56 PM, Uwe Klosa <[EMAIL PROTECTED]> wrote:

I have a big problem with one of my solr instances. A commit can take up
to 5 minutes. This time does not depend on the number of documents which
are updated. The difference for 1 or 100 updated documents is only a few
seconds.

Since Solr's commit logic really hasn't changed, I wonder if this
could be lucene related somehow.

Lucene's commit logic has changed: we now fsync() each file in the index
to ensure all bytes are on stable storage, before returning.

But I can't imagine that taking 5 minutes, unless there are somehow a
great many files added to the index?

Uwe, what filesystem are you using?

Yonik, when Solr commits what does it actually do?

Mike
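Mechanically, the fsync Lucene now performs on each index file corresponds to FileChannel.force(true) in Java. A minimal self-contained sketch (file names invented) of the step a slow-fsync filesystem stretches out, multiplied by the number of index files at commit:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of the per-file fsync step Lucene performs at commit: write the
// bytes, then FileChannel.force(true) to push them (and file metadata) to
// stable storage before returning.
class FsyncDemo {
    static long syncedWriteNanos(Path p, byte[] data) throws IOException {
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));
            ch.force(true); // the fsync: blocks until the data is on disk
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("fsync-demo", ".bin");
        System.out.println("write+fsync took " +
                syncedWriteNanos(p, new byte[1024]) + " ns");
        Files.delete(p);
    }
}
```

On a filesystem where fsync is cheap this is microseconds per file; on one where it is expensive, the cost repeats for every file in the commit.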











Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Uwe Klosa
Oh, you meant index files. I misunderstood your question. Sorry, now that I
read it again I see what you meant. There are only 136 index files. So no
problem there.

Uwe

On Sat, Oct 4, 2008 at 1:59 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

>
> Yikes!  That's way too many files.  Have you changed mergeFactor?  Or
> implemented a custom DeletionPolicy or MergePolicy?
>
> Or... does anyone know of something else in Solr's configuration that could
> lead to such an insane number of files?
>
> Mike
>
>
> Uwe Klosa wrote:
>
>  There are around 35.000 files in the index. When I started indexing 5
>> weeks ago with only 2000 documents I did not see this issue. I have seen
>> it the first time with around 10.000 documents.
>>
>> Before that I have been using the same instance on a Linux machine with up
>> to 17.000 documents and I haven't seen this issue at all. The original
>> plan
>> has always been to use Solr on Linux, but I'm still waiting for the new
>> server.
>>
>> Uwe
>>
>> On Sat, Oct 4, 2008 at 12:06 PM, Michael McCandless <
>> [EMAIL PROTECTED]> wrote:
>>
>>
>>> Hmm OK that seems like a possible explanation then.  Still it's spooky
>>> that
>>> it's taking 5 minutes.  How many files are in the index at the time you
>>> call
>>> commit?
>>>
>>> I wonder if you were to simply pause for say 30 seconds, before issuing
>>> the
>>> commit, whether you'd then see the commit go faster?  On Windows at least
>>> such a silly trick does seem to improve performance, I think because it
>>> allows the OS to move the bytes from its write cache onto stable storage
>>> "on
>>> its own schedule" whereas when we commit we are demanding the OS move the
>>> bytes on our [arbitrary] schedule.
>>>
>>> I really wish OSs would add an API that would just block & return once
>>> the
>>> file has made it to stable storage (letting the OS sync on its own
>>> optimal
>>> schedule), rather than demanding the file be fsync'd immediately.
>>>
>>> I really haven't explored the performance of fsync on different
>>> filesystems.  I think I've read that ReiserFS may have issues, though it
>>> could have been addressed by now.  I *believe* ext3 is OK (at least, it
>>> didn't show the strange "sleep to get better performance" issue above, in
>>> my
>>> limited testing).
>>>
>>> Mike
>>>
>>>
>>> Uwe Klosa wrote:
>>>
>>> Thanks Mike
>>>

 The use of fsync() might be the answer to my problem, because I have
 installed Solr for lack of other possibilities in a zone on Solaris with
 ZFS, which slows down when many fsync() calls are made. This will be fixed
 in an upcoming release of Solaris, but I will move the Solr instances to
 another server with a different file system as soon as possible. Would the
 use of a different file system than ext3 boost the performance?

 Uwe

 On Fri, Oct 3, 2008 at 8:28 PM, Michael McCandless <
 [EMAIL PROTECTED]> wrote:


  Yonik Seeley wrote:
>
> On Fri, Oct 3, 2008 at 1:56 PM, Uwe Klosa <[EMAIL PROTECTED]> wrote:
>
>
>> I have a big problem with one of my solr instances. A commit can take up
>>> to 5 minutes. This time does not depend on the number of documents which
>>> are updated. The difference for 1 or 100 updated documents is only a few
>>> seconds.
>>>
>>>
>>>  Since Solr's commit logic really hasn't changed, I wonder if this
>> could be lucene related somehow.
>>
>>
>>  Lucene's commit logic has changed: we now fsync() each file in the
> index
> to
> ensure all bytes are on stable storage, before returning.
>
> But I can't imagine that taking 5 minutes, unless there are somehow a
> great
> many files added to the index?
>
> Uwe, what filesystem are you using?
>
> Yonik, when Solr commits what does it actually do?
>
> Mike
>
>
>
>>>
>


Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Michael McCandless


Yikes!  That's way too many files.  Have you changed mergeFactor?  Or
implemented a custom DeletionPolicy or MergePolicy?

Or... does anyone know of something else in Solr's configuration that could
lead to such an insane number of files?

Mike

Uwe Klosa wrote:

There are around 35.000 files in the index. When I started indexing 5 weeks
ago with only 2000 documents I did not see this issue. I have seen it the
first time with around 10.000 documents.

Before that I have been using the same instance on a Linux machine with up
to 17.000 documents and I haven't seen this issue at all. The original plan
has always been to use Solr on Linux, but I'm still waiting for the new
server.

Uwe

On Sat, Oct 4, 2008 at 12:06 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

Hmm OK that seems like a possible explanation then.  Still it's spooky that
it's taking 5 minutes.  How many files are in the index at the time you
call commit?

I wonder if you were to simply pause for say 30 seconds, before issuing the
commit, whether you'd then see the commit go faster?  On Windows at least
such a silly trick does seem to improve performance, I think because it
allows the OS to move the bytes from its write cache onto stable storage
"on its own schedule" whereas when we commit we are demanding the OS move
the bytes on our [arbitrary] schedule.

I really wish OSs would add an API that would just block & return once the
file has made it to stable storage (letting the OS sync on its own optimal
schedule), rather than demanding the file be fsync'd immediately.

I really haven't explored the performance of fsync on different
filesystems.  I think I've read that ReiserFS may have issues, though it
could have been addressed by now.  I *believe* ext3 is OK (at least, it
didn't show the strange "sleep to get better performance" issue above, in
my limited testing).

Mike

Uwe Klosa wrote:

Thanks Mike

The use of fsync() might be the answer to my problem, because I have
installed Solr for lack of other possibilities in a zone on Solaris with
ZFS, which slows down when many fsync() calls are made. This will be fixed
in an upcoming release of Solaris, but I will move the Solr instances to
another server with a different file system as soon as possible. Would the
use of a different file system than ext3 boost the performance?

Uwe

On Fri, Oct 3, 2008 at 8:28 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

Yonik Seeley wrote:

On Fri, Oct 3, 2008 at 1:56 PM, Uwe Klosa <[EMAIL PROTECTED]> wrote:

I have a big problem with one of my solr instances. A commit can take up
to 5 minutes. This time does not depend on the number of documents which
are updated. The difference for 1 or 100 updated documents is only a few
seconds.

Since Solr's commit logic really hasn't changed, I wonder if this
could be lucene related somehow.

Lucene's commit logic has changed: we now fsync() each file in the index
to ensure all bytes are on stable storage, before returning.

But I can't imagine that taking 5 minutes, unless there are somehow a
great many files added to the index?

Uwe, what filesystem are you using?

Yonik, when Solr commits what does it actually do?

Mike








Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Uwe Klosa
There are around 35.000 files in the index. When I started indexing 5 weeks
ago with only 2000 documents I did not see this issue. I have seen it the
first time with around 10.000 documents.

Before that I have been using the same instance on a Linux machine with up
to 17.000 documents and I haven't seen this issue at all. The original plan
has always been to use Solr on Linux, but I'm still waiting for the new
server.

Uwe

On Sat, Oct 4, 2008 at 12:06 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

>
> Hmm OK that seems like a possible explanation then.  Still it's spooky that
> it's taking 5 minutes.  How many files are in the index at the time you call
> commit?
>
> I wonder if you were to simply pause for say 30 seconds, before issuing the
> commit, whether you'd then see the commit go faster?  On Windows at least
> such a silly trick does seem to improve performance, I think because it
> allows the OS to move the bytes from its write cache onto stable storage "on
> its own schedule" whereas when we commit we are demanding the OS move the
> bytes on our [arbitrary] schedule.
>
> I really wish OSs would add an API that would just block & return once the
> file has made it to stable storage (letting the OS sync on its own optimal
> schedule), rather than demanding the file be fsync'd immediately.
>
> I really haven't explored the performance of fsync on different
> filesystems.  I think I've read that ReiserFS may have issues, though it
> could have been addressed by now.  I *believe* ext3 is OK (at least, it
> didn't show the strange "sleep to get better performance" issue above, in my
> limited testing).
>
> Mike
>
>
> Uwe Klosa wrote:
>
>  Thanks Mike
>>
>> The use of fsync() might be the answer to my problem, because I have
>> installed Solr for lack of other possibilities in a zone on Solaris with
>> ZFS, which slows down when many fsync() calls are made. This will be fixed
>> in an upcoming release of Solaris, but I will move the Solr instances to
>> another server with a different file system as soon as possible. Would the
>> use of a different file system than ext3 boost the performance?
>>
>> Uwe
>>
>> On Fri, Oct 3, 2008 at 8:28 PM, Michael McCandless <
>> [EMAIL PROTECTED]> wrote:
>>
>>
>>> Yonik Seeley wrote:
>>>
>>> On Fri, Oct 3, 2008 at 1:56 PM, Uwe Klosa <[EMAIL PROTECTED]> wrote:
>>>

  I have a big problem with one of my solr instances. A commit can take up
> to 5 minutes. This time does not depend on the number of documents which
> are updated. The difference for 1 or 100 updated documents is only a few
> seconds.
>
>
 Since Solr's commit logic really hasn't changed, I wonder if this
 could be lucene related somehow.


>>> Lucene's commit logic has changed: we now fsync() each file in the index
>>> to
>>> ensure all bytes are on stable storage, before returning.
>>>
>>> But I can't imagine that taking 5 minutes, unless there are somehow a
>>> great
>>> many files added to the index?
>>>
>>> Uwe, what filesystem are you using?
>>>
>>> Yonik, when Solr commits what does it actually do?
>>>
>>> Mike
>>>
>>>
>


Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Michael McCandless


Hmm OK that seems like a possible explanation then.  Still it's spooky  
that it's taking 5 minutes.  How many files are in the index at the  
time you call commit?


I wonder if you were to simply pause for say 30 seconds, before  
issuing the commit, whether you'd then see the commit go faster?  On  
Windows at least such a silly trick does seem to improve performance,  
I think because it allows the OS to move the bytes from its write  
cache onto stable storage "on its own schedule" whereas when we commit  
we are demanding the OS move the bytes on our [arbitrary] schedule.


I really wish OSs would add an API that would just block & return once  
the file has made it to stable storage (letting the OS sync on its own  
optimal schedule), rather than demanding the file be fsync'd  
immediately.


I really haven't explored the performance of fsync on different  
filesystems.  I think I've read that ReiserFS may have issues, though  
it could have been addressed by now.  I *believe* ext3 is OK (at  
least, it didn't show the strange "sleep to get better performance"  
issue above, in my limited testing).


Mike

Uwe Klosa wrote:


Thanks Mike

The use of fsync() might be the answer to my problem, because for lack of other possibilities I have installed Solr in a zone on Solaris with ZFS, which slows down when many fsync() calls are made. This will be fixed in an upcoming release of Solaris, but I will move the Solr instances to another server with a different file system as soon as possible. Would the use of a different file system than ext3 boost the performance?

Uwe

On Fri, Oct 3, 2008 at 8:28 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:



Yonik Seeley wrote:

On Fri, Oct 3, 2008 at 1:56 PM, Uwe Klosa <[EMAIL PROTECTED]> wrote:

I have a big problem with one of my solr instances. A commit can take up to 5 minutes. This time does not depend on the number of documents which are updated. The difference for 1 or 100 updated documents is only a few seconds.



Since Solr's commit logic really hasn't changed, I wonder if this
could be lucene related somehow.



Lucene's commit logic has changed: we now fsync() each file in the index to ensure all bytes are on stable storage, before returning.

But I can't imagine that taking 5 minutes, unless there are somehow a great many files added to the index?

Uwe, what filesystem are you using?

Yonik, when Solr commits what does it actually do?

Mike





Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Uwe Klosa
Thanks Mike

The use of fsync() might be the answer to my problem, because for lack of other possibilities I have installed Solr in a zone on Solaris with ZFS, which slows down when many fsync() calls are made. This will be fixed in an upcoming release of Solaris, but I will move the Solr instances to another server with a different file system as soon as possible. Would the use of a different file system than ext3 boost the performance?

Uwe

On Fri, Oct 3, 2008 at 8:28 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

>
> Yonik Seeley wrote:
>
>  On Fri, Oct 3, 2008 at 1:56 PM, Uwe Klosa <[EMAIL PROTECTED]> wrote:
>>
>>> I have a big problem with one of my solr instances. A commit can take up
>>> to
>>> 5 minutes. This time does not depend on the number of documents which are
>>> updated. The difference for 1 or 100 updated documents is only a few
>>> seconds.
>>>
>>
>> Since Solr's commit logic really hasn't changed, I wonder if this
>> could be lucene related somehow.
>>
>
> Lucene's commit logic has changed: we now fsync() each file in the index to
> ensure all bytes are on stable storage, before returning.
>
> But I can't imagine that taking 5 minutes, unless there are somehow a great
> many files added to the index?
>
> Uwe, what filesystem are you using?
>
> Yonik, when Solr commits what does it actually do?
>
> Mike
>


Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-04 Thread Uwe Klosa
5 minutes for only one update is slow.

On Fri, Oct 3, 2008 at 8:13 PM, Fuad Efendi <[EMAIL PROTECTED]> wrote:

> Hi Uwe,
>
> 5 minutes is not slow; commit can't be realtime... I do commit & optimize
> once a day at 3:00 AM. It takes 15-20 minutes, but I have several million
> daily updates...
>
>
>
>  Is there a way to see why commits are slow? Has anyone had the same
>> problem
>> and what was the solution that solved it?
>>
>> I can provide my schema.xml and solrconfig.xml if needed.
>>
>> Thanks in advance
>> Uwe
>>
>>
>
>
>


Re: Incomplete search by using dixmasequesthandler

2008-10-04 Thread Erik Hatcher
Currently there is not a way to specify wildcards or "all" fields in a qf parameter.  However, if the goal is to make a bunch of dynamic fields searchable, but without individual boosts, use copyField to merge all of your desired dynamic fields into a single searchable one.
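A sketch of that suggestion in schema.xml: the destination field name `allText`, its type, and the `*_s` wildcard are assumptions chosen to match the field names in this thread, not values from the poster's actual schema.

```xml
<!-- Illustrative only: destination field name, type, and pattern are assumed. -->
<field name="allText" type="text" indexed="true" stored="false"
       multiValued="true"/>

<!-- Copy every *_s dynamic field into the one searchable field. -->
<copyField source="*_s" dest="allText"/>
```

After reindexing, the dismax qf parameter can then reference `allText` alone instead of listing each dynamic field.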


Erik

On Oct 4, 2008, at 5:36 AM, prerna07 wrote:



All these fields are dynamic fields, hence we don't know the names of all the fields; also, the number of dynamic fields is large, and we want to search all these dynamic fields.

Is there any other way of query field boosting?



prerna07 wrote:


Hi,

I am using dismaxrequesthandler to boost my query on the basis of fields. There are 5 indexes which contain the search string.

Field names which have this search criteria are:
- statusName_s
- listOf_author
- prdMainTitle_s
- productDescription_s
- productURL_s

my query string is:
?q=Shahrukh&qt=dismaxrequest&qf=listOf_author%5E5.0+statusName_s%5E6.0+prdMainTitle_s%5E3.0

I don't need any boosting for productDescription_s and productURL_s, hence I am not giving these field names in the query string above. The results I am getting from this query do not contain documents where the search string is present in the productDescription_s and productURL_s fields.

Is there any configuration in solrconfig which can handle this scenario?

Let me know if you need more information.

Thanks,
Prerna










Re: Incomplete search by using dixmasequesthandler

2008-10-04 Thread prerna07

All these fields are dynamic fields, hence we don't know the names of all the fields; also, the number of dynamic fields is large, and we want to search all these dynamic fields.

Is there any other way of query field boosting?



prerna07 wrote:
> 
> Hi,
> 
> I am using dismaxrequesthandler to boost my query on the basis of fields.
> There are 5 indexes which contain the search string.
> 
> Field names which have this Searchcriteria are:
> - statusName_s
> - listOf_author
> - prdMainTitle_s
> - productDescription_s
> - productURL_s
> 
> 
> my query string is :
> ?q=Shahrukh&qt=dismaxrequest&qf=listOf_author%5E5.0+statusName_s%5E6.0+prdMainTitle_s%5E3.0
> 
> I don't need any boosting for productDescription_s and productURL_s, hence I
> am not giving these field names in the query string above.
> The results I am getting from this query do not contain documents where
> the search string is present in the productDescription_s and productURL_s
> fields.
> 
> Is there any configuration in solrconfig which can handle this scenario?
> 
> Let me know if you need more information.
> 
> Thanks,
> Prerna
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Incomplete-search-by-using-dixmasequesthandler-tp19810056p19810457.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Incomplete search by using dixmasequesthandler

2008-10-04 Thread Erik Hatcher


On Oct 4, 2008, at 4:24 AM, prerna07 wrote:
I am using dismaxrequesthandler to boost my query on the basis of fields. There are 5 indexes which contain the search string.

Field names which have this search criteria are:
- statusName_s
- listOf_author
- prdMainTitle_s
- productDescription_s
- productURL_s

my query string is:
?q=Shahrukh&qt=dismaxrequest&qf=listOf_author%5E5.0+statusName_s%5E6.0+prdMainTitle_s%5E3.0

I don't need any boosting for productDescription_s and productURL_s, hence I am not giving these field names in the query string above. The results I am getting from this query do not contain documents where the search string is present in the productDescription_s and productURL_s fields.


You still need to specify the fields you want searched, just use a boost of 1.0.
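Applied to the field list earlier in this thread, the qf parameter might then look like this (`%5E` is the URL-encoded `^` boost separator; the two 1.0 boosts are the assumed neutral weights, not values from the original query):

```
?q=Shahrukh&qt=dismaxrequest&qf=listOf_author%5E5.0+statusName_s%5E6.0+prdMainTitle_s%5E3.0+productDescription_s%5E1.0+productURL_s%5E1.0
```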


Erik





Is there any configuration in solrconfig which can handle this scenario?

Let me know if you need more information.

Thanks,
Prerna







Incomplete search by using dixmasequesthandler

2008-10-04 Thread prerna07

Hi,

I am using dismaxrequesthandler to boost my query on the basis of fields.
There are 5 indexes which contain the search string.

Field names which have this search criteria are:
- statusName_s
- listOf_author
- prdMainTitle_s
- productDescription_s
- productURL_s


my query string is:
?q=Shahrukh&qt=dismaxrequest&qf=listOf_author%5E5.0+statusName_s%5E6.0+prdMainTitle_s%5E3.0

I don't need any boosting for productDescription_s and productURL_s, hence I am not giving these field names in the query string above.
The results I am getting from this query do not contain documents where the search string is present in the productDescription_s and productURL_s fields.

Is there any configuration in solrconfig which can handle this scenario?

Let me know if you need more information.

Thanks,
Prerna



-- 
View this message in context: 
http://www.nabble.com/Incomplete-search-by-using-dixmasequesthandler-tp19810056p19810056.html
Sent from the Solr - User mailing list archive at Nabble.com.