Re: Partial results with not enough hits

2012-11-22 Thread Aleksey Vorona

Thanks for the response.

I have increased the timeout and it did not increase execution time or 
system load. It is really that I misused the timeout.


Just to give you a bit of perspective, we added timeout to guarantee 
some level of QoS from the search engine. Our UI allows user to 
construct very complex queries and (what is worse) not all the time user 
really understands what she needs. That may become a problem if we have 
lots of users doing that. In this case I do not want to run such a 
complex query for seconds and want to return some result with a warning 
to the user that she is doing something wrong. But clearly, I set a 
timeout too low for that and started to harm even normal queries.


Anyway, thanks everyone for the replies. The issue is fixed and I now 
understand how timeout works much better (which was the reason to post 
to this list). Thanks!


-- Aleksey

On 12-11-22 06:37 AM, Otis Gospodnetic wrote:

Hi,

Maybe your goal should be to make your queries faster instead of fighting
with timeouts which are known not to work well.

What is your hardware like?
How about your queries?
What do you see in debugQuery=true output?

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Nov 21, 2012 6:04 PM, "Aleksey Vorona"  wrote:


In all of my queries I have timeAllowed parameter. My application is ready
for partial results. However, whenever Solr returns partial result it is a
very bad result.

For example, I have a test query and here its execution log with the
strict time allowed:
 WARNING: Query: ; Elapsed time: 120Exceeded allowed search
time: 100 ms.
 INFO: [] webapp=/solr path=/select params={&timeAllowed=**100}
hits=189 status=0 QTime=119
Here it is without such a strict limitation:
 INFO: [] webapp=/solr path=/select params={&timeAllowed=**1}
hits=582 status=0 QTime=124

The total execution time is different by mere 5 ms, but the partial result
has only about 1/3 of the full result.

Is it the expected behaviour? Does that mean I can never rely on the
partial results?

I added timeAllowed to protect from too expensive wide queries, but I
still want to return something relevant to the user. This query returned
30% of the full result, but I have other queries in the log where partial
result is just empty. Am I doing something wrong?

P.S. I am using Solr 3.6.1, index size is 3Gb and easily fits in memory.
Load Average on the Solr box is very low.

-- Aleksey





Re: Partial results with not enough hits

2012-11-22 Thread Aleksey Vorona

Thank you!

That seems to be the case, I tried to execute queries without sorting 
and only one document in the response and I got execution time in the 
same range as before.


-- Aleksey

On 12-11-21 04:07 PM, Jack Krupansky wrote:

It could be that the time to get set up to return even the first result is
high and then each additional document is a minimal increment in time.

Do a query with &rows=1 (or even 0) and see what the minimum query time is
for your query, index, and environment.

-- Jack Krupansky

-Original Message-
From: Aleksey Vorona
Sent: Wednesday, November 21, 2012 6:04 PM
To: solr-user@lucene.apache.org
Subject: Partial results with not enough hits

In all of my queries I have timeAllowed parameter. My application is
ready for partial results. However, whenever Solr returns partial result
it is a very bad result.

For example, I have a test query and here its execution log with the
strict time allowed:
  WARNING: Query: ; Elapsed time: 120Exceeded allowed search
time: 100 ms.
  INFO: [] webapp=/solr path=/select
params={&timeAllowed=100} hits=189 status=0 QTime=119
Here it is without such a strict limitation:
  INFO: [] webapp=/solr path=/select
params={&timeAllowed=1} hits=582 status=0 QTime=124

The total execution time is different by mere 5 ms, but the partial
result has only about 1/3 of the full result.

Is it the expected behaviour? Does that mean I can never rely on the
partial results?

I added timeAllowed to protect from too expensive wide queries, but I
still want to return something relevant to the user. This query returned
30% of the full result, but I have other queries in the log where
partial result is just empty. Am I doing something wrong?

P.S. I am using Solr 3.6.1, index size is 3Gb and easily fits in memory.
Load Average on the Solr box is very low.

-- Aleksey






Partial results with not enough hits

2012-11-21 Thread Aleksey Vorona
In all of my queries I have timeAllowed parameter. My application is 
ready for partial results. However, whenever Solr returns partial result 
it is a very bad result.


For example, I have a test query and here its execution log with the 
strict time allowed:
WARNING: Query: ; Elapsed time: 120Exceeded allowed search 
time: 100 ms.
INFO: [] webapp=/solr path=/select 
params={&timeAllowed=100} hits=189 status=0 QTime=119

Here it is without such a strict limitation:
INFO: [] webapp=/solr path=/select 
params={&timeAllowed=1} hits=582 status=0 QTime=124


The total execution time is different by mere 5 ms, but the partial 
result has only about 1/3 of the full result.


Is it the expected behaviour? Does that mean I can never rely on the 
partial results?


I added timeAllowed to protect from too expensive wide queries, but I 
still want to return something relevant to the user. This query returned 
30% of the full result, but I have other queries in the log where 
partial result is just empty. Am I doing something wrong?


P.S. I am using Solr 3.6.1, index size is 3Gb and easily fits in memory. 
Load Average on the Solr box is very low.


-- Aleksey


All-wildcard query performance

2012-11-19 Thread Aleksey Vorona

Hi,

Our application sometimes generates queries with one of the constraints:
field:[* TO *]

I expected this query performance to be the same as if we omitted the 
"field" constraint completely. However, I see the performance of the two 
queries to differ drastically (3ms without all-wildcard constraint, 
200ms with it).


Could someone explain the source of the difference, please?

I am fixing the application not to generate such queries, obviously, but 
still would like to understand the logic here. We use Solr 3.6.1. Thanks.


-- Aleksey


Re: Solr Replication and Autocommit

2012-09-27 Thread Aleksey Vorona

Thank both of you for the responses!

-- Aleksey

On 12-09-27 03:51 AM, Erick Erickson wrote:

I'll echo Otis, nothing comes to mind...

Unless you were indexing stuff to the _slaves_, which you should
never do, now or in the past

Erick

On Thu, Sep 27, 2012 at 12:00 AM, Aleksey Vorona  wrote:

Hi,

I remember having some issues with replication and autocommit previously.
But now we are using Solr 3.6.1. Are there any known issues or any other
reasons to avoid autocommit while using replication? I guess not, just want
confirmation from someone confident and competent.

-- Aleksey




Solr Replication and Autocommit

2012-09-26 Thread Aleksey Vorona

Hi,

I remember having some issues with replication and autocommit 
previously. But now we are using Solr 3.6.1. Are there any known issues 
or any other reasons to avoid autocommit while using replication? I 
guess not, just want confirmation from someone confident and competent.


-- Aleksey


Re: Search by field with the space in it

2012-09-19 Thread Aleksey Vorona
Thank you for that insight. I, myself, would've liked to remove the 
spaces, but it is not possible in that particular project.


I see that I need to learn more about Lucene. Hopefully that will help 
me avoid some of those headaches to come.


-- Aleksey

On 12-09-19 11:42 AM, Erick Erickson wrote:

I would _really_ recommend that you re-do your schema and
take spaces out of your field names. That may require that
you change your indexing program to not send spaces in dynamic
field names

This is the kind of thing that causes endless headaches as time
goes forward.

You don't _have_ to, but I predict you'll regret if if you don't .

Best
Erick

On Wed, Sep 19, 2012 at 2:11 PM, Aleksey Vorona  wrote:

On 12-09-19 11:04 AM, Ahmet Arslan wrote:

I have a field with space in its name (that is a dynamic
field). How can I execute search on it?

I tried "q=aattr_box%20%type_sc:super" and it did not work

The field name is "aattr_box type"

How about q=aattr_box\ type_sc:super


That works! Thank you!

Sidenote: of course I urlencode space.

-- Aleksey




Re: Search by field with the space in it

2012-09-19 Thread Aleksey Vorona

On 12-09-19 11:04 AM, Ahmet Arslan wrote:

I have a field with space in its name (that is a dynamic
field). How can I execute search on it?

I tried "q=aattr_box%20%type_sc:super" and it did not work

The field name is "aattr_box type"

How about q=aattr_box\ type_sc:super


That works! Thank you!

Sidenote: of course I urlencode space.

-- Aleksey


Search by field with the space in it

2012-09-19 Thread Aleksey Vorona

Hi,

I have a field with space in its name (that is a dynamic field). How can 
I execute search on it?


I tried "q=aattr_box%20%type_sc:super" and it did not work

The field name is "aattr_box type"

-- Aleksey


Re: Solr not allowing persistent HTTP connections

2012-09-06 Thread Aleksey Vorona

Thank you. I did the test with curl the same way you did it and it works.

I still can not get ab ("apache benchmark") to reuse connections to 
solr. I'll investigate this further.


$ ab -c 1 -n 100 -k 'http://localhost:8983/solr/select?q=*:*' | grep Alive
Keep-Alive requests:0

-- Aleksey

On 12-09-06 11:07 AM, Chris Hostetter wrote:

: Some extra information. If I use curl and force it to use HTTP 1.0, it is more
: visible that Solr doesn't allow persistent connections:

a) solr has nothing to do with it, it's entirely something under the
control of jetty & the client.

b) i think you are introducing confusion by trying to force an HTTP/1.0
connection -- Jetty supports Keep-Alive for HTTP/1.1, but maybe not for
HTTP/1.0 ?

If you use curl to request multiple URLs and just let curl & jetty do
their normal behavior (w/o trying to bypass anything or manually add
headers) you can see that keep-alive is in fact working...

$ curl -v --keepalive 'http://localhost:8983/solr/select?q=*:*' 
'http://localhost:8983/solr/select?q=foo'
* About to connect() to localhost port 8983 (#0)
*   Trying 127.0.0.1... connected

GET /solr/select?q=*:* HTTP/1.1
User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 
zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Host: localhost:8983
Accept: */*


< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
< Transfer-Encoding: chunked
<


01*:*

* Connection #0 to host localhost left intact
* Re-using existing connection! (#0) with host localhost
* Connected to localhost (127.0.0.1) port 8983 (#0)

GET /solr/select?q=foo HTTP/1.1
User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 
zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Host: localhost:8983
Accept: */*


< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
< Transfer-Encoding: chunked
<


00foo

* Connection #0 to host localhost left intact
* Closing connection #0





-Hoss





Re: Solr not allowing persistent HTTP connections

2012-09-05 Thread Aleksey Vorona
Some extra information. If I use curl and force it to use HTTP 1.0, it 
is more visible that Solr doesn't allow persistent connections:


$ curl -v -0 'http://localhost:8983/solr/select?q=*:*' -H'Connection: 
Keep-Alive'* About to connect() to localhost port 8983 (#0)

*   Trying ::1... connected
> GET /solr/select?q=*:* HTTP/1.0
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 
OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3

> Host: localhost:8983
> Accept: */*
> Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
* no chunk, no close, no size. Assume close to signal end
<


...removed the rest of the response body...

-- Aleksey

On 12-09-05 03:54 PM, Aleksey Vorona wrote:

Hi,

Running example Solr from the 3.6.1 distribution I can not make it to
keep persistent HTTP connections:

$ ab -c 1 -n 100 -k 'http://localhost:8983/solr/select?q=*:*' | grep
Keep-Alive
Keep-Alive requests:0

What should I change to fix that?

P.S. We have the same issue in production with Jetty 7, but I thought it
would be better to ask about Solr example, since it is easier for anyone
to reproduce the issue.

-- Aleksey





Solr not allowing persistent HTTP connections

2012-09-05 Thread Aleksey Vorona

Hi,

Running example Solr from the 3.6.1 distribution I can not make it to 
keep persistent HTTP connections:


$ ab -c 1 -n 100 -k 'http://localhost:8983/solr/select?q=*:*' | grep 
Keep-Alive

Keep-Alive requests:0

What should I change to fix that?

P.S. We have the same issue in production with Jetty 7, but I thought it 
would be better to ask about Solr example, since it is easier for anyone 
to reproduce the issue.


-- Aleksey


Re: Solr and query abortion

2012-08-31 Thread Aleksey Vorona
We are working on optimizing query performance. My concern was to ensure 
some stable QoS. Given our API and UI layout, user may generate an 
expensive query. Given the nature of the service, user may want to 
"hack" it. Currently, our Search API is a good point to try to inflict 
DoS on our server. And even though search outage will not cause any real 
security concern, it would be not nice.


That is why I wanted to put a hard limit on the query complexity. Thank 
you for a hint on how to do it.


As a side note, search performance with Solr is great. It is only during 
a good load test I am able to see those long running queries. When there 
is no load, even the most expensive query I have takes less than 100ms 
to be processed. As you said, 2.5M docs is not a very big index.


Thanks again for the reply. I am not sure if we are going to implement 
custom component for Solr or put query complexity estimation code in our 
application. But in any case your response was greatly appreciated, 
because I was thinking that I am missing something.


-- Aleksey

On 12-08-30 05:51 AM, Erick Erickson wrote:

The first thing I'd do is run your query with &debguQuery=on and look
at the "timings" section. That'll tell you what component is taking all
the time and should help you figure out where the problem is

But worst-case you could implement a custom component to stop
processing after some set number of responses..

2.5M docs isn't a very big index. So I'd look at the rest of the
tuning knobs before jumping to a solution. Also be aware that
the first time, for instance, a sort gets performed there's a lengthy
hit for warming the caches so you should disregard the first few
queries, or do appropriate autowarming.

Best
Erick

On Wed, Aug 29, 2012 at 1:26 PM, Aleksey Vorona  wrote:

Hi, we are running Solr 3.6.1 and see an issue in our load tests. Some of
the queries our load test script produces result in huge number of hits. It
may go as high as 90% of all documents we have (2.5M). Those are all range
queries. I see in the log that those queries take much more time to execute.

Since such a query does not make any sense from the end user perspective, I
would like to limit its performance impact.

Is it possible to abort the query after certain number of document hits or
certain time elapsed and return a error? I would render that error as
"Please refine your search" message to the end user in my application. I
know that many sites on the web do that, and I guess most of them do that
with Solr.

I tried setting timeAllowed limit, but, for some reason, I did not see those
query times to go down. I suspect that most of the time is spent not in
Search phase (which is the only one respecting timeAllowed, as far as I
know), but in the sorting phase. And still, I want to abort any longer
running query. Otherwise they accumulate over time, pushing server's load
average sky high and killing performance even for regular queries.

-- Aleksey




Re: Null Pointer Exception on DIH with MySQL

2012-08-29 Thread Aleksey Vorona
Thank you for the reply. We rebuilt solr from sources, reinstalled it 
and the problem went away. As it was never reproducible on any other 
server, I blame some mysterious java byte code corruption on that 
server. The assumption I would never be able to verify, because we did 
not make a copy of the previous binaries.


-- Aleksey

On 12-08-29 06:17 PM, Erick Erickson wrote:

Not much information to go on here, have you tried the DIH
debugging console? See:
http://wiki.apache.org/solr/DataImportHandler#interactive

Best
Erick

On Mon, Aug 27, 2012 at 7:22 PM, Aleksey Vorona  wrote:

We have Solr 3.6.1 running on Jetty (7.x) and using DIH to get data from the
MySQL database. On one of the environment the import always fails with an
exception: http://pastebin.com/tG28cHPe

It is a null pointer exception on connection being null. I've tested that I
can connect from the Solr server to Mysql server via command line mysql
client.

Does anybody knows anything about this exception and how to fix it?

I am not able to reproduce it on any other environment.

-- Aleksey




Re: Load Testing in Solr

2012-08-29 Thread Aleksey Vorona

On 12-08-29 11:44 AM, dhaivat dave wrote:

Hello everyone .

Can any one know any component or tool that can be used for testing the
solr performance.


People were recommending https://code.google.com/p/solrmeter/ earlier.

-- Aleksey



Solr and query abortion

2012-08-29 Thread Aleksey Vorona
Hi, we are running Solr 3.6.1 and see an issue in our load tests. Some 
of the queries our load test script produces result in huge number of 
hits. It may go as high as 90% of all documents we have (2.5M). Those 
are all range queries. I see in the log that those queries take much 
more time to execute.


Since such a query does not make any sense from the end user 
perspective, I would like to limit its performance impact.


Is it possible to abort the query after certain number of document hits 
or certain time elapsed and return a error? I would render that error as 
"Please refine your search" message to the end user in my application. I 
know that many sites on the web do that, and I guess most of them do 
that with Solr.


I tried setting timeAllowed limit, but, for some reason, I did not see 
those query times to go down. I suspect that most of the time is spent 
not in Search phase (which is the only one respecting timeAllowed, as 
far as I know), but in the sorting phase. And still, I want to abort any 
longer running query. Otherwise they accumulate over time, pushing 
server's load average sky high and killing performance even for regular 
queries.


-- Aleksey


Null Pointer Exception on DIH with MySQL

2012-08-27 Thread Aleksey Vorona
We have Solr 3.6.1 running on Jetty (7.x) and using DIH to get data from 
the MySQL database. On one of the environment the import always fails 
with an exception: http://pastebin.com/tG28cHPe


It is a null pointer exception on connection being null. I've tested 
that I can connect from the Solr server to Mysql server via command line 
mysql client.


Does anybody knows anything about this exception and how to fix it?

I am not able to reproduce it on any other environment.

-- Aleksey