Re: Solr performance issues

2014-12-26 Thread Otis Gospodnetic
Likely lots of disk + network IO, yes. Put SPM for Solr on your nodes to double 
check.

 Otis

> On Dec 26, 2014, at 09:17, Mahmoud Almokadem  wrote:
> 
> Dears,
> 
> We've installed a cluster with one collection of 350M documents on 3
> r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard
> is about 1.1TB, and the maximum size of a single volume on Amazon is 1TB,
> so we added two General Purpose SSD EBS volumes (1x1TB + 1x500GB) to each
> instance. Then we created a 1.5TB logical volume with LVM to fit our index.
> 
> The response time is between 1 and 3 seconds for simple queries (1 token).
> 
> Has LVM become a bottleneck for our index?
> 
> Thanks for your help.


Re: Solr performance issues

2014-12-28 Thread Shawn Heisey
On 12/26/2014 7:17 AM, Mahmoud Almokadem wrote:
> We've installed a cluster of one collection of 350M documents on 3
> r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
> about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
> General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
> volume using LVM of 1.5TB to fit our index.
> 
> The response time is between 1 and 3 seconds for simple queries (1 token).
> 
> Has LVM become a bottleneck for our index?

SSD is very fast, but its speed is very slow when compared to RAM.  The
problem here is that Solr must read data off the disk in order to do a
query, and even at SSD speeds, that is slow.  LVM is not the problem
here, though it's possible that it may be a contributing factor.  You
need more RAM.

For Solr to be fast, a large percentage (ideally 100%, but smaller
fractions can often be enough) of the index must be loaded into unused
RAM by the operating system.  Your information seems to indicate that
the index is about 3 terabytes.  If that's the index size, I would guess
that you would need somewhere between 1 and 2 terabytes of total RAM for
speed to be acceptable.  Because RAM is *very* expensive on Amazon and
is not available in sizes like 256GB-1TB, that typically means a lot of
their virtual machines, with a lot of shards in SolrCloud.  You may find
that real hardware is less expensive for very large Solr indexes in the
long term than cloud hardware.
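Shawn's rule of thumb reduces to simple arithmetic. A quick sketch (the cache fractions are illustrative assumptions, not measured figures):

```python
def cache_ram_needed_gb(index_size_gb, target_fraction=0.5):
    """RAM needed for the OS disk cache to hold target_fraction of the index.

    target_fraction is an assumption: 1.0 caches the whole index; smaller
    fractions can often be enough, per the rule of thumb above.
    """
    return index_size_gb * target_fraction

total_index_gb = 3 * 1100  # three shards of ~1.1TB each, from this thread
for frac in (0.33, 0.5, 1.0):
    ram = cache_ram_needed_gb(total_index_gb, frac)
    print(f"{int(frac * 100)}% cached -> ~{ram:,.0f} GB RAM across the cluster")
```

Even the one-third-cached case lands near a terabyte of RAM, which is why more, smaller instances (or real hardware) come into the picture.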

Thanks,
Shawn



RE: Solr performance issues

2014-12-28 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
> We've installed a cluster of one collection of 350M documents on 3
> r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
> about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
> General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
> volume using LVM of 1.5TB to fit our index.

Your search speed will be limited by the slowest storage in your group, which 
would be your 500GB EBS volume. The General Purpose SSD option means (as far as 
I can read at http://aws.amazon.com/ebs/details/#piops) a baseline of 3 
IOPS/GB, i.e. 1500 IOPS for that volume, with bursts of 3000 IOPS. 
Unfortunately they do not say anything about latency.

For comparison, I checked the system logs from a local test with our 21TB / 7 
billion documents index. It used ~27,000 IOPS during the test, with mean search 
time a bit below 1 second. That was with ~100GB RAM for disk cache, which is 
about ½% of index size. The test was with simple term queries (1-3 terms) and 
some faceting. Back of the envelope: 27,000 IOPS for 21TB is ~1300 IOPS/TB. 
Your indexes are 1.1TB, so 1.1*1300 IOPS ~= 1400 IOPS.
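The back-of-the-envelope above can be written out directly (numbers taken from this thread; the 3 IOPS/GB baseline is the gp2 figure as read from the AWS page):

```python
def gp2_baseline_iops(volume_gb, iops_per_gb=3):
    """Baseline IOPS of a General Purpose (gp2) EBS volume."""
    return volume_gb * iops_per_gb

def required_iops(index_tb, observed_iops_per_tb=1300):
    """Scale Toke's observation (~27,000 IOPS for 21TB, i.e. ~1,300 IOPS/TB)."""
    return index_tb * observed_iops_per_tb

cap = gp2_baseline_iops(500)   # the slowest volume in the LVM group
need = required_iops(1.1)      # per-shard index size from this thread
print(f"baseline cap ~{cap} IOPS vs estimated need ~{need:.0f} IOPS")
```

The estimated need sits right at the baseline cap of the slowest volume, which is consistent with the slow queries reported.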

All else being equal (which is never the case), 1-3 second response times for a 
1.1TB index do not seem unrealistic when one link in the storage chain is 
capped at a few thousand IOPS, the storage is networked, and there is little 
RAM for caching. If possible, you could try temporarily boosting the 
performance of the EBS volumes to see whether raw IO is the bottleneck.

> The response time is between 1 and 3 seconds for simple queries (1 token).

Is the index updated while you are searching?
Do you do any faceting or other heavy processing as part of a search?
How many hits does a search typically have and how many documents are returned?
How many concurrent searches do you need to support? How fast should the 
response time be?

- Toke Eskildsen


Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks all.

I have the same index with a slightly different schema and 200M documents,
installed on 3 r3.xlarge instances (30GB RAM and 600GB General Purpose SSD).
The size of the index is about 1.5TB; it takes many updates every 5 minutes
and serves complex queries and faceting with a response time of 100ms, which
is acceptable for us.

Toke Eskildsen,

Is the index updated while you are searching? *No*
Do you do any faceting or other heavy processing as part of a search? *No*
How many hits does a search typically have and how many documents are
returned? *The test measures QTime only, with no documents returned; the
number of hits varies from 50,000 to 50,000,000.*
How many concurrent searches do you need to support? How fast should the
response time be? *Maybe 100 concurrent searches at 100ms, with facets.*

Would splitting the shard into two shards on the same node, so that each
shard sits on a single EBS volume, work better than using LVM?

Thanks

On Mon, Dec 29, 2014 at 2:00 AM, Toke Eskildsen 
wrote:

> Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
> > We've installed a cluster of one collection of 350M documents on 3
> > r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
> > about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
> > General purpose (1x1TB + 1x500GB) on each instance. Then we create
> logical
> > volume using LVM of 1.5TB to fit our index.
>
> Your search speed will be limited by the slowest storage in your group,
> which would be your 500GB EBS. The General Purpose SSD option means (as far
> as I can read at http://aws.amazon.com/ebs/details/#piops) a baseline of 3
> IOPS/GB, i.e. 1500 IOPS for that volume, with bursts of 3000 IOPS.
> Unfortunately they do not say anything about latency.
>
> For comparison, I checked the system logs from a local test with our 21TB
> / 7 billion documents index. It used ~27,000 IOPS during the test, with
> mean search time a bit below 1 second. That was with ~100GB RAM for disk
> cache, which is about ½% of index size. The test was with simple term
> queries (1-3 terms) and some faceting. Back of the envelope: 27,000 IOPS
> for 21TB is ~1300 IOPS/TB. Your indexes are 1.1TB, so 1.1*1300 IOPS ~= 1400
> IOPS.
>
> All else being equal (which is never the case), getting 1-3 second
> response times for a 1.1TB index, when one link in the storage chain is
> capped at a few thousand IOPS, you are using networked storage and you have
> little RAM for caching, does not seem unrealistic. If possible, you could
> try temporarily boosting performance of the EBS, to see if raw IO is the
> bottleneck.
>
> > The response time is between 1 and 3 seconds for simple queries (1 token).
>
> Is the index updated while you are searching?
> Do you do any faceting or other heavy processing as part of a search?
> How many hits does a search typically have and how many documents are
> returned?
> How many concurrent searches do you need to support? How fast should the
> response time be?
>
> - Toke Eskildsen
>


Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
> I've the same index with a bit different schema and 200M documents,
> installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
> of index is about 1.5TB, have many updates every 5 minutes, complex queries
> and faceting with response time of 100ms that is acceptable for us.
> 
> Toke Eskildsen,
> 
> Is the index updated while you are searching? *No*
> Do you do any faceting or other heavy processing as part of a search? *No*
> How many hits does a search typically have and how many documents are
> returned? *The test for QTime only with no documents returned and No. of
> hits varying from 50,000 to 50,000,000.*
> How many concurrent searches do you need to support? How fast should the
> response time be? *May be 100 concurrent searches with 100ms with facets.*
> 
> Does splitting the shard to two shards on the same node so every shard will
> be on a single EBS Volume better than using LVM?

The basic problem is simply that the system has so little memory that it
must read large amounts of data from the disk when it does a query.
There is not enough RAM to cache the important parts of the index.  RAM
is much faster than disk, even SSD.

Typical consumer-grade DDR3-1600 memory has a data transfer rate of
about 12800 megabytes per second.  If it's ECC memory (which I would say
is a requirement) then the transfer rate is probably a little bit slower
than that.  Figuring 9 bits for every byte gets us about 11377 MB/s.
That's only an estimate, and it could be wrong in either direction, but
I'll go ahead and use it.

http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules

If your SSD is SATA, the transfer rate will be limited to approximately
600MB/s -- the 6 gigabit per second transfer rate of the newest SATA
standard.  That makes memory about 18 times as fast as SATA SSD.  I saw
one PCI express SSD that claimed a transfer rate of 2900 MB/s.  Even
that is only about one fourth of the estimated speed of DDR3-1600 with
ECC.  I don't know what interface technology Amazon uses for their SSD
volumes, but I would bet on it being the cheaper version, which would
mean SATA.  The networking between the EC2 instance and the EBS storage
is unknown to me and may be a further bottleneck.

http://ocz.com/enterprise/z-drive-4500/specifications
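The arithmetic behind the comparison above, as a rough sketch (peak figures only; real-world throughput is lower on both sides, and the exact multiple depends on rounding):

```python
ddr3_1600_mb_s = 1600 * 8           # DDR3-1600 peak: 12800 MB/s
ecc_mb_s = ddr3_1600_mb_s * 8 / 9   # figuring 9 bits per byte for ECC
sata3_mb_s = 600                    # 6 Gbit/s SATA, roughly 600 MB/s usable

print(f"ECC DDR3-1600 ~{ecc_mb_s:.0f} MB/s")
print(f"SATA 3 SSD    ~{sata3_mb_s} MB/s")
print(f"RAM is roughly {ecc_mb_s / sata3_mb_s:.0f}x faster than a SATA SSD")
```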

Bottom line -- you need a lot more memory.  Speeding up the disk may
*help* ... but it will not replace that simple requirement.  With EC2 as
the platform, you may need more instances and more shards.

Your 200 million document index that works well with only 90GB of total
memory ... that's surprising to me.  That means that the important parts
of that index *do* fit in memory ... but if the index gets much larger,
performance is likely to drop off sharply.

Thanks,
Shawn



Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks Shawn.

What do you mean by "important parts of the index", and how do I calculate
their size?

Thanks,
Mahmoud

Sent from my iPhone

> On Dec 29, 2014, at 8:19 PM, Shawn Heisey  wrote:
> 
>> On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
>> I've the same index with a bit different schema and 200M documents,
>> installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
>> of index is about 1.5TB, have many updates every 5 minutes, complex queries
>> and faceting with response time of 100ms that is acceptable for us.
>> 
>> Toke Eskildsen,
>> 
>> Is the index updated while you are searching? *No*
>> Do you do any faceting or other heavy processing as part of a search? *No*
>> How many hits does a search typically have and how many documents are
>> returned? *The test for QTime only with no documents returned and No. of
>> hits varying from 50,000 to 50,000,000.*
>> How many concurrent searches do you need to support? How fast should the
>> response time be? *May be 100 concurrent searches with 100ms with facets.*
>> 
>> Does splitting the shard to two shards on the same node so every shard will
>> be on a single EBS Volume better than using LVM?
> 
> The basic problem is simply that the system has so little memory that it
> must read large amounts of data from the disk when it does a query.
> There is not enough RAM to cache the important parts of the index.  RAM
> is much faster than disk, even SSD.
> 
> Typical consumer-grade DDR3-1600 memory has a data transfer rate of
> about 12800 megabytes per second.  If it's ECC memory (which I would say
> is a requirement) then the transfer rate is probably a little bit slower
> than that.  Figuring 9 bits for every byte gets us about 11377 MB/s.
> That's only an estimate, and it could be wrong in either direction, but
> I'll go ahead and use it.
> 
> http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules
> 
> If your SSD is SATA, the transfer rate will be limited to approximately
> 600MB/s -- the 6 gigabit per second transfer rate of the newest SATA
> standard.  That makes memory about 18 times as fast as SATA SSD.  I saw
> one PCI express SSD that claimed a transfer rate of 2900 MB/s.  Even
> that is only about one fourth of the estimated speed of DDR3-1600 with
> ECC.  I don't know what interface technology Amazon uses for their SSD
> volumes, but I would bet on it being the cheaper version, which would
> mean SATA.  The networking between the EC2 instance and the EBS storage
> is unknown to me and may be a further bottleneck.
> 
> http://ocz.com/enterprise/z-drive-4500/specifications
> 
> Bottom line -- you need a lot more memory.  Speeding up the disk may
> *help* ... but it will not replace that simple requirement.  With EC2 as
> the platform, you may need more instances and more shards.
> 
> Your 200 million document index that works well with only 90GB of total
> memory ... that's surprising to me.  That means that the important parts
> of that index *do* fit in memory ... but if the index gets much larger,
> performance is likely to drop off sharply.
> 
> Thanks,
> Shawn
> 


Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 12:07 PM, Mahmoud Almokadem wrote:
> What do you mean by "important parts of the index", and how do I calculate
> their size?

I have no formal education in what's important when it comes to doing a
query, but I can make some educated guesses.

Starting with this as a reference:

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/codecs/lucene410/package-summary.html#file-names

I would guess that the segment info (.si) files and the term index
(*.tip) files would be supremely important to *always* have in memory,
and they are fairly small.  Next would be the term dictionary (*.tim)
files.  The term dictionary is pretty big, and would be very important
for fast queries.

Frequencies, positions, and norms may also be important, depending on
exactly what kind of query you have.  Frequencies and positions are
quite large.  Frequencies are critical for relevance ranking (the
default sort by score), and positions are important for phrase queries.
Position data may also be used by relevance ranking, but I am not
familiar enough with it to say for sure.

If you have docvalues defined, then *.dvm and *.dvd files would be used
for facets and sorting on those specific fields.  The *.dvd files can be
very big, depending on your schema.

The *.fdx and *.fdt files become important when actually retrieving
results after the matching documents have been determined.  The stored
data is compressed, so additional CPU power is required to uncompress
that data before it is sent to the client.  Stored data may be large or
small, depending on your schema.  Stored data does not directly affect
search speed, but if memory space is limited, every block of stored data
that gets retrieved will result in some other part of the index being
removed from the OS disk cache, which means that it might need to be
re-read from the disk on the next query.
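To get a feel for how much of these "important parts" your own index holds, you can total the index files by extension. A sketch (the directory path in the example is hypothetical, not a fixed Solr location):

```python
import os
from collections import defaultdict

def index_sizes_by_ext(index_dir):
    """Total Lucene index file sizes by extension (.tim, .tip, .dvd, ...)."""
    sizes = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # files like segments_2 have no extension; key them by full name
            ext = os.path.splitext(name)[1] or name
            sizes[ext] += os.path.getsize(path)
    return dict(sizes)

# Example (path is hypothetical):
# for ext, size in sorted(index_sizes_by_ext("/var/solr/data/index").items(),
#                         key=lambda kv: -kv[1]):
#     print(f"{ext:>6}  {size / 2**30:8.2f} GiB")
```

Comparing the .tip/.tim (and, if relevant, .doc/.pos/.dvd) totals against free RAM gives a rough idea of what can stay cached.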

Thanks,
Shawn



RE: Solr performance issues

2014-12-29 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
> I've the same index with a bit different schema and 200M documents,
> installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
> of index is about 1.5TB, have many updates every 5 minutes, complex queries
> and faceting with response time of 100ms that is acceptable for us.

So you have:

Setup 1: 3 * (30GB RAM + 600GB SSD) for a total of 1.5TB index, 200M docs.
Acceptable performance.
Setup 2: 3 * (60GB RAM + 1TB SSD + 500GB SSD) for a total of 3.3TB index,
350M docs. Poor performance.

The only real difference, besides doubling everything, is the LVM? I understand 
why you find that to be the culprit, but from what I can read, the overhead 
should not be anywhere near enough to result in the performance drop you are 
describing. Could it be that some snapshotting or backup was running when you 
tested?

Splitting your shards and doubling the number of machines, as you suggest, 
would result in
Setup 3: 6 * (60GB RAM + 600GB SSD) for a total of 3.3TB index, 350M docs.
which would be remarkably similar to your setup 1. I think that would be the 
next logical step, unless you can easily do a temporary boost of your IOPS.

BTW: You are getting dangerously close to your storage limits here - it seems 
that a single large merge could make you run out of space.

- Toke Eskildsen


Re: Solr performance issues

2008-06-19 Thread Yonik Seeley
On Thu, Jun 19, 2008 at 6:11 PM, Sébastien Rainville
<[EMAIL PROTECTED]> wrote:
> I've been using solr for a little while without worrying too much about how it
> works but now it's becoming a bottleneck in my application. I have a couple
> issues with it:
>
> 1. My index always gets slower and slower when committing/optimizing for some
> obscure reason. It goes from 1 second with a new index to 45 seconds with an
> index with the same amount of data but used for a few days. Restarting solr
> doesn't fix it. The only way I found to fix that is to delete the whole
> index completely by deleting the index folder. Then when I rebuild the index
> everything goes back to normal and fast... and then performance slowly
> deteriorates again. So, the amount of data is not a factor because
> rebuilding the index from scratch fixes the problem and I am sending
> "optimize" once in a while... even maybe too often.

This sounds like OS caching to me.  A large amount of a "new" index
that was just written will be in cache and thus much faster to
optimize.

If your index is smaller than the amount of RAM, go to the index
directory of an "old" index, then try "cat * > /dev/null" and then try the
optimize to see if that's the case.
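A Python equivalent of the `cat * > /dev/null` trick, in case you want to warm the cache from a script (read every file once and discard the data):

```python
import os

def warm_page_cache(index_dir, chunk=1 << 20):
    """Read every file in index_dir once so the OS page cache is warm.

    Returns the number of bytes read, for a sanity check against index size.
    """
    total = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            while f.read(chunk):   # discard the data; reading it is the point
                pass
        total += os.path.getsize(path)
    return total

# warm_page_cache("/path/to/old/index")  # path is an example
```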

> 2. I use acts_as_solr and by default they only make "post" requests, even
> for /select. With that setup the response time for most queries, simple or
> complex ones, were ranging from 150ms to 600ms, with an average of 250ms. I
> changed the select request to use "get" requests instead and now the
> response time is down to 10ms to 60ms. Has anyone seen that before? Why is
> it doing it?

Are the get requests being cached by the ruby stuff?

But even with no caching, I've seen differences with get/post on Linux
with the python client when persistent HTTP connections were in use.
I tracked it down to the POST being written in two parts, triggering
nagle's algorithm in the networking stack.
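The usual workaround for the Nagle interaction described above is either sending the request in a single write or setting TCP_NODELAY on the client socket. A minimal sketch (the helper name is mine; 8983 is Solr's default port):

```python
import socket

def connect_nodelay(host, port):
    """Open a TCP connection with Nagle's algorithm disabled."""
    sock = socket.create_connection((host, port))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

# sock = connect_nodelay("localhost", 8983)
```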

-Yonik


Re: Solr performance issues

2008-06-20 Thread Erik Hatcher


On Jun 19, 2008, at 6:28 PM, Yonik Seeley wrote:

>> 2. I use acts_as_solr and by default they only make "post" requests, even
>> for /select. With that setup the response time for most queries, simple or
>> complex ones, were ranging from 150ms to 600ms, with an average of 250ms. I
>> changed the select request to use "get" requests instead and now the
>> response time is down to 10ms to 60ms. Has anyone seen that before? Why is
>> it doing it?
>
> Are the get requests being cached by the ruby stuff?

No, I'm sure that the results aren't being cached by Ruby's library,
solr-ruby, or acts_as_solr.

> But even with no caching, I've seen differences with get/post on Linux
> with the python client when persistent HTTP connections were in use.
> I tracked it down to the POST being written in two parts, triggering
> nagle's algorithm in the networking stack.

There was another post I found that mentioned this a couple of years ago:



I would welcome patches with tests that allow solr-ruby to send most
requests with GET, and the ones that are actually sending a body beyond
just parameters (delete, update, commit) as POST.

	Erik



Re: Solr performance issues

2008-06-20 Thread Sébastien Rainville
On Fri, Jun 20, 2008 at 8:32 AM, Erik Hatcher <[EMAIL PROTECTED]>
wrote:

>
> On Jun 19, 2008, at 6:28 PM, Yonik Seeley wrote:
>
>> 2. I use acts_as_solr and by default they only make "post" requests, even
>>> for /select. With that setup the response time for most queries, simple
>>> or
>>> complex ones, were ranging from 150ms to 600ms, with an average of 250ms.
>>> I
>>> changed the select request to use "get" requests instead and now the
>>> response time is down to 10ms to 60ms. Has anyone seen that before? Why
>>> is
>>> it doing it?
>>>
>>
>> Are the get requests being cached by the ruby stuff?
>>
>
> No, I'm sure that the results aren't being cached by Ruby's library,
> solr-ruby, or acts_as_solr.
>

I confirm that the results are not cached by Ruby's library.


But even with no caching, I've seen differences with get/post on Linux
>> with the python client when persistent HTTP connections were in use.
>> I tracked it down to the POST being written in two parts, triggering
>> nagle's algorithm in the networking stack.
>>
>
> There was another post I found that mentioned this a couple of years ago:
>
> 
>
> I would welcome patches with tests that allow solr-ruby to send most
> requests with GET, and the ones that are actually sending a body beyond just
> parameters (delete, update, commit) as POST.
>
>Erik
>
>
I made a few modifications but it still needs more testing...

Sebastien


Re: Solr Performance Issues

2010-03-11 Thread Erick Erickson
How many outstanding queries do you have at a time? Is it possible
that when you start, you have only a few queries executing concurrently
but as your test runs you have hundreds?

This really is a question of how your load test is structured. You might
get a better sense of how it works if your tester had a limited number
of threads running, so that the max concurrent requests SOLR was serving
at once was capped (30, 50, whatever).

But no, I wouldn't expect SOLR to bog down the way you're describing
just because it was running for a while.

HTH
Erick

On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel wrote:

> Hi everyone,
>
> I have an index corresponding to ~2.5 million documents. The index size is
> 43GB. The configuration of the machine which is running Solr is - Dual
> Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
> RAM, and 250 GB HDD.
>
> I'm observing a strange trend in the queries that I send to Solr. The query
> times for queries that I send earlier is much lesser than the queries I
> send
> afterwards. For instance, if I write a script to query solr 5000 times
> (with
> 5000 distinct queries, most of them containing not more than 3-5 words)
> with
> 10 threads running in parallel, the average times for queries goes from
> ~50ms in the beginning to ~6000ms. Is this expected or is there something
> wrong with my configuration. Currently I've configured the queryResultCache
> and the documentCache to contain 2048 entries (hit ratios for both is close
> to 50%).
>
> Apart from this, a general question that I want to ask is that is such a
> hardware enough for this scenario? I'm aiming at achieving around 20
> queries
> per second with the hardware mentioned above.
>
> Thanks,
>
> Regards,
>
> --
> - Siddhant
>


Re: Solr Performance Issues

2010-03-11 Thread Siddhant Goel
Hi Erick,

The way the load test works is that it picks up 5000 queries and splits them
according to the number of threads (so if we have 10 threads, it schedules
10 threads, each one sending 500 queries). So it might be possible that the
number of queries at a later point in time is greater than the number of
queries earlier on. I'm not very sure about that, though. It's a simple
Ruby script that starts up threads, calls the search function in each
thread, and then waits for each of them to exit.

How many queries per second can we expect Solr to serve, given this kind of
hardware? If what you suggest is true, then is it possible that while Solr
is serving a query, another query hits it, which increases the response time
even further? I'm not sure about that. But yes, I can observe the query
times going up as I increase the number of threads.
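A load test structured this way can be sketched in a few lines (the search function is a placeholder; swap in a real HTTP call to /select):

```python
import threading
import time

def run_load_test(queries, num_threads, search_fn):
    """Split queries across num_threads threads; return per-query latencies."""
    latencies = []
    lock = threading.Lock()

    def worker(chunk):
        for q in chunk:
            start = time.perf_counter()
            search_fn(q)            # placeholder for an HTTP /select request
            elapsed = time.perf_counter() - start
            with lock:
                latencies.append(elapsed)

    size = (len(queries) + num_threads - 1) // num_threads
    threads = [threading.Thread(target=worker, args=(queries[i:i + size],))
               for i in range(0, len(queries), size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies
```

Because each thread fires its next query as soon as the previous one returns, concurrency stays capped at the thread count, which is the capping Erick suggests.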

Thanks,

Regards,

On Thu, Mar 11, 2010 at 8:30 PM, Erick Erickson wrote:

> How many outstanding queries do you have at a time? Is it possible
> that when you start, you have only a few queries executing concurrently
> but as your test runs you have hundreds?
>
> This really is a question of how your load test is structured. You might
> get a better sense of how it works if your tester had a limited number
> of threads running so the max concurrent requests SOLR was serving
> at once were capped (30, 50, whatever).
>
> But no, I wouldn't expect SOLR to bog down the way you're describing
> just because it was running for a while.
>
> HTH
> Erick
>
> On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel  >wrote:
>
> > Hi everyone,
> >
> > I have an index corresponding to ~2.5 million documents. The index size
> is
> > 43GB. The configuration of the machine which is running Solr is - Dual
> > Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache,
> 8GB
> > RAM, and 250 GB HDD.
> >
> > I'm observing a strange trend in the queries that I send to Solr. The
> query
> > times for queries that I send earlier is much lesser than the queries I
> > send
> > afterwards. For instance, if I write a script to query solr 5000 times
> > (with
> > 5000 distinct queries, most of them containing not more than 3-5 words)
> > with
> > 10 threads running in parallel, the average times for queries goes from
> > ~50ms in the beginning to ~6000ms. Is this expected or is there something
> > wrong with my configuration. Currently I've configured the
> queryResultCache
> > and the documentCache to contain 2048 entries (hit ratios for both is
> close
> > to 50%).
> >
> > Apart from this, a general question that I want to ask is that is such a
> > hardware enough for this scenario? I'm aiming at achieving around 20
> > queries
> > per second with the hardware mentioned above.
> >
> > Thanks,
> >
> > Regards,
> >
> > --
> > - Siddhant
> >
>



-- 
- Siddhant


Re: Solr Performance Issues

2010-03-11 Thread Tom Burton-West

How much of your memory are you allocating to the JVM and how much are you
leaving free?  

If you don't leave enough free memory for the OS, the OS won't have a large
enough disk cache, and you will be hitting the disk for lots of queries. 

You might want to monitor your Disk I/O using iostat and look at the iowait.  

If you are doing phrase queries and your *prx file is significantly larger
than the available memory then when a slow phrase query hits Solr, the
contention for disk I/O with other queries could be slowing everything down.  
You might also want to look at the 90th and 99th percentile query times in
addition to the average. For our large indexes, we found at least an order
of magnitude difference between the average and 99th percentile queries. 
Again, if Solr gets hit with a few of those 99th percentile slow queries and
you're not hitting your caches, chances are you will see serious contention
for disk I/O.

Of course if you don't see any waiting on i/o, then your bottleneck is
probably somewhere else:)
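Computing the 90th/99th percentiles Tom mentions takes only a few lines (nearest-rank method; the toy timings are made up for illustration):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Toy timings (ms), made up for illustration: mostly fast, with a slow tail
# that the average hides.
times_ms = [50] * 89 + [500] * 9 + [5000] * 2
avg = sum(times_ms) / len(times_ms)
print(f"avg={avg:.0f}ms p90={percentile(times_ms, 90)}ms "
      f"p99={percentile(times_ms, 99)}ms")
```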

See
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
for more background on our experience.

Tom Burton-West
University of Michigan Library
www.hathitrust.org



>
> On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel  >wrote:
>
> > Hi everyone,
> >
> > I have an index corresponding to ~2.5 million documents. The index size
> is
> > 43GB. The configuration of the machine which is running Solr is - Dual
> > Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache,
> 8GB
> > RAM, and 250 GB HDD.
> >
> > I'm observing a strange trend in the queries that I send to Solr. The
> query
> > times for queries that I send earlier is much lesser than the queries I
> > send
> > afterwards. For instance, if I write a script to query solr 5000 times
> > (with
> > 5000 distinct queries, most of them containing not more than 3-5 words)
> > with
> > 10 threads running in parallel, the average times for queries goes from
> > ~50ms in the beginning to ~6000ms. Is this expected or is there
> something
> > wrong with my configuration. Currently I've configured the
> queryResultCache
> > and the documentCache to contain 2048 entries (hit ratios for both is
> close
> > to 50%).
> >
> > Apart from this, a general question that I want to ask is that is such a
> > hardware enough for this scenario? I'm aiming at achieving around 20
> > queries
> > per second with the hardware mentioned above.
> >
> > Thanks,
> >
> > Regards,
> >
> > --
> > - Siddhant
> >
>



-- 
- Siddhant



-- 
View this message in context: 
http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Performance Issues

2010-03-11 Thread Mike Malloy

I don't mean to turn this into a sales pitch, but there is a tool for Java app
performance management that you may find helpful. Its called New Relic
(www.newrelic.com) and the tool can be installed in 2 minutes. It can give
you very deep visibility inside Solr and other Java apps. (Full disclosure I
work at New Relic.)
Mike

Siddhant Goel wrote:
> 
> Hi everyone,
> 
> I have an index corresponding to ~2.5 million documents. The index size is
> 43GB. The configuration of the machine which is running Solr is - Dual
> Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
> RAM, and 250 GB HDD.
> 
> I'm observing a strange trend in the queries that I send to Solr. The
> query
> times for queries that I send earlier is much lesser than the queries I
> send
> afterwards. For instance, if I write a script to query solr 5000 times
> (with
> 5000 distinct queries, most of them containing not more than 3-5 words)
> with
> 10 threads running in parallel, the average times for queries goes from
> ~50ms in the beginning to ~6000ms. Is this expected or is there something
> wrong with my configuration. Currently I've configured the
> queryResultCache
> and the documentCache to contain 2048 entries (hit ratios for both is
> close
> to 50%).
> 
> Apart from this, a general question that I want to ask is that is such a
> hardware enough for this scenario? I'm aiming at achieving around 20
> queries
> per second with the hardware mentioned above.
> 
> Thanks,
> 
> Regards,
> 
> -- 
> - Siddhant
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Solr-Performance-Issues-tp27864278p27872139.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk
caching.

I think that at any point in time, there can be a maximum of  concurrent requests, which happens to make sense btw (does it?).

As I increase the number of threads, the load average shown by top goes up
to as high as 80%. But if I keep the number of threads low (~10), the load
average never goes beyond ~8. So that's probably the number of requests I
can expect Solr to serve concurrently on this index size with this hardware.

Can anyone give a general opinion as to how much hardware should be
sufficient for a Solr deployment with an index size of ~43GB, containing
around 2.5 million documents? I'm expecting it to serve at least 20 requests
per second. Any experiences?

Thanks

On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West wrote:

>
> How much of your memory are you allocating to the JVM and how much are you
> leaving free?
>
> If you don't leave enough free memory for the OS, the OS won't have a large
> enough disk cache, and you will be hitting the disk for lots of queries.
>
> You might want to monitor your Disk I/O using iostat and look at the
> iowait.
>
> If you are doing phrase queries and your *prx file is significantly larger
> than the available memory then when a slow phrase query hits Solr, the
> contention for disk I/O with other queries could be slowing everything
> down.
> You might also want to look at the 90th and 99th percentile query times in
> addition to the average. For our large indexes, we found at least an order
> of magnitude difference between the average and 99th percentile queries.
> Again, if Solr gets hit with a few of those 99th percentile slow queries
> and
> you're not hitting your caches, chances are you will see serious contention
> for disk I/O..
>
> Of course if you don't see any waiting on i/o, then your bottleneck is
> probably somewhere else:)
>
> See
>
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> for more background on our experience.
>
> Tom Burton-West
> University of Michigan Library
> www.hathitrust.org
>
>
>
> >
> > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel  > >wrote:
> >
> > > Hi everyone,
> > >
> > > I have an index corresponding to ~2.5 million documents. The index size
> > is
> > > 43GB. The configuration of the machine which is running Solr is - Dual
> > > Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache,
> > 8GB
> > > RAM, and 250 GB HDD.
> > >
> > > I'm observing a strange trend in the queries that I send to Solr. The
> > query
> > > times for queries that I send earlier is much lesser than the queries I
> > > send
> > > afterwards. For instance, if I write a script to query solr 5000 times
> > > (with
> > > 5000 distinct queries, most of them containing not more than 3-5 words)
> > > with
> > > 10 threads running in parallel, the average times for queries goes from
> > > ~50ms in the beginning to ~6000ms. Is this expected or is there
> > something
> > > wrong with my configuration. Currently I've configured the
> > queryResultCache
> > > and the documentCache to contain 2048 entries (hit ratios for both is
> > close
> > > to 50%).
> > >
> > > Apart from this, a general question that I want to ask is that is such
> a
> > > hardware enough for this scenario? I'm aiming at achieving around 20
> > > queries
> > > per second with the hardware mentioned above.
> > >
> > > Thanks,
> > >
> > > Regards,
> > >
> > > --
> > > - Siddhant
> > >
> >
>
>
>
> --
> - Siddhant
>
>
>
> --
> View this message in context:
> http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
- Siddhant


Re: Solr Performance Issues

2010-03-12 Thread Erick Erickson
You've probably already looked at this, but here goes anyway. The
first question probably should have been "what are you measuring"?
I've been fooled before by looking at, say, average response time
and extrapolating. You're getting 20 qps if your response time is
1 second, but you have 20 threads running simultaneously, ditto
if you're getting 2 second response time and 40 threads. So

And what is "response time"? It would clarify things a lot if you
broke out which parts of the operation are taking the time. Going
from memory, debugQuery=on will let you know how much time
was spent in various operations in SOLR. It's important to know
whether it was the searching, assembling the response, or
transmitting the data back to the client. If your timings are
all just how long it takes the response to get back to the
client, you could even be hammered by network latency.

How many threads does it take to peg the CPU? And what
response times are you getting when your number of threads is
around 10?

Erick
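Erick's debugQuery=on suggestion produces a per-component timing breakdown in the response. A small helper for picking out the dominant component from that breakdown — the JSON shape assumed here (phase -> component -> {"time": ms}) matches what wt=json&debugQuery=on typically returns, but verify it against your Solr version, and the sample numbers below are made up:

```python
def dominant_component(timing):
    """Given the 'timing' section of a Solr debug response, return the
    (phase, component, time) triple that consumed the most time.

    Assumes each phase maps component names to dicts carrying a "time" key
    in milliseconds; non-dict entries (like the phase's own "time") are
    skipped.
    """
    worst = ("", "", 0.0)
    for phase in ("prepare", "process"):
        for name, info in timing.get(phase, {}).items():
            if isinstance(info, dict) and info.get("time", 0) > worst[2]:
                worst = (phase, name, info["time"])
    return worst

# Hypothetical timing data, shaped like the debug output discussed in this thread.
sample = {
    "time": 7165.0,
    "prepare": {
        "time": 2.0,
        "org.apache.solr.handler.component.QueryComponent": {"time": 1.0},
    },
    "process": {
        "time": 7163.0,
        "org.apache.solr.handler.component.QueryComponent": {"time": 7160.0},
    },
}
phase, component, ms = dominant_component(sample)
```

If QueryComponent's process time dominates, the cost is in the search itself rather than in response writing or the network.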

On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel wrote:

> I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk
> caching.
>
> I think that at any point of time, there can be a maximum of <number of threads> concurrent requests, which happens to make sense btw (does it?).
>
> As I increase the number of threads, the load average shown by top goes up
> to as high as 80%. But if I keep the number of threads low (~10), the load
> average never goes beyond ~8). So probably thats the number of requests I
> can expect Solr to serve concurrently on this index size with this
> hardware.
>
> Can anyone give a general opinion as to how much hardware should be
> sufficient for a Solr deployment with an index size of ~43GB, containing
> around 2.5 million documents? I'm expecting it to serve at least 20
> requests
> per second. Any experiences?
>
> Thanks
>
> On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West  >wrote:
>
> >
> > How much of your memory are you allocating to the JVM and how much are
> you
> > leaving free?
> >
> > If you don't leave enough free memory for the OS, the OS won't have a
> large
> > enough disk cache, and you will be hitting the disk for lots of queries.
> >
> > You might want to monitor your Disk I/O using iostat and look at the
> > iowait.
> >
> > If you are doing phrase queries and your *prx file is significantly
> larger
> > than the available memory then when a slow phrase query hits Solr, the
> > contention for disk I/O with other queries could be slowing everything
> > down.
> > You might also want to look at the 90th and 99th percentile query times
> in
> > addition to the average. For our large indexes, we found at least an
> order
> > of magnitude difference between the average and 99th percentile queries.
> > Again, if Solr gets hit with a few of those 99th percentile slow queries
> > and
> > you're not hitting your caches, chances are you will see serious contention
> > for disk I/O..
> >
> > Of course if you don't see any waiting on i/o, then your bottleneck is
> > probably somewhere else:)
> >
> > See
> >
> >
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> > for more background on our experience.
> >
> > Tom Burton-West
> > University of Michigan Library
> > www.hathitrust.org
> >
> >
> >
> > >
> > > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel  > > >wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I have an index corresponding to ~2.5 million documents. The index
> size
> > > is
> > > > 43GB. The configuration of the machine which is running Solr is -
> Dual
> > > > Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB
> cache,
> > > 8GB
> > > > RAM, and 250 GB HDD.
> > > >
> > > > I'm observing a strange trend in the queries that I send to Solr. The
> > > query
> > > > times for queries that I send earlier is much lesser than the queries
> I
> > > > send
> > > > afterwards. For instance, if I write a script to query solr 5000
> times
> > > > (with
> > > > 5000 distinct queries, most of them containing not more than 3-5
> words)
> > > > with
> > > > 10 threads running in parallel, the average times for queries goes
> from
> > > > ~50ms in the beginning to ~6000ms. Is this expected or is there
> > > something
> > > > wrong with my configuration. Currently I've configured the
> > > queryResultCache
> > > > and the documentCache to contain 2048 entries (hit ratios for both is
> > > close
> > > > to 50%).
> > > >
> > > > Apart from this, a general question that I want to ask is that is
> such
> > a
> > > > hardware enough for this scenario? I'm aiming at achieving around 20
> > > > queries
> > > > per second with the hardware mentioned above.
> > > >
> > > > Thanks,
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > - Siddhant
> > > >
> > >
> >
> >
> >
> > --
> > - Siddhant
> >
> >
> >
> > --
> > View this message in context:
> > http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
> > Sent from the

Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
Hi,

Thanks for your responses. It actually feels good to be able to locate where
the bottlenecks are.

I've created two sets of data - in the first one I'm measuring the time taken
purely on Solr's end, and in the other one I'm including network latency
(just for reference). The data that I'm posting below contains the time taken
purely by Solr.

I'm running 10 threads simultaneously and the average response time (for
each query in each thread) remains close to 40 to 50 ms. But as soon as I
increase the number of threads to something like 100, the response time goes
up to ~600ms, and further up when the number of threads is close to 500. Yes
the average time definitely depends on the number of concurrent requests.

Going from memory, debugQuery=on will let you know how much time
> was spent in various operations in SOLR. It's important to know
> whether it was the searching, assembling the response, or
> transmitting the data back to the client.


I just tried this. The information that it gives me for a query that took
7165ms is - http://pastebin.ca/1835644

So out of the total time 7165ms, QueryComponent took most of the time. Plus
I can see the load average going up when the number of threads is really
high. So it actually makes sense. (I didn't add any other component while
searching; it was a plain /select?q=query call).
Like I mentioned earlier in this mail, I'm maintaining separate sets for
data with/without network latency, and I don't think it's the bottleneck.


> How many threads does it take to peg the CPU? And what
> response times are you getting when your number of threads is
> around 10?
>

If the number of threads is greater than 100, that really takes its toll on
the CPU. So that's probably the number.

When the number of threads is around 10, the response times average to
something like 60ms (and 95% of the queries fall within 100ms of that
value).

Thanks,




>
> Erick
>
> On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel  >wrote:
>
> > I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS
> disk
> > caching.
> >
> > I think that at any point of time, there can be a maximum of <number of threads> concurrent requests, which happens to make sense btw (does it?).
> >
> > As I increase the number of threads, the load average shown by top goes
> up
> > to as high as 80%. But if I keep the number of threads low (~10), the
> load
> > average never goes beyond ~8). So probably thats the number of requests I
> > can expect Solr to serve concurrently on this index size with this
> > hardware.
> >
> > Can anyone give a general opinion as to how much hardware should be
> > sufficient for a Solr deployment with an index size of ~43GB, containing
> > around 2.5 million documents? I'm expecting it to serve at least 20
> > requests
> > per second. Any experiences?
> >
> > Thanks
> >
> > On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West  > >wrote:
> >
> > >
> > > How much of your memory are you allocating to the JVM and how much are
> > you
> > > leaving free?
> > >
> > > If you don't leave enough free memory for the OS, the OS won't have a
> > large
> > > enough disk cache, and you will be hitting the disk for lots of
> queries.
> > >
> > > You might want to monitor your Disk I/O using iostat and look at the
> > > iowait.
> > >
> > > If you are doing phrase queries and your *prx file is significantly
> > larger
> > > than the available memory then when a slow phrase query hits Solr, the
> > > contention for disk I/O with other queries could be slowing everything
> > > down.
> > > You might also want to look at the 90th and 99th percentile query times
> > in
> > > addition to the average. For our large indexes, we found at least an
> > order
> > > of magnitude difference between the average and 99th percentile
> queries.
> > > Again, if Solr gets hit with a few of those 99th percentile slow
> queries
> > > and
> > > you're not hitting your caches, chances are you will see serious
> contention
> > > for disk I/O..
> > >
> > > Of course if you don't see any waiting on i/o, then your bottleneck is
> > > probably somewhere else:)
> > >
> > > See
> > >
> > >
> >
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> > > for more background on our experience.
> > >
> > > Tom Burton-West
> > > University of Michigan Library
> > > www.hathitrust.org
> > >
> > >
> > >
> > > >
> > > > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel <
> siddhantg...@gmail.com
> > > > >wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I have an index corresponding to ~2.5 million documents. The index
> > size
> > > > is
> > > > > 43GB. The configuration of the machine which is running Solr is -
> > Dual
> > > > > Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB
> > cache,
> > > > 8GB
> > > > > RAM, and 250 GB HDD.
> > > > >
> > > > > I'm observing a strange trend in the queries that I send to Solr.
> The
> > > > query
> > > > > times for queries that I send earlier is much

Re: Solr Performance Issues

2010-03-12 Thread Erick Erickson
Sounds like you're pretty well on your way then. This is pretty typical
of multi-threaded situations... Threads 1-n wait around on I/O and
increasing the number of threads increases throughput without
changing (much) the individual response time.

Threads n+1 - p don't change throughput much, but increase
the response time for each request. On aggregate, though, the
throughput doesn't change (much).

Adding threads after p+1 *decreases* throughput while
*increasing* individual response time as your processors start
spending way too much time context switching and/or memory
swapping.

The trick is finding out what n and p are.

Best
Erick
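The n/p behaviour Erick describes is the standard closed-system queueing picture, and a toy model makes the knee visible. All numbers below are invented for illustration; they describe no real Solr deployment:

```python
def closed_loop(threads, base_latency_s, capacity_qps):
    """Simple closed-system model.

    Below saturation each request takes base_latency_s, so throughput grows
    linearly with the thread count; past the saturation point
    n = capacity_qps * base_latency_s, throughput is pinned at capacity and
    response time grows as threads / capacity (Little's law: N = X * R).
    """
    throughput = min(threads / base_latency_s, capacity_qps)
    response_time = threads / throughput
    return throughput, response_time

# With 50 ms base latency and a 200 qps ceiling, saturation is at n = 10 threads.
x10, r10 = closed_loop(10, 0.05, 200)   # at the knee: 200 qps, 50 ms
x40, r40 = closed_loop(40, 0.05, 200)   # past the knee: same 200 qps, 4x latency
```

In this model p is where context switching and swapping would start shrinking capacity_qps itself, which the simple formula does not capture; finding n and p empirically, as Erick says, is the real trick.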

On Fri, Mar 12, 2010 at 12:06 PM, Siddhant Goel wrote:

> Hi,
>
> Thanks for your responses. It actually feels good to be able to locate
> where
> the bottlenecks are.
>
> I've created two sets of data - in the first one I'm measuring the time
> took
> purely on Solr's end, and in the other one I'm including network latency
> (just for reference). The data that I'm posting below contains the time
> took
> purely by Solr.
>
> I'm running 10 threads simultaneously and the average response time (for
> each query in each thread) remains close to 40 to 50 ms. But as soon as I
> increase the number of threads to something like 100, the response time
> goes
> up to ~600ms, and further up when the number of threads is close to 500.
> Yes
> the average time definitely depends on the number of concurrent requests.
>
> Going from memory, debugQuery=on will let you know how much time
> > was spent in various operations in SOLR. It's important to know
> > whether it was the searching, assembling the response, or
> > transmitting the data back to the client.
>
>
> I just tried this. The information that it gives me for a query that took
> 7165ms is - http://pastebin.ca/1835644
>
> So out of the total time 7165ms, QueryComponent took most of the time. Plus
> I can see the load average going up when the number of threads is really
> high. So it actually makes sense. (I didn't add any other component while
> searching; it was a plain /select?q=query call).
> Like I mentioned earlier in this mail, I'm maintaining separate sets for
> data with/without network latency, and I don't think its the bottleneck.
>
>
> > How many threads does it take to peg the CPU? And what
> > response times are you getting when your number of threads is
> > around 10?
> >
>
> If the number of threads is greater than 100, that really takes its toll on
> the CPU. So probably thats the number.
>
> When the number of threads is around 10, the response times average to
> something like 60ms (and 95% of the queries fall within 100ms of that
> value).
>
> Thanks,
>
>
>
>
> >
> > Erick
> >
> > On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel  > >wrote:
> >
> > > I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS
> > disk
> > > caching.
> > >
> > > I think that at any point of time, there can be a maximum of <number of threads> concurrent requests, which happens to make sense btw (does
> it?).
> > >
> > > As I increase the number of threads, the load average shown by top goes
> > up
> > > to as high as 80%. But if I keep the number of threads low (~10), the
> > load
> > > average never goes beyond ~8). So probably thats the number of requests
> I
> > > can expect Solr to serve concurrently on this index size with this
> > > hardware.
> > >
> > > Can anyone give a general opinion as to how much hardware should be
> > > sufficient for a Solr deployment with an index size of ~43GB,
> containing
> > > around 2.5 million documents? I'm expecting it to serve at least 20
> > > requests
> > > per second. Any experiences?
> > >
> > > Thanks
> > >
> > > On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West <
> tburtonw...@gmail.com
> > > >wrote:
> > >
> > > >
> > > > How much of your memory are you allocating to the JVM and how much
> are
> > > you
> > > > leaving free?
> > > >
> > > > If you don't leave enough free memory for the OS, the OS won't have a
> > > large
> > > > enough disk cache, and you will be hitting the disk for lots of
> > queries.
> > > >
> > > > You might want to monitor your Disk I/O using iostat and look at the
> > > > iowait.
> > > >
> > > > If you are doing phrase queries and your *prx file is significantly
> > > larger
> > > > than the available memory then when a slow phrase query hits Solr,
> the
> > > > contention for disk I/O with other queries could be slowing
> everything
> > > > down.
> > > > You might also want to look at the 90th and 99th percentile query
> times
> > > in
> > > > addition to the average. For our large indexes, we found at least an
> > > order
> > > > of magnitude difference between the average and 99th percentile
> > queries.
> > > > Again, if Solr gets hit with a few of those 99th percentile slow
> > queries
> > > > and
> > > > you're not hitting your caches, chances are you will see serious
> > contention
> > > > for disk I/O..
> > > >
> > > > Of course if you don't see any wai

Re: Solr Performance Issues

2010-03-17 Thread Siddhant Goel
Hi,

Apparently the bottleneck seems to be the time periods when the CPU is waiting to
do some I/O. Out of all the numbers I can see, the CPU wait times for I/O
seem to be the highest. I've allotted 4GB to Solr out of the total 8GB
available. There's only 47MB free on the machine, so I assume the rest of
the memory is being used for OS disk caches. In addition, the hit ratio for the
queryResultCache isn't going beyond 20%. So the problem, I think, is not at
Solr's end. Are there any pointers available on how I can resolve such
issues related to disk I/O? Does this mean I need more overall memory? Or
would reducing the amount of memory allocated to Solr, so that the disk cache has
more memory, help?

Thanks,
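One way to put a number on the iowait Siddhant mentions is to diff two samples of the aggregate cpu line from /proc/stat (Linux). A sketch using synthetic samples so it runs anywhere:

```python
def iowait_percent(stat_line_before, stat_line_after):
    """Compute the iowait share of CPU time between two samples of the
    aggregate 'cpu' line from /proc/stat (Linux).

    Field order per proc(5): user nice system idle iowait irq softirq ...
    so iowait is index 4 after stripping the 'cpu' label.
    """
    def fields(line):
        return [int(x) for x in line.split()[1:]]

    before, after = fields(stat_line_before), fields(stat_line_after)
    delta = [a - b for a, b in zip(after, before)]
    total = sum(delta)
    return 100.0 * delta[4] / total if total else 0.0

# Synthetic samples: 400 of the 1000 elapsed jiffies were spent in iowait -> 40%.
t0 = "cpu  100 0 100 700 100 0 0 0"
t1 = "cpu  200 0 200 1100 500 0 0 0"
pct = iowait_percent(t0, t1)
```

On a real box you would read /proc/stat twice a few seconds apart (or just use iostat/vmstat, as suggested earlier in the thread); a persistently high percentage confirms the disks, not the CPU, are the limit.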

On Fri, Mar 12, 2010 at 11:21 PM, Erick Erickson wrote:

> Sounds like you're pretty well on your way then. This is pretty typical
> of multi-threaded situations... Threads 1-n wait around on I/O and
> increasing the number of threads increases throughput without
> changing (much) the individual response time.
>
> Threads n+1 - p don't change throughput much, but increase
> the response time for each request. On aggregate, though, the
> throughput doesn't change (much).
>
> Adding threads after p+1 *decreases* throughput while
> *increasing* individual response time as your processors start
> spending way too much time context switching and/or memory
> swapping.
>
> The trick is finding out what n and p are.
>
> Best
> Erick
>
> On Fri, Mar 12, 2010 at 12:06 PM, Siddhant Goel  >wrote:
>
> > Hi,
> >
> > Thanks for your responses. It actually feels good to be able to locate
> > where
> > the bottlenecks are.
> >
> > I've created two sets of data - in the first one I'm measuring the time
> > took
> > purely on Solr's end, and in the other one I'm including network latency
> > (just for reference). The data that I'm posting below contains the time
> > took
> > purely by Solr.
> >
> > I'm running 10 threads simultaneously and the average response time (for
> > each query in each thread) remains close to 40 to 50 ms. But as soon as I
> > increase the number of threads to something like 100, the response time
> > goes
> > up to ~600ms, and further up when the number of threads is close to 500.
> > Yes
> > the average time definitely depends on the number of concurrent requests.
> >
> > Going from memory, debugQuery=on will let you know how much time
> > > was spent in various operations in SOLR. It's important to know
> > > whether it was the searching, assembling the response, or
> > > transmitting the data back to the client.
> >
> >
> > I just tried this. The information that it gives me for a query that took
> > 7165ms is - http://pastebin.ca/1835644
> >
> > So out of the total time 7165ms, QueryComponent took most of the time.
> Plus
> > I can see the load average going up when the number of threads is really
> > high. So it actually makes sense. (I didn't add any other component while
> > searching; it was a plain /select?q=query call).
> > Like I mentioned earlier in this mail, I'm maintaining separate sets for
> > data with/without network latency, and I don't think its the bottleneck.
> >
> >
> > > How many threads does it take to peg the CPU? And what
> > > response times are you getting when your number of threads is
> > > around 10?
> > >
> >
> > If the number of threads is greater than 100, that really takes its toll
> on
> > the CPU. So probably thats the number.
> >
> > When the number of threads is around 10, the response times average to
> > something like 60ms (and 95% of the queries fall within 100ms of that
> > value).
> >
> > Thanks,
> >
> >
> >
> >
> > >
> > > Erick
> > >
> > > On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel  > > >wrote:
> > >
> > > > I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS
> > > disk
> > > > caching.
> > > >
> > > > I think that at any point of time, there can be a maximum of <number of
> > > > threads> concurrent requests, which happens to make sense btw (does
> > it?).
> > > >
> > > > As I increase the number of threads, the load average shown by top
> goes
> > > up
> > > > to as high as 80%. But if I keep the number of threads low (~10), the
> > > load
> > > > average never goes beyond ~8). So probably thats the number of
> requests
> > I
> > > > can expect Solr to serve concurrently on this index size with this
> > > > hardware.
> > > >
> > > > Can anyone give a general opinion as to how much hardware should be
> > > > sufficient for a Solr deployment with an index size of ~43GB,
> > containing
> > > > around 2.5 million documents? I'm expecting it to serve at least 20
> > > > requests
> > > > per second. Any experiences?
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West <
> > tburtonw...@gmail.com
> > > > >wrote:
> > > >
> > > > >
> > > > > How much of your memory are you allocating to the JVM and how much
> > are
> > > > you
> > > > > leaving free?
> > > > >
> > > > > If you don't leave enough free memory for the OS, th

Re: Solr Performance Issues

2010-03-17 Thread Lance Norskog
Try cutting back Solr's memory - the OS knows how to manage disk
caches better than Solr does.

Another approach is to raise and lower the queryResultCache size and see if
the hit ratio changes.
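Lance's suggestion - vary the queryResultCache size and watch the hit ratio - can be rehearsed offline with a toy LRU simulation under a skewed query distribution. The vocabulary size and Zipf-like skew below are assumptions, not measurements; Solr's own cache statistics are the ground truth:

```python
import random
from collections import OrderedDict

def simulate_hit_ratio(cache_size, num_queries=20000, vocab=5000, seed=42):
    """Estimate the hit ratio of an LRU cache of cache_size entries under a
    skewed (Zipf-like) stream of num_queries distinct-ish queries drawn from
    a vocabulary of vocab queries. Illustrative only.
    """
    rng = random.Random(seed)
    # Zipf-ish: query of rank r is drawn with probability proportional to 1/r.
    weights = [1.0 / r for r in range(1, vocab + 1)]
    queries = rng.choices(range(vocab), weights=weights, k=num_queries)

    cache, hits = OrderedDict(), 0
    for q in queries:
        if q in cache:
            hits += 1
            cache.move_to_end(q)      # mark as most recently used
        else:
            cache[q] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / num_queries

small = simulate_hit_ratio(256)
large = simulate_hit_ratio(2048)
```

If the real hit ratio barely moves as the cache grows, the query stream is too diverse for result caching to help, and the money is better spent on RAM for the OS disk cache.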

On Wed, Mar 17, 2010 at 9:44 AM, Siddhant Goel  wrote:
> Hi,
>
> Apparently the bottleneck seem to be the time periods when CPU is waiting to
> do some I/O. Out of all the numbers I can see, the CPU wait times for I/O
> seem to be the highest. I've alloted 4GB to Solr out of the total 8GB
> available. There's only 47MB free on the machine, so I assume the rest of
> the memory is being used for OS disk caches. In addition, the hit ratios for
> queryResultCache isn't going beyond 20%. So the problem I think is not at
> Solr's end. Are there any pointers available on how can I resolve such
> issues related to disk I/O? Does this mean I need more overall memory? Or
> reducing the amount of memory allocated to Solr so that the disk cache has
> more memory, would help?
>
> Thanks,
>
> On Fri, Mar 12, 2010 at 11:21 PM, Erick Erickson 
> wrote:
>
>> Sounds like you're pretty well on your way then. This is pretty typical
>> of multi-threaded situations... Threads 1-n wait around on I/O and
>> increasing the number of threads increases throughput without
>> changing (much) the individual response time.
>>
>> Threads n+1 - p don't change throughput much, but increase
>> the response time for each request. On aggregate, though, the
>> throughput doesn't change (much).
>>
>> Adding threads after p+1 *decreases* throughput while
>> *increasing* individual response time as your processors start
>> spending way too much time context switching and/or memory
>> swapping.
>>
>> The trick is finding out what n and p are.
>>
>> Best
>> Erick
>>
>> On Fri, Mar 12, 2010 at 12:06 PM, Siddhant Goel > >wrote:
>>
>> > Hi,
>> >
>> > Thanks for your responses. It actually feels good to be able to locate
>> > where
>> > the bottlenecks are.
>> >
>> > I've created two sets of data - in the first one I'm measuring the time
>> > took
>> > purely on Solr's end, and in the other one I'm including network latency
>> > (just for reference). The data that I'm posting below contains the time
>> > took
>> > purely by Solr.
>> >
>> > I'm running 10 threads simultaneously and the average response time (for
>> > each query in each thread) remains close to 40 to 50 ms. But as soon as I
>> > increase the number of threads to something like 100, the response time
>> > goes
>> > up to ~600ms, and further up when the number of threads is close to 500.
>> > Yes
>> > the average time definitely depends on the number of concurrent requests.
>> >
>> > Going from memory, debugQuery=on will let you know how much time
>> > > was spent in various operations in SOLR. It's important to know
>> > > whether it was the searching, assembling the response, or
>> > > transmitting the data back to the client.
>> >
>> >
>> > I just tried this. The information that it gives me for a query that took
>> > 7165ms is - http://pastebin.ca/1835644
>> >
>> > So out of the total time 7165ms, QueryComponent took most of the time.
>> Plus
>> > I can see the load average going up when the number of threads is really
>> > high. So it actually makes sense. (I didn't add any other component while
>> > searching; it was a plain /select?q=query call).
>> > Like I mentioned earlier in this mail, I'm maintaining separate sets for
>> > data with/without network latency, and I don't think its the bottleneck.
>> >
>> >
>> > > How many threads does it take to peg the CPU? And what
>> > > response times are you getting when your number of threads is
>> > > around 10?
>> > >
>> >
>> > If the number of threads is greater than 100, that really takes its toll
>> on
>> > the CPU. So probably thats the number.
>> >
>> > When the number of threads is around 10, the response times average to
>> > something like 60ms (and 95% of the queries fall within 100ms of that
>> > value).
>> >
>> > Thanks,
>> >
>> >
>> >
>> >
>> > >
>> > > Erick
>> > >
>> > > On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel > > > >wrote:
>> > >
>> > > > I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS
>> > > disk
>> > > > caching.
>> > > >
>> > > > I think that at any point of time, there can be a maximum of <number of
>> > > > threads> concurrent requests, which happens to make sense btw (does
>> > it?).
>> > > >
>> > > > As I increase the number of threads, the load average shown by top
>> goes
>> > > up
>> > > > to as high as 80%. But if I keep the number of threads low (~10), the
>> > > load
>> > > > average never goes beyond ~8). So probably thats the number of
>> requests
>> > I
>> > > > can expect Solr to serve concurrently on this index size with this
>> > > > hardware.
>> > > >
>> > > > Can anyone give a general opinion as to how much hardware should be
>> > > > sufficient for a Solr deployment with an index size of ~43GB,
>> > containing
>> > > > around 2.5 million documents? I'm expecting it to serve at least 20
>> > > > re

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Jan Høydahl
Hi,

How many shards do you have? This is a known issue with deep paging in multi-shard
setups; see https://issues.apache.org/jira/browse/SOLR-1726

You may be more successful in going to each shard, one at a time (with 
&distrib=false) to avoid this issue.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
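Jan's shard-by-shard suggestion can be sketched as a generator of per-shard requests with distrib=false. The shard URLs and document counts below are placeholders, not real endpoints:

```python
def shard_export_requests(shard_urls, total_docs_per_shard, rows=4000):
    """Generate (url, params) pairs that walk each shard independently with
    distrib=false, instead of deep-paging through the merged distributed
    result set (which forces every shard to build and ship a huge top-N
    priority queue for each page).
    """
    for shard, ndocs in zip(shard_urls, total_docs_per_shard):
        for start in range(0, ndocs, rows):
            yield shard + "/select", {
                "q": "*:*",
                "distrib": "false",
                "start": start,
                "rows": rows,
                "wt": "json",
            }

# Placeholder shard URLs and per-shard document counts.
reqs = list(shard_export_requests(
    ["http://shard1:8983/solr/core", "http://shard2:8983/solr/core"],
    [10000, 6000],
))
```

Note that very high start values are still expensive even on a single shard; in Solr 4.7 and later, cursorMark avoids the deep-paging cost entirely and is the better tool for full exports.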

29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam :

> We have a solr core with about 115 million documents. We are trying to 
> migrate data and running a simple query with *:* query and with start and 
> rows param.
> The performance is becoming too slow in solr, its taking almost 2 mins to get 
> 4000 rows and migration is being just too slow. Logs snippet below:
> 
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=168308
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=122771
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=137615
> INFO: [coreName] webapp=/solr path=/select 
> params={start=5545&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=141223
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=97474
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=98115
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=143822
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55474000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=118066
> INFO: [coreName] webapp=/solr path=/select 
> params={start=5547&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=121498
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55482000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=164062
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55478000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=165518
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55486000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=118163
> INFO: [coreName] webapp=/solr path=/select 
> params={start=55494000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=141642
> INFO: [coreName] webapp=/solr path=/select 
> params={start=5549&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 
> status=0 QTime=145037
> 
> 
> I've taken some thread dumps in the solr server and most of the time the 
> threads seem to be busy in the following stacks mostly:
> Is there anything that can be done to improve the performance? Is it a known 
> issue? Its very surprising that querying for some just rows starting at some 
> points is taking in order of minutes.
> 
> 
> "395883378@qtp-162198005-7" prio=10 tid=0x7f4aa0636000 nid=0x295a 
> runnable [0x7f42865dd000]
>   java.lang.Thread.State: RUNNABLE
>at 
> org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
>at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
>at 
> org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
>at 
> org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
>at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
>at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
>at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
>at 
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
>at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> 
> 
> "1154127582@qtp-162198005-3" prio=10 tid=0x7f4aa0613800 nid=0x2956 
> runnable [0x7f42869e1000]
>   java.lang.Thread.State: RUNNABLE
>at 
> org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
>at 
> org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:210)
>at 
> org.apache.lucene.search.TopScoreDocCollector$InOrderTo

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Jan,

Would the same distrib=false help for distributed faceting? We are running
into a similar issue with facet paging.

Dmitry



On Mon, Apr 29, 2013 at 11:58 AM, Jan Høydahl  wrote:

> Hi,
>
> How many shards do you have? This is a known issue with deep paging with
> multi shard, see https://issues.apache.org/jira/browse/SOLR-1726
>
> You may be more successful in going to each shard, one at a time (with
> &distrib=false) to avoid this issue.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam :
>
> > We have a solr core with about 115 million documents. We are trying to
> migrate data by running a simple *:* query with start and rows params.
> > The performance has become too slow in Solr; it's taking almost 2 minutes
> to get 4000 rows, and the migration is just too slow. Log snippet below:
> >
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=168308
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=122771
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=137615
> > INFO: [coreName] webapp=/solr path=/select
> params={start=5545&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=141223
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=97474
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=98115
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=143822
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55474000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=118066
> > INFO: [coreName] webapp=/solr path=/select
> params={start=5547&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=121498
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55482000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=164062
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55478000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=165518
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55486000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=118163
> > INFO: [coreName] webapp=/solr path=/select
> params={start=55494000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=141642
> > INFO: [coreName] webapp=/solr path=/select
> params={start=5549&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479
> status=0 QTime=145037
> >
> >
> > I've taken some thread dumps on the Solr server and most of the time the
> threads seem to be busy in the following stacks:
> > Is there anything that can be done to improve the performance? Is it a
> known issue? It's very surprising that querying for just a few thousand rows
> starting at certain offsets takes on the order of minutes.
> >
> >
> > "395883378@qtp-162198005-7" prio=10 tid=0x7f4aa0636000 nid=0x295a
> runnable [0x7f42865dd000]
> >   java.lang.Thread.State: RUNNABLE
> >at
> org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
> >at
> org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
> >at
> org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
> >at
> org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
> >at
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
> >at
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
> >at
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
> >at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
> >at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> >at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
> >at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
> >at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> >at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> >
> >
> > "1154127582@qtp-162198005-3" prio=10 tid=0x7f4aa0613800 nid=0x2956
> runnable [0x7f42869e1000]
> >   java.lang.Thread.State: RUNNABL

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Abhishek Sanoujam
We have a single shard, and all the data is on a single box. It 
definitely looks like "deep paging" is the problem.


Just to understand: is the searcher looping over the result set every 
time and skipping the first "start" documents? That will definitely 
take a toll as we reach higher "start" values.
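Broadly, yes. The stack traces above point at Lucene's `TopDocsCollector` / `PriorityQueue`, whose paging strategy cannot jump to `start`: it keeps the best `start + rows` hits in a heap while scanning every match, then discards the first `start` of them. A minimal Python sketch of that behavior (illustrative only, not Lucene's actual code):

```python
import heapq

def top_docs(scores, start, rows):
    """Sketch of TopDocsCollector-style paging: keep the best
    (start + rows) hits in a min-heap while scanning every match,
    then throw away the first 'start' of them. Per-page cost grows
    with 'start', which is why deep pages take minutes."""
    heap_size = start + rows
    heap = []  # min-heap of (score, doc_id) for the current best hits
    for doc_id, score in enumerate(scores):
        if len(heap) < heap_size:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # better than the worst retained hit: swap it in
            heapq.heapreplace(heap, (score, doc_id))
    # pop ascending, reverse to get best-first, keep only the page
    ordered = [heapq.heappop(heap) for _ in range(len(heap))][::-1]
    return [d for _score, d in ordered[start:start + rows]]
```

For example, with distinct descending scores, `top_docs(scores, 2, 3)` returns the 3rd through 5th best documents, but only after scanning and heap-sifting the entire result set.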




On 4/29/13 2:28 PM, Jan Høydahl wrote:

Hi,

How many shards do you have? This is a known issue with deep paging with multi 
shard, see https://issues.apache.org/jira/browse/SOLR-1726

You may be more successful in going to each shard, one at a time (with 
&distrib=false) to avoid this issue.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam :


We have a solr core with about 115 million documents. We are trying to migrate 
data and running a simple query with *:* query and with start and rows param.
[...]

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Abhishek,

There is a wiki regarding this:

http://wiki.apache.org/solr/CommonQueryParameters

search "pageDoc and pageScore".
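The wiki section Dmitry refers to describes carrying the previous page's last hit (its internal docid and score) into the next request, instead of a growing `start`. A hypothetical sketch of building such a request, assuming `pageDoc`/`pageScore` behave as the wiki describes (they originate from the SOLR-1726 patch and may not be available in every build; verify against your Solr version):

```python
def next_page_params(last_doc, last_score, rows=4000):
    """Build query params for pageDoc/pageScore-style deep paging:
    pass the internal docid and score of the last hit of the previous
    page so the collector can skip everything that sorts before it.
    Parameter names mirror the CommonQueryParameters wiki and are an
    assumption here, not verified against a live server."""
    return {
        "q": "*:*",
        "rows": rows,
        "start": 0,  # stays 0; the paging state lives in pageDoc/pageScore
        "pageDoc": last_doc,
        "pageScore": last_score,
    }
```

The caller would read the last hit's docid and score out of each response and feed them into the next request.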


On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
wrote:

> We have a single shard, and all the data is in a single box only.
> Definitely looks like "deep-paging" is having problems.
>
> Just to understand, is the searcher looping over the result set everytime
> and skipping the first "start" count? This will definitely take a toll when
> we reach higher "start" values.
>
>
>
>
> On 4/29/13 2:28 PM, Jan Høydahl wrote:
>
>> Hi,
>>
>> How many shards do you have? This is a known issue with deep paging with
>> multi shard, see 
>> https://issues.apache.org/**jira/browse/SOLR-1726
>>
>> You may be more successful in going to each shard, one at a time (with
>> &distrib=false) to avoid this issue.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam > >:
>>
>>  We have a solr core with about 115 million documents. We are trying to
>>> migrate data and running a simple query with *:* query and with start and
>>> rows param.
>>> The performance is becoming too slow in solr, its taking almost 2 mins
>>> to get 4000 rows and migration is being just too slow. Logs snippet below:
>>>
>>> [...]

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Michael Della Bitta
We've found that you can do a lot for yourself by using a filter query
to page through your data instead of start and rows, if the data has a
natural range to page on.
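A sketch of this range-based ("keyset") paging, simulated against an in-memory sorted list rather than a live Solr instance; a real client would issue something like `q=*:*&fq=id:{LAST_ID TO *]&sort=id asc&rows=N&start=0` per page (the `{... TO *]` form is Solr's exclusive-lower-bound range syntax):

```python
def fetch_page(docs, after_id, rows):
    """Stand-in for one Solr request filtered to id > after_id,
    sorted by id, with start fixed at 0. 'docs' simulates the index
    as a sorted list of ids."""
    page = [d for d in docs if after_id is None or d > after_id]
    return page[:rows]

def scan_all(docs, rows):
    """Walk the whole collection by advancing a range filter on the
    last id seen, instead of a deep 'start' offset. Each request does
    the same small amount of work, no matter how far in we are."""
    after, out = None, []
    index = sorted(docs)  # the 'natural range' field must be sortable
    while True:
        page = fetch_page(index, after, rows)
        if not page:
            return out
        out.extend(page)
        after = page[-1]  # next request filters id > last id seen
```

For example, `scan_all(list(range(10)), 3)` walks the collection in four cheap requests instead of paging with ever-larger offsets.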

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan  wrote:
> Abhishek,
>
> There is a wiki regarding this:
>
> http://wiki.apache.org/solr/CommonQueryParameters
>
> search "pageDoc and pageScore".
>
>
> On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
> wrote:
>
> [...]

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Michael,

Interesting! Do (Can) you apply this to facet searches as well?

Dmitry


On Mon, Apr 29, 2013 at 4:02 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> We've found that you can do a lot for yourself by using a filter query
> to page through your data if it has a natural range to do so instead
> of start and rows.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> > On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan  wrote:
> > [...]

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Michael Della Bitta
I guess so; you'd have to use a filter query to page through the set
of documents you were faceting against and sum the counts at the end.
It's not quite the same operation as paging through results, because
facets are aggregate statistics, but if you're willing to go to the
trouble, I bet it would also help performance.
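This works because facet counts are additive across disjoint filter queries: faceting each range slice separately and summing gives the same totals as one big request. A small sketch under that assumption, with plain Python objects standing in for Solr documents and `Counter` standing in for a facet response:

```python
from collections import Counter

def facet_counts(docs, field):
    """Stand-in for one facet request over a filtered slice of the index."""
    return Counter(d[field] for d in docs if field in d)

def faceted_in_ranges(docs, field, key, ranges):
    """Facet each disjoint [lo, hi) range slice separately (as with an
    fq on a natural range field) and sum the per-slice counts. Because
    the slices are disjoint and cover the data, the summed counts equal
    those of a single facet request over everything."""
    total = Counter()
    for lo, hi in ranges:
        chunk = [d for d in docs if lo <= d[key] < hi]
        total += facet_counts(chunk, field)
    return total
```

The ranges must be disjoint and cover the full data set, or documents get double-counted or missed.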

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Apr 29, 2013 at 9:06 AM, Dmitry Kan  wrote:
> Michael,
>
> Interesting! Do (Can) you apply this to facet searches as well?
>
> Dmitry
>
>
> > On Mon, Apr 29, 2013 at 4:02 PM, Michael Della Bitta <
> > michael.della.bi...@appinions.com> wrote:
> > [...]

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Thanks.

The only question is how to transition to this model. Our facet
(string) fields contain timestamp prefixes that are reverse-ordered,
starting from the freshest value. In theory we could try computing the
filter queries for those, but before doing so we would need the matched
ids from Solr, so it becomes at least a two-pass algorithm?

The biggest concern we have with paging in general is that the system
seems to pass far more data back and forth than is needed to compute
the values.


On Mon, Apr 29, 2013 at 4:14 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> I guess so, you'd have to use a filter query to page through the set
> of documents you were faceting against and sum them all at the end.
> It's not quite the same operation as paging through results, because
> facets are aggregate statistics, but if you're willing to go through
> the trouble, I bet it would also help performance.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Mon, Apr 29, 2013 at 9:06 AM, Dmitry Kan  wrote:
> > Michael,
> >
> > Interesting! Do (Can) you apply this to facet searches as well?
> >
> > Dmitry
> >
> >
> > On Mon, Apr 29, 2013 at 4:02 PM, Michael Della Bitta <
> > michael.della.bi...@appinions.com> wrote:
> >
> >> We've found that you can do a lot for yourself by using a filter query
> >> to page through your data if it has a natural range to do so instead
> >> of start and rows.
> >>
> >> Michael Della Bitta
> >>
> >> 
> >> Appinions
> >> 18 East 41st Street, 2nd Floor
> >> New York, NY 10017-6271
> >>
> >> www.appinions.com
> >>
> >> Where Influence Isn’t a Game
> >>
> >>
> >> On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan 
> wrote:
> >> > Abhishek,
> >> >
> >> > There is a wiki regarding this:
> >> >
> >> > http://wiki.apache.org/solr/CommonQueryParameters
> >> >
> >> > search "pageDoc and pageScore".
> >> >
> >> >
> >> > On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam wrote:
> >> >
> >> >> We have a single shard, and all the data is in a single box only.
> >> >> Definitely looks like deep paging is the problem.
> >> >>
> >> >> Just to understand: is the searcher looping over the result set
> >> >> every time and skipping the first "start" documents? That will
> >> >> definitely take a toll when we reach higher "start" values.
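The usual explanation is yes, in effect: to serve start=S, rows=R, the collector must track the best S+R hits across the entire result set and then discard the first S, so cost grows with `start` even though only `rows` documents reach the client. A standalone illustration (plain Python mimicking that behavior; not actual Solr/Lucene code):

```python
import heapq

def collect_page(scored_hits, start, rows):
    # Mimics a top-N collector servicing start=S, rows=R: keep the best
    # S+R hits over the *entire* result set, then throw the first S away.
    top = heapq.nlargest(start + rows, scored_hits)  # work grows with start
    return top[start:]

hits = [((i * 37) % 997, i) for i in range(100_000)]  # (score, doc_id)

shallow = collect_page(hits, 0, 10)       # heap of size 10
deep = collect_page(hits, 50_000, 10)     # heap of size 50_010 for 10 docs
assert len(shallow) == len(deep) == 10
```

Both calls return 10 documents, but the deep one does three orders of magnitude more bookkeeping, matching the multi-minute QTimes reported in this thread.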
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On 4/29/13 2:28 PM, Jan Høydahl wrote:
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> How many shards do you have? This is a known issue with deep paging
> >> >>> with multiple shards; see https://issues.apache.org/jira/browse/SOLR-1726
> >> >>>
> >> >>> You may be more successful going to each shard, one at a time
> >> >>> (with &distrib=false), to avoid this issue.
> >> >>>
> >> >>> --
> >> >>> Jan Høydahl, search solution architect
> >> >>> Cominvent AS - www.cominvent.com
> >> >>> Solr Training - www.solrtraining.com
> >> >>>
> >> >>> On 29 Apr 2013, at 09:17, Abhishek Sanoujam <abhi.sanou...@gmail.com> wrote:
> >> >>>
> >> >>>> We have a solr core with about 115 million documents. We are trying
> >> >>>> to migrate data and are running a simple *:* query with the start
> >> >>>> and rows params.
> >> >>>> The performance is becoming too slow in Solr; it's taking almost 2
> >> >>>> mins to get 4000 rows, and the migration is just too slow. Log
> >> >>>> snippet below:
> >> >>>>
> >> >>>> INFO: [coreName] webapp=/solr path=/select params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=168308
> >> >>>> INFO: [coreName] webapp=/solr path=/select params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=122771
> >> >>>> INFO: [coreName] webapp=/solr path=/select params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=137615
> >> >>>> INFO: [coreName] webapp=/solr path=/select params={start=5545&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141223
> >> >>>> INFO: [coreName] webapp=/solr path=/select params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=97474
> >> >>>> INFO: [coreName] webapp=/solr path=/select params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=98115
> >> >>>> INFO: [coreName] webapp=/solr path=/select params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=143822
> >> >>>> INFO: [coreName] webapp=/solr path=/select params={start=55474000&q=*:*&wt=javabin&version=2&rows=