Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks all.

I have the same index with a slightly different schema and 200M documents,
installed on 3 r3.xlarge instances (30GB RAM and 600GB General Purpose SSD each).
The size of the index is about 1.5TB; it receives many updates every 5 minutes,
and complex queries with faceting have a response time of about 100ms, which is
acceptable for us.

Toke Eskildsen,

Is the index updated while you are searching? *No*
Do you do any faceting or other heavy processing as part of a search? *No*
How many hits does a search typically have and how many documents are
returned? *The test measured QTime only, with no documents returned; the number
of hits varies from 50,000 to 50,000,000.*
How many concurrent searches do you need to support? How fast should the
response time be? *Maybe 100 concurrent searches, with 100ms response time
including facets.*

Would splitting the shard into two shards on the same node, so that every shard
sits on a single EBS volume, be better than using LVM?

Thanks

On Mon, Dec 29, 2014 at 2:00 AM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
  We've installed a cluster of one collection of 350M documents on 3
  r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
  about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
  General purpose (1x1TB + 1x500GB) on each instance. Then we create
 logical
  volume using LVM of 1.5TB to fit our index.

 Your search speed will be limited by the slowest storage in your group,
 which would be your 500GB EBS. The General Purpose SSD option means (as far
 as I can read at http://aws.amazon.com/ebs/details/#piops) that your
 baseline of 3 IOPS/GB = 1500 IOPS, with bursts of 3000 IOPS. Unfortunately
 they do not say anything about latency.

 For comparison, I checked the system logs from a local test with our 21TB
 / 7 billion documents index. It used ~27,000 IOPS during the test, with
 mean search time a bit below 1 second. That was with ~100GB RAM for disk
 cache, which is about ½% of index size. The test was with simple term
 queries (1-3 terms) and some faceting. Back of the envelope: 27,000 IOPS
 for 21TB is ~1300 IOPS/TB. Your indexes are 1.1TB, so 1.1*1300 IOPS ~= 1400
 IOPS.

 All else being equal (which is never the case), getting 1-3 second
 response times for a 1.1TB index, when one link in the storage chain is
 capped at a few thousand IOPS, you are using networked storage and you have
 little RAM for caching, does not seem unrealistic. If possible, you could
 try temporarily boosting performance of the EBS, to see if raw IO is the
 bottleneck.

  The response time is about 1 and 3 seconds for simple queries (1 token).

 Is the index updated while you are searching?
 Do you do any faceting or other heavy processing as part of a search?
 How many hits does a search typically have and how many documents are
 returned?
 How many concurrent searches do you need to support? How fast should the
 response time be?

 - Toke Eskildsen



Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
 I've the same index with a bit different schema and 200M documents,
 installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
 of index is about 1.5TB, have many updates every 5 minutes, complex queries
 and faceting with response time of 100ms that is acceptable for us.
 
 Toke Eskildsen,
 
 Is the index updated while you are searching? *No*
 Do you do any faceting or other heavy processing as part of a search? *No*
 How many hits does a search typically have and how many documents are
 returned? *The test for QTime only with no documents returned and No. of
 hits varying from 50,000 to 50,000,000.*
 How many concurrent searches do you need to support? How fast should the
 response time be? *May be 100 concurrent searches with 100ms with facets.*
 
 Does splitting the shard to two shards on the same node so every shard will
 be on a single EBS Volume better than using LVM?

The basic problem is simply that the system has so little memory that it
must read large amounts of data from the disk when it does a query.
There is not enough RAM to cache the important parts of the index.  RAM
is much faster than disk, even SSD.

Typical consumer-grade DDR3-1600 memory has a data transfer rate of
about 12800 megabytes per second.  If it's ECC memory (which I would say
is a requirement) then the transfer rate is probably a little bit slower
than that.  Figuring 9 bits for every byte gets us about 11377 MB/s.
That's only an estimate, and it could be wrong in either direction, but
I'll go ahead and use it.

http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules

If your SSD is SATA, the transfer rate will be limited to approximately
600MB/s -- the 6 gigabit per second transfer rate of the newest SATA
standard.  That makes memory about 18 times as fast as SATA SSD.  I saw
one PCI express SSD that claimed a transfer rate of 2900 MB/s.  Even
that is only about one fourth of the estimated speed of DDR3-1600 with
ECC.  I don't know what interface technology Amazon uses for their SSD
volumes, but I would bet on it being the cheaper version, which would
mean SATA.  The networking between the EC2 instance and the EBS storage
is unknown to me and may be a further bottleneck.

http://ocz.com/enterprise/z-drive-4500/specifications
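
To put those figures side by side, here is the arithmetic as a tiny, self-contained
sketch (my own illustration of the numbers above -- nominal peak rates, not
measured throughput):

// Rough bandwidth comparison using the figures discussed above.
public class BandwidthEstimate {
    public static void main(String[] args) {
        double ddr3MBps = 12800.0;              // DDR3-1600 peak transfer rate
        double eccMBps = ddr3MBps * 8.0 / 9.0;  // ~11377 MB/s if 1 bit in 9 goes to ECC
        double sataMBps = 600.0;                // SATA 6 Gbit/s limit
        double pcieSsdMBps = 2900.0;            // the PCIe SSD mentioned above

        System.out.printf("ECC DDR3-1600: ~%.0f MB/s%n", eccMBps);
        System.out.printf("RAM vs SATA SSD: ~%.0fx faster%n", eccMBps / sataMBps);
        System.out.printf("RAM vs PCIe SSD: ~%.1fx faster%n", eccMBps / pcieSsdMBps);
    }
}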

Bottom line -- you need a lot more memory.  Speeding up the disk may
*help* ... but it will not replace that simple requirement.  With EC2 as
the platform, you may need more instances and more shards.

Your 200 million document index that works well with only 90GB of total
memory ... that's surprising to me.  That means that the important parts
of that index *do* fit in memory ... but if the index gets much larger,
performance is likely to drop off sharply.

Thanks,
Shawn



Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks Shawn.

What do you mean by "important parts of the index", and how do I calculate
their size?

Thanks,
Mahmoud

Sent from my iPhone

 On Dec 29, 2014, at 8:19 PM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
 I've the same index with a bit different schema and 200M documents,
 installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
 of index is about 1.5TB, have many updates every 5 minutes, complex queries
 and faceting with response time of 100ms that is acceptable for us.
 
 Toke Eskildsen,
 
 Is the index updated while you are searching? *No*
 Do you do any faceting or other heavy processing as part of a search? *No*
 How many hits does a search typically have and how many documents are
 returned? *The test for QTime only with no documents returned and No. of
 hits varying from 50,000 to 50,000,000.*
 How many concurrent searches do you need to support? How fast should the
 response time be? *May be 100 concurrent searches with 100ms with facets.*
 
 Does splitting the shard to two shards on the same node so every shard will
 be on a single EBS Volume better than using LVM?
 
 The basic problem is simply that the system has so little memory that it
 must read large amounts of data from the disk when it does a query.
 There is not enough RAM to cache the important parts of the index.  RAM
 is much faster than disk, even SSD.
 
 Typical consumer-grade DDR3-1600 memory has a data transfer rate of
 about 12800 megabytes per second.  If it's ECC memory (which I would say
 is a requirement) then the transfer rate is probably a little bit slower
 than that.  Figuring 9 bits for every byte gets us about 11377 MB/s.
 That's only an estimate, and it could be wrong in either direction, but
 I'll go ahead and use it.
 
 http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules
 
 If your SSD is SATA, the transfer rate will be limited to approximately
 600MB/s -- the 6 gigabit per second transfer rate of the newest SATA
 standard.  That makes memory about 18 times as fast as SATA SSD.  I saw
 one PCI express SSD that claimed a transfer rate of 2900 MB/s.  Even
 that is only about one fourth of the estimated speed of DDR3-1600 with
 ECC.  I don't know what interface technology Amazon uses for their SSD
 volumes, but I would bet on it being the cheaper version, which would
 mean SATA.  The networking between the EC2 instance and the EBS storage
 is unknown to me and may be a further bottleneck.
 
 http://ocz.com/enterprise/z-drive-4500/specifications
 
 Bottom line -- you need a lot more memory.  Speeding up the disk may
 *help* ... but it will not replace that simple requirement.  With EC2 as
 the platform, you may need more instances and more shards.
 
 Your 200 million document index that works well with only 90GB of total
 memory ... that's surprising to me.  That means that the important parts
 of that index *do* fit in memory ... but if the index gets much larger,
 performance is likely to drop off sharply.
 
 Thanks,
 Shawn
 


Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 12:07 PM, Mahmoud Almokadem wrote:
 What do you mean with important parts of index? and how to calculate their 
 size?

I have no formal education in what's important when it comes to doing a
query, but I can make some educated guesses.

Starting with this as a reference:

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/codecs/lucene410/package-summary.html#file-names

I would guess that the segment info (.si) files and the term index
(*.tip) files would be supremely important to *always* have in memory,
and they are fairly small.  Next would be the term dictionary (*.tim)
files.  The term dictionary is pretty big, and would be very important
for fast queries.

Frequencies, positions, and norms may also be important, depending on
exactly what kind of query you have.  Frequencies and positions are
quite large.  Frequencies are critical for relevance ranking (the
default sort by score), and positions are important for phrase queries.
 Position data may also be used by relevance ranking, but I am not
familiar enough with it to say for sure.

If you have docvalues defined, then *.dvm and *.dvd files would be used
for facets and sorting on those specific fields.  The *.dvd files can be
very big, depending on your schema.

The *.fdx and *.fdt files become important when actually retrieving
results after the matching documents have been determined.  The stored
data is compressed, so additional CPU power is required to uncompress
that data before it is sent to the client.  Stored data may be large or
small, depending on your schema.  Stored data does not directly affect
search speed, but if memory space is limited, every block of stored data
that gets retrieved will result in some other part of the index being
removed from the OS disk cache, which means that it might need to be
re-read from the disk on the next query.
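
If you want a rough number for how big those parts are on your system, one simple
approach is to sum the on-disk size of each file extension in a core's index
directory. A minimal sketch (my own illustration -- it assumes you can point it at
the data/index directory of a core; it is not an official Solr tool):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Sums index file sizes per extension (.si, .tip, .tim, .doc, .pos, .dvd, .fdt, ...)
// so you can see roughly how much disk cache each part of the index would need.
public class IndexSizeByExtension {
    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args[0]);   // e.g. /path/to/solr/coreName/data/index
        Map<String, Long> sizeByExt = new TreeMap<>();
        try (Stream<Path> files = Files.list(indexDir)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                String name = p.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot >= 0 ? name.substring(dot) : name;
                try {
                    sizeByExt.merge(ext, Files.size(p), Long::sum);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        sizeByExt.forEach((ext, bytes) ->
                System.out.printf("%-8s %,15d bytes%n", ext, bytes));
    }
}

The totals for .si, .tip and .tim (plus .doc/.pos for the query types described
above) give a rough lower bound on the disk cache you would want to keep free.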

Thanks,
Shawn



RE: Solr performance issues

2014-12-29 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
 I've the same index with a bit different schema and 200M documents,
 installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
 of index is about 1.5TB, have many updates every 5 minutes, complex queries
 and faceting with response time of 100ms that is acceptable for us.

So you have
Setup 1: 3 * (30GB RAM + 600GB SSD) for a 1.5TB index with 200M docs.
Acceptable performance.
Setup 2: 3 * (60GB RAM + 1TB SSD + 500GB SSD) for a 3.3TB index with 350M docs.
Poor performance.

The only real difference, besides doubling everything, is the LVM? I understand 
why you find that to be the culprit, but from what I can read, the overhead 
should not be anywhere near enough to result in the performance drop you are 
describing. Could it be that some snapshotting or backup was running when you 
tested?

Splitting your shards and doubling the number of machines, as you suggest,
would result in
Setup 3: 6 * (60GB RAM + 600GB SSD) for a 3.3TB index with 350M docs.
which would be remarkably similar to your setup 1. I think that would be the
next logical step, unless you can easily do a temporary boost of your IOPS.

BTW: You are getting dangerously close to your storage limits here - it seems 
that a single large merge could make you run out of space.

- Toke Eskildsen


Re: Solr performance issues

2014-12-28 Thread Shawn Heisey
On 12/26/2014 7:17 AM, Mahmoud Almokadem wrote:
 We've installed a cluster of one collection of 350M documents on 3
 r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
 about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
 General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
 volume using LVM of 1.5TB to fit our index.
 
 The response time is about 1 and 3 seconds for simple queries (1 token).
 
 Is the LVM become a bottleneck for our index?

SSD is very fast, but its speed is very slow when compared to RAM.  The
problem here is that Solr must read data off the disk in order to do a
query, and even at SSD speeds, that is slow.  LVM is not the problem
here, though it's possible that it may be a contributing factor.  You
need more RAM.

For Solr to be fast, a large percentage (ideally 100%, but smaller
fractions can often be enough) of the index must be loaded into unused
RAM by the operating system.  Your information seems to indicate that
the index is about 3 terabytes.  If that's the index size, I would guess
that you would need somewhere between 1 and 2 terabytes of total RAM for
speed to be acceptable.  Because RAM is *very* expensive on Amazon and
is not available in sizes like 256GB-1TB, that typically means a lot of
their virtual machines, with a lot of shards in SolrCloud.  You may find
that real hardware is less expensive for very large Solr indexes in the
long term than cloud hardware.

Thanks,
Shawn



RE: Solr performance issues

2014-12-28 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
 We've installed a cluster of one collection of 350M documents on 3
 r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
 about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
 General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
 volume using LVM of 1.5TB to fit our index.

Your search speed will be limited by the slowest storage in your group, which 
would be your 500GB EBS. The General Purpose SSD option means (as far as I can 
read at http://aws.amazon.com/ebs/details/#piops) that your baseline of 3 
IOPS/GB = 1500 IOPS, with bursts of 3000 IOPS. Unfortunately they do not say 
anything about latency.

For comparison, I checked the system logs from a local test with our 21TB / 7 
billion documents index. It used ~27,000 IOPS during the test, with mean search 
time a bit below 1 second. That was with ~100GB RAM for disk cache, which is 
about ½% of index size. The test was with simple term queries (1-3 terms) and 
some faceting. Back of the envelope: 27,000 IOPS for 21TB is ~1300 IOPS/TB. 
Your indexes are 1.1TB, so 1.1*1300 IOPS ~= 1400 IOPS.
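
Restated as plain arithmetic (just the numbers from the paragraph above, nothing
measured on your system):

// Back-of-the-envelope IOPS estimate, scaled down from the 21TB test index.
public class IopsEstimate {
    public static void main(String[] args) {
        double observedIops = 27_000;               // measured on the 21TB / 7 billion doc test
        double iopsPerTB = observedIops / 21.0;     // ~1300 IOPS per TB of index
        double neededPerShard = 1.1 * iopsPerTB;    // ~1400 IOPS for a 1.1TB shard

        double gp2Baseline500GB = 3 * 500;          // 3 IOPS/GB baseline on the 500GB volume
        System.out.printf("Estimated need: ~%.0f IOPS; 500GB gp2 baseline: ~%.0f IOPS%n",
                neededPerShard, gp2Baseline500GB);
    }
}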

All else being equal (which is never the case), getting 1-3 second response 
times for a 1.1TB index, when one link in the storage chain is capped at a few 
thousand IOPS, you are using networked storage and you have little RAM for 
caching, does not seem unrealistic. If possible, you could try temporarily 
boosting performance of the EBS, to see if raw IO is the bottleneck.

 The response time is about 1 and 3 seconds for simple queries (1 token).

Is the index updated while you are searching?
Do you do any faceting or other heavy processing as part of a search?
How many hits does a search typically have and how many documents are returned?
How many concurrent searches do you need to support? How fast should the 
response time be?

- Toke Eskildsen


Solr performance issues

2014-12-26 Thread Mahmoud Almokadem
Dears,

We've installed a cluster with one collection of 350M documents on 3
r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is
about 1.1TB, and the maximum EBS volume size on Amazon is 1TB, so we added 2
General Purpose SSD EBS volumes (1x1TB + 1x500GB) to each instance. Then we
created a 1.5TB logical volume using LVM to fit our index.

The response time is between 1 and 3 seconds for simple queries (1 token).

Has the LVM become a bottleneck for our index?

Thanks for help.


Re: Solr performance issues

2014-12-26 Thread Otis Gospodnetic
Likely lots of disk + network IO, yes. Put SPM for Solr on your nodes to double 
check.

 Otis

 On Dec 26, 2014, at 09:17, Mahmoud Almokadem prog.mahm...@gmail.com wrote:
 
 Dears,
 
 We've installed a cluster of one collection of 350M documents on 3
 r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
 about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
 General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
 volume using LVM of 1.5TB to fit our index.
 
 The response time is about 1 and 3 seconds for simple queries (1 token).
 
 Is the LVM become a bottleneck for our index?
 
 Thanks for help.


Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Abhishek Sanoujam
We have a Solr core with about 115 million documents. We are trying to
migrate data by running a simple query with a *:* query and the start
and rows params.
The performance is becoming too slow in Solr; it's taking almost 2 minutes
to get 4000 rows, and the migration is just too slow. Log snippet below:


INFO: [coreName] webapp=/solr path=/select params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=168308
INFO: [coreName] webapp=/solr path=/select params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=122771
INFO: [coreName] webapp=/solr path=/select params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=137615
INFO: [coreName] webapp=/solr path=/select params={start=5545&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141223
INFO: [coreName] webapp=/solr path=/select params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=97474
INFO: [coreName] webapp=/solr path=/select params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=98115
INFO: [coreName] webapp=/solr path=/select params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=143822
INFO: [coreName] webapp=/solr path=/select params={start=55474000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118066
INFO: [coreName] webapp=/solr path=/select params={start=5547&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=121498
INFO: [coreName] webapp=/solr path=/select params={start=55482000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=164062
INFO: [coreName] webapp=/solr path=/select params={start=55478000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=165518
INFO: [coreName] webapp=/solr path=/select params={start=55486000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118163
INFO: [coreName] webapp=/solr path=/select params={start=55494000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141642
INFO: [coreName] webapp=/solr path=/select params={start=5549&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=145037



I've taken some thread dumps on the Solr server, and most of the time the
threads seem to be busy in the following stacks:
Is there anything that can be done to improve the performance? Is it a
known issue? It's very surprising that querying for just a few thousand rows
starting at certain offsets takes on the order of minutes.



395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a runnable [0x7f42865dd000]

   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
    at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
    at org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
    at org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)


1154127582@qtp-162198005-3 prio=10 tid=0x7f4aa0613800 nid=0x2956 runnable [0x7f42869e1000]

   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
    at org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:210)
    at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:62)
    at org.apache.lucene.search.Scorer.score(Scorer.java:64)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:605)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1491)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Jan Høydahl
Hi,

How many shards do you have? This is a known issue with deep paging in
multi-shard setups, see https://issues.apache.org/jira/browse/SOLR-1726

You may be more successful in going to each shard, one at a time (with 
distrib=false) to avoid this issue.
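
A minimal SolrJ sketch of that approach (assuming SolrJ 4.x; the core URLs and
"coreName" below are placeholders for your own setup, and note that each request
still pays the deep-paging cost within that single core):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

// Page through one shard/core at a time with distrib=false, so no
// cross-shard merge of huge result windows is needed.
public class PerShardExport {
    public static void main(String[] args) throws Exception {
        String[] coreUrls = {                          // hypothetical core URLs
                "http://host1:8983/solr/coreName",
                "http://host2:8983/solr/coreName"
        };
        int rows = 4000;
        for (String url : coreUrls) {
            HttpSolrServer core = new HttpSolrServer(url);
            long start = 0;
            while (true) {
                SolrQuery q = new SolrQuery("*:*");
                q.set("distrib", "false");             // query this core only
                q.setStart((int) start);
                q.setRows(rows);
                QueryResponse rsp = core.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    // ... migrate doc ...
                }
                start += rows;
                if (start >= rsp.getResults().getNumFound()) {
                    break;
                }
            }
        }
    }
}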

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com:

 We have a solr core with about 115 million documents. We are trying to 
 migrate data and running a simple query with *:* query and with start and 
 rows param.
 The performance is becoming too slow in solr, its taking almost 2 mins to get 
 4000 rows and migration is being just too slow. Logs snippet below:
 
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55438000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=168308
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55446000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=122771
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55454000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=137615
 INFO: [coreName] webapp=/solr path=/select 
 params={start=5545q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=141223
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55462000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=97474
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55458000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=98115
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55466000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=143822
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55474000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=118066
 INFO: [coreName] webapp=/solr path=/select 
 params={start=5547q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=121498
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55482000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=164062
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55478000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=165518
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55486000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=118163
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55494000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=141642
 INFO: [coreName] webapp=/solr path=/select 
 params={start=5549q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=145037
 
 
 I've taken some thread dumps in the solr server and most of the time the 
 threads seem to be busy in the following stacks mostly:
 Is there anything that can be done to improve the performance? Is it a known 
 issue? Its very surprising that querying for some just rows starting at some 
 points is taking in order of minutes.
 
 
 395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a 
 runnable [0x7f42865dd000]
   java.lang.Thread.State: RUNNABLE
at 
 org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
at 
 org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
at 
 org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
at 
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
at 
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
at 
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
 
 
 1154127582@qtp-162198005-3 prio=10 tid=0x7f4aa0613800 nid=0x2956 
 runnable [0x7f42869e1000]
   java.lang.Thread.State: RUNNABLE
at 
 org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
at 
 org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:210)
at 
 org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:62)
at org.apache.lucene.search.Scorer.score(Scorer.java:64)
at 
 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Jan,

Would the same distrib=false help for distributed faceting? We are running
into a similar issue with facet paging.

Dmitry



On Mon, Apr 29, 2013 at 11:58 AM, Jan Høydahl jan@cominvent.com wrote:

 Hi,

 How many shards do you have? This is a known issue with deep paging with
 multi shard, see https://issues.apache.org/jira/browse/SOLR-1726

 You may be more successful in going to each shard, one at a time (with
 distrib=false) to avoid this issue.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com:

  We have a solr core with about 115 million documents. We are trying to
 migrate data and running a simple query with *:* query and with start and
 rows param.
  The performance is becoming too slow in solr, its taking almost 2 mins
 to get 4000 rows and migration is being just too slow. Logs snippet below:
 
  INFO: [coreName] webapp=/solr path=/select
 params={start=55438000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=168308
  INFO: [coreName] webapp=/solr path=/select
 params={start=55446000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=122771
  INFO: [coreName] webapp=/solr path=/select
 params={start=55454000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=137615
  INFO: [coreName] webapp=/solr path=/select
 params={start=5545q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=141223
  INFO: [coreName] webapp=/solr path=/select
 params={start=55462000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=97474
  INFO: [coreName] webapp=/solr path=/select
 params={start=55458000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=98115
  INFO: [coreName] webapp=/solr path=/select
 params={start=55466000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=143822
  INFO: [coreName] webapp=/solr path=/select
 params={start=55474000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=118066
  INFO: [coreName] webapp=/solr path=/select
 params={start=5547q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=121498
  INFO: [coreName] webapp=/solr path=/select
 params={start=55482000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=164062
  INFO: [coreName] webapp=/solr path=/select
 params={start=55478000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=165518
  INFO: [coreName] webapp=/solr path=/select
 params={start=55486000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=118163
  INFO: [coreName] webapp=/solr path=/select
 params={start=55494000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=141642
  INFO: [coreName] webapp=/solr path=/select
 params={start=5549q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=145037
 
 
  I've taken some thread dumps in the solr server and most of the time the
 threads seem to be busy in the following stacks mostly:
  Is there anything that can be done to improve the performance? Is it a
 known issue? Its very surprising that querying for some just rows starting
 at some points is taking in order of minutes.
 
 
  395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a
 runnable [0x7f42865dd000]
java.lang.Thread.State: RUNNABLE
 at
 org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
 at
 org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
 at
 org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
 at
 org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
 at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
 at
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
 
 
  1154127582@qtp-162198005-3 prio=10 tid=0x7f4aa0613800 nid=0x2956
 runnable [0x7f42869e1000]
java.lang.Thread.State: RUNNABLE
 at
 org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
 at
 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Abhishek Sanoujam
We have a single shard, and all the data is in a single box only.
It definitely looks like deep paging is the problem.


Just to understand: is the searcher looping over the result set every
time and skipping the first 'start' documents? That will definitely
take a toll when we reach higher start values.
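
In other words, the suspicion is that for start=N and rows=M the collector has to
keep the best N+M hits in a bounded priority queue while it scores every matching
document, and only then slice out the M rows requested -- which matches the
PriorityQueue frames in the dumps above. A simplified, self-contained illustration
(my own sketch with tiny numbers, not the actual Lucene code):

import java.util.PriorityQueue;
import java.util.Random;

// Miniature of what a top-docs collector does for start=N & rows=M:
// keep the best N+M hits, then discard everything before rank N.
// With start in the tens of millions, that queue is why QTime grows
// with the offset even though only 4000 rows are returned.
public class DeepPagingSketch {
    public static void main(String[] args) {
        int start = 100_000, rows = 10;        // tiny stand-ins for 55,438,000 and 4,000
        int numHits = start + rows;

        PriorityQueue<double[]> heap =         // [score, docId], lowest score at the head
                new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        Random rnd = new Random(42);
        for (int doc = 0; doc < 1_000_000; doc++) {          // "every matching document"
            double score = rnd.nextDouble();
            if (heap.size() < numHits) {
                heap.add(new double[] {score, doc});
            } else if (score > heap.peek()[0]) {             // beats the worst kept hit
                heap.poll();
                heap.add(new double[] {score, doc});
            }
        }

        double[][] ranked = new double[numHits][];           // best hit first
        for (int i = numHits - 1; i >= 0; i--) ranked[i] = heap.poll();
        for (int i = start; i < start + rows; i++) {         // the page actually returned
            System.out.printf("rank %d: doc %.0f score %.6f%n", i, ranked[i][1], ranked[i][0]);
        }
    }
}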




On 4/29/13 2:28 PM, Jan Høydahl wrote:

Hi,

How many shards do you have? This is a known issue with deep paging with multi 
shard, see https://issues.apache.org/jira/browse/SOLR-1726

You may be more successful in going to each shard, one at a time (with 
distrib=false) to avoid this issue.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com:


We have a solr core with about 115 million documents. We are trying to migrate 
data and running a simple query with *:* query and with start and rows param.
The performance is becoming too slow in solr, its taking almost 2 mins to get 
4000 rows and migration is being just too slow. Logs snippet below:

INFO: [coreName] webapp=/solr path=/select 
params={start=55438000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=168308
INFO: [coreName] webapp=/solr path=/select 
params={start=55446000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=122771
INFO: [coreName] webapp=/solr path=/select 
params={start=55454000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=137615
INFO: [coreName] webapp=/solr path=/select 
params={start=5545q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=141223
INFO: [coreName] webapp=/solr path=/select 
params={start=55462000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=97474
INFO: [coreName] webapp=/solr path=/select 
params={start=55458000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=98115
INFO: [coreName] webapp=/solr path=/select 
params={start=55466000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=143822
INFO: [coreName] webapp=/solr path=/select 
params={start=55474000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=118066
INFO: [coreName] webapp=/solr path=/select 
params={start=5547q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=121498
INFO: [coreName] webapp=/solr path=/select 
params={start=55482000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=164062
INFO: [coreName] webapp=/solr path=/select 
params={start=55478000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=165518
INFO: [coreName] webapp=/solr path=/select 
params={start=55486000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=118163
INFO: [coreName] webapp=/solr path=/select 
params={start=55494000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=141642
INFO: [coreName] webapp=/solr path=/select 
params={start=5549q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=145037


I've taken some thread dumps in the solr server and most of the time the 
threads seem to be busy in the following stacks mostly:
Is there anything that can be done to improve the performance? Is it a known 
issue? Its very surprising that querying for some just rows starting at some 
points is taking in order of minutes.


395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a runnable 
[0x7f42865dd000]
   java.lang.Thread.State: RUNNABLE
at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
at 
org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
at 
org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)


1154127582@qtp-162198005-3 prio=10 tid=0x7f4aa0613800 nid=0x2956 runnable 
[0x7f42869e1000]
   java.lang.Thread.State: RUNNABLE
at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
at 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Abhishek,

There is a wiki regarding this:

http://wiki.apache.org/solr/CommonQueryParameters

Search for pageDoc and pageScore on that page.


On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
abhi.sanou...@gmail.comwrote:

 We have a single shard, and all the data is in a single box only.
 Definitely looks like deep-paging is having problems.

 Just to understand, is the searcher looping over the result set everytime
 and skipping the first start count? This will definitely take a toll when
 we reach higher start values.




 On 4/29/13 2:28 PM, Jan Høydahl wrote:

 Hi,

 How many shards do you have? This is a known issue with deep paging with
 multi shard, see 
 https://issues.apache.org/**jira/browse/SOLR-1726https://issues.apache.org/jira/browse/SOLR-1726

 You may be more successful in going to each shard, one at a time (with
 distrib=false) to avoid this issue.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com
 :

  We have a solr core with about 115 million documents. We are trying to
 migrate data and running a simple query with *:* query and with start and
 rows param.
 The performance is becoming too slow in solr, its taking almost 2 mins
 to get 4000 rows and migration is being just too slow. Logs snippet below:

 INFO: [coreName] webapp=/solr path=/select params={start=55438000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=168308
 INFO: [coreName] webapp=/solr path=/select params={start=55446000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=122771
 INFO: [coreName] webapp=/solr path=/select params={start=55454000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=137615
 INFO: [coreName] webapp=/solr path=/select params={start=5545q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=141223
 INFO: [coreName] webapp=/solr path=/select params={start=55462000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=97474
 INFO: [coreName] webapp=/solr path=/select params={start=55458000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=98115
 INFO: [coreName] webapp=/solr path=/select params={start=55466000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=143822
 INFO: [coreName] webapp=/solr path=/select params={start=55474000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=118066
 INFO: [coreName] webapp=/solr path=/select params={start=5547q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=121498
 INFO: [coreName] webapp=/solr path=/select params={start=55482000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=164062
 INFO: [coreName] webapp=/solr path=/select params={start=55478000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=165518
 INFO: [coreName] webapp=/solr path=/select params={start=55486000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=118163
 INFO: [coreName] webapp=/solr path=/select params={start=55494000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=141642
 INFO: [coreName] webapp=/solr path=/select params={start=5549q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=145037


 I've taken some thread dumps in the solr server and most of the time the
 threads seem to be busy in the following stacks mostly:
 Is there anything that can be done to improve the performance? Is it a
 known issue? Its very surprising that querying for some just rows starting
 at some points is taking in order of minutes.


 395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a
 runnable [0x7f42865dd000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.**PriorityQueue.downHeap(**
 PriorityQueue.java:252)
 at org.apache.lucene.util.**PriorityQueue.pop(**
 PriorityQueue.java:184)
 at org.apache.lucene.search.**TopDocsCollector.**
 populateResults(**TopDocsCollector.java:61)
 at org.apache.lucene.search.**TopDocsCollector.topDocs(**
 TopDocsCollector.java:156)
 at org.apache.solr.search.**SolrIndexSearcher.**getDocListNC(**
 SolrIndexSearcher.java:1499)
 at org.apache.solr.search.**SolrIndexSearcher.getDocListC(**
 SolrIndexSearcher.java:1366)
 at org.apache.solr.search.**SolrIndexSearcher.search(**
 SolrIndexSearcher.java:457)
 at org.apache.solr.handler.**component.QueryComponent.**
 process(QueryComponent.java:**410)
 at org.apache.solr.handler.**component.SearchHandler.**
 handleRequestBody(**SearchHandler.java:208)
 at org.apache.solr.handler.**RequestHandlerBase.**handleRequest(
 **RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.**execute(SolrCore.java:1817)
 at org.apache.solr.servlet.**SolrDispatchFilter.execute(**
 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Michael Della Bitta
We've found that you can do a lot for yourself by using a filter query
to page through your data, if it has a natural range to page on, instead
of using start and rows.
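
For example, something like the following SolrJ sketch (hypothetical field and
core names -- it assumes a numeric field such as a sequential id with a known
upper bound, so every request is a cheap "first page" for Solr):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

// Walk the index in fixed-size ranges of a natural key instead of using
// start/rows, so Solr never has to collect tens of millions of hits per request.
public class RangePagingExport {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/coreName");
        long rangeSize = 100_000;                  // tune to your data density
        long maxId = 115_760_479;                  // upper bound of the key field

        for (long lower = 0; lower <= maxId; lower += rangeSize) {
            long upper = lower + rangeSize - 1;
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("id:[" + lower + " TO " + upper + "]"); // hypothetical numeric id field
            q.setRows((int) rangeSize);            // never more than one range per request
            QueryResponse rsp = solr.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // ... migrate doc ...
            }
        }
    }
}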

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Abhishek,

 There is a wiki regarding this:

 http://wiki.apache.org/solr/CommonQueryParameters

 search pageDoc and pageScore.


 On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
 abhi.sanou...@gmail.comwrote:

 We have a single shard, and all the data is in a single box only.
 Definitely looks like deep-paging is having problems.

 Just to understand, is the searcher looping over the result set everytime
 and skipping the first start count? This will definitely take a toll when
 we reach higher start values.




 On 4/29/13 2:28 PM, Jan Høydahl wrote:

 Hi,

 How many shards do you have? This is a known issue with deep paging with
 multi shard, see 
 https://issues.apache.org/**jira/browse/SOLR-1726https://issues.apache.org/jira/browse/SOLR-1726

 You may be more successful in going to each shard, one at a time (with
 distrib=false) to avoid this issue.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com
 :

  We have a solr core with about 115 million documents. We are trying to
 migrate data and running a simple query with *:* query and with start and
 rows param.
 The performance is becoming too slow in solr, its taking almost 2 mins
 to get 4000 rows and migration is being just too slow. Logs snippet below:

 INFO: [coreName] webapp=/solr path=/select params={start=55438000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=168308
 INFO: [coreName] webapp=/solr path=/select params={start=55446000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=122771
 INFO: [coreName] webapp=/solr path=/select params={start=55454000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=137615
 INFO: [coreName] webapp=/solr path=/select params={start=5545q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=141223
 INFO: [coreName] webapp=/solr path=/select params={start=55462000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=97474
 INFO: [coreName] webapp=/solr path=/select params={start=55458000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=98115
 INFO: [coreName] webapp=/solr path=/select params={start=55466000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=143822
 INFO: [coreName] webapp=/solr path=/select params={start=55474000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=118066
 INFO: [coreName] webapp=/solr path=/select params={start=5547q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=121498
 INFO: [coreName] webapp=/solr path=/select params={start=55482000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=164062
 INFO: [coreName] webapp=/solr path=/select params={start=55478000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=165518
 INFO: [coreName] webapp=/solr path=/select params={start=55486000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=118163
 INFO: [coreName] webapp=/solr path=/select params={start=55494000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=141642
 INFO: [coreName] webapp=/solr path=/select params={start=5549q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=145037


 I've taken some thread dumps in the solr server and most of the time the
 threads seem to be busy in the following stacks mostly:
 Is there anything that can be done to improve the performance? Is it a
 known issue? Its very surprising that querying for some just rows starting
 at some points is taking in order of minutes.


 395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a
 runnable [0x7f42865dd000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.**PriorityQueue.downHeap(**
 PriorityQueue.java:252)
 at org.apache.lucene.util.**PriorityQueue.pop(**
 PriorityQueue.java:184)
 at org.apache.lucene.search.**TopDocsCollector.**
 populateResults(**TopDocsCollector.java:61)
 at org.apache.lucene.search.**TopDocsCollector.topDocs(**
 TopDocsCollector.java:156)
 at org.apache.solr.search.**SolrIndexSearcher.**getDocListNC(**
 SolrIndexSearcher.java:1499)
 at org.apache.solr.search.**SolrIndexSearcher.getDocListC(**
 SolrIndexSearcher.java:1366)
 at org.apache.solr.search.**SolrIndexSearcher.search(**
 SolrIndexSearcher.java:457)
 at 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Michael Della Bitta
I guess so, you'd have to use a filter query to page through the set
of documents you were faceting against and sum them all at the end.
It's not quite the same operation as paging through results, because
facets are aggregate statistics, but if you're willing to go through
the trouble, I bet it would also help performance.
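
A rough sketch of that idea (my own illustration with hypothetical field names):
slice the data with range filter queries as above, facet within each slice with
rows=0, and merge the counts client-side.

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

// Facet one range slice at a time and sum the counts at the end, instead of
// asking Solr to page through one enormous facet result.
public class SlicedFacetSum {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/coreName");
        Map<String, Long> totals = new HashMap<>();
        long rangeSize = 1_000_000, maxId = 115_760_479;     // hypothetical numeric key

        for (long lower = 0; lower <= maxId; lower += rangeSize) {
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("id:[" + lower + " TO " + (lower + rangeSize - 1) + "]");
            q.setRows(0);                                    // only the facet counts are needed
            q.setFacet(true);
            q.addFacetField("category");                     // hypothetical facet field
            q.setFacetLimit(-1);                             // all values within the slice
            QueryResponse rsp = solr.query(q);
            FacetField ff = rsp.getFacetField("category");
            if (ff != null && ff.getValues() != null) {
                for (FacetField.Count c : ff.getValues()) {
                    totals.merge(c.getName(), c.getCount(), Long::sum);
                }
            }
        }
        totals.forEach((value, count) -> System.out.println(value + ": " + count));
    }
}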

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Apr 29, 2013 at 9:06 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Michael,

 Interesting! Do (Can) you apply this to facet searches as well?

 Dmitry


 On Mon, Apr 29, 2013 at 4:02 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:

 We've found that you can do a lot for yourself by using a filter query
 to page through your data if it has a natural range to do so instead
 of start and rows.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan solrexp...@gmail.com wrote:
  Abhishek,
 
  There is a wiki regarding this:
 
  http://wiki.apache.org/solr/CommonQueryParameters
 
  search pageDoc and pageScore.
 
 
  On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
  abhi.sanou...@gmail.comwrote:
 
  We have a single shard, and all the data is in a single box only.
  Definitely looks like deep-paging is having problems.
 
  Just to understand, is the searcher looping over the result set
 everytime
  and skipping the first start count? This will definitely take a toll
 when
  we reach higher start values.
 
 
 
 
  On 4/29/13 2:28 PM, Jan Høydahl wrote:
 
  Hi,
 
  How many shards do you have? This is a known issue with deep paging
 with
  multi shard, see https://issues.apache.org/**jira/browse/SOLR-1726
 https://issues.apache.org/jira/browse/SOLR-1726
 
  You may be more successful in going to each shard, one at a time (with
  distrib=false) to avoid this issue.
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
  29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam 
 abhi.sanou...@gmail.com
  :
 
   We have a solr core with about 115 million documents. We are trying to
  migrate data and running a simple query with *:* query and with start
 and
  rows param.
  The performance is becoming too slow in solr, its taking almost 2 mins
  to get 4000 rows and migration is being just too slow. Logs snippet
 below:
 
  INFO: [coreName] webapp=/solr path=/select
 params={start=55438000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=168308
  INFO: [coreName] webapp=/solr path=/select
 params={start=55446000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=122771
  INFO: [coreName] webapp=/solr path=/select
 params={start=55454000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=137615
  INFO: [coreName] webapp=/solr path=/select
 params={start=5545q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=141223
  INFO: [coreName] webapp=/solr path=/select
 params={start=55462000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=97474
  INFO: [coreName] webapp=/solr path=/select
 params={start=55458000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=98115
  INFO: [coreName] webapp=/solr path=/select
 params={start=55466000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=143822
  INFO: [coreName] webapp=/solr path=/select
 params={start=55474000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=118066
  INFO: [coreName] webapp=/solr path=/select
 params={start=5547q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=121498
  INFO: [coreName] webapp=/solr path=/select
 params={start=55482000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=164062
  INFO: [coreName] webapp=/solr path=/select
 params={start=55478000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=165518
  INFO: [coreName] webapp=/solr path=/select
 params={start=55486000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=118163
  INFO: [coreName] webapp=/solr path=/select
 params={start=55494000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=141642
  INFO: [coreName] webapp=/solr path=/select
 params={start=5549q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=145037
 
 
  I've taken some thread dumps in the solr server and most of the time
 the
  threads seem to be busy in the following stacks mostly:
  Is there anything that can be done to improve the performance? Is it a
  known issue? Its very surprising that querying for some just rows
 starting
  at some 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Thanks.

The only question is how to smoothly transition to this model. Our facet
(string) fields contain timestamp prefixes that are reverse ordered,
starting from the freshest value. In theory, we could try computing the
filter queries for those, but before doing so we would need the matched
ids from Solr, so it becomes at least a two-pass algorithm?

The biggest concern we have with the paging in general is that the system
seems to pass far more data back and forth than is needed to compute
the values.


On Mon, Apr 29, 2013 at 4:14 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 I guess so, you'd have to use a filter query to page through the set
 of documents you were faceting against and sum them all at the end.
 It's not quite the same operation as paging through results, because
 facets are aggregate statistics, but if you're willing to go through
 the trouble, I bet it would also help performance.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Mon, Apr 29, 2013 at 9:06 AM, Dmitry Kan solrexp...@gmail.com wrote:
  Michael,
 
  Interesting! Do (Can) you apply this to facet searches as well?
 
  Dmitry
 
 
  On Mon, Apr 29, 2013 at 4:02 PM, Michael Della Bitta 
  michael.della.bi...@appinions.com wrote:
 
  We've found that you can do a lot for yourself by using a filter query
  to page through your data if it has a natural range to do so instead
  of start and rows.
 
  Michael Della Bitta
 
  
  Appinions
  18 East 41st Street, 2nd Floor
  New York, NY 10017-6271
 
  www.appinions.com
 
  Where Influence Isn’t a Game
 
 
  On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan solrexp...@gmail.com
 wrote:
   Abhishek,
  
   There is a wiki regarding this:
  
   http://wiki.apache.org/solr/CommonQueryParameters
  
   search pageDoc and pageScore.
  
  
   On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
   abhi.sanou...@gmail.comwrote:
  
   We have a single shard, and all the data is in a single box only.
   Definitely looks like deep-paging is having problems.
  
   Just to understand, is the searcher looping over the result set
  everytime
   and skipping the first start count? This will definitely take a
 toll
  when
   we reach higher start values.
  
  
  
  
   On 4/29/13 2:28 PM, Jan Høydahl wrote:
  
   Hi,
  
   How many shards do you have? This is a known issue with deep paging
  with
   multi shard, see https://issues.apache.org/**jira/browse/SOLR-1726
  https://issues.apache.org/jira/browse/SOLR-1726
  
   You may be more successful in going to each shard, one at a time
 (with
   distrib=false) to avoid this issue.
  
   --
   Jan Høydahl, search solution architect
   Cominvent AS - www.cominvent.com
   Solr Training - www.solrtraining.com
  
   29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam 
  abhi.sanou...@gmail.com
   :
  
We have a solr core with about 115 million documents. We are
 trying to
   migrate data and running a simple query with *:* query and with
 start
  and
   rows param.
   The performance is becoming too slow in solr, its taking almost 2
 mins
   to get 4000 rows and migration is being just too slow. Logs snippet
  below:
  
   INFO: [coreName] webapp=/solr path=/select
  params={start=55438000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=168308
   INFO: [coreName] webapp=/solr path=/select
  params={start=55446000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=122771
   INFO: [coreName] webapp=/solr path=/select
  params={start=55454000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=137615
   INFO: [coreName] webapp=/solr path=/select
  params={start=5545q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=141223
   INFO: [coreName] webapp=/solr path=/select
  params={start=55462000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=97474
   INFO: [coreName] webapp=/solr path=/select
  params={start=55458000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=98115
   INFO: [coreName] webapp=/solr path=/select
  params={start=55466000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=143822
   INFO: [coreName] webapp=/solr path=/select
  params={start=55474000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=118066
   INFO: [coreName] webapp=/solr path=/select
  params={start=5547q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=121498
   INFO: [coreName] webapp=/solr path=/select
  params={start=55482000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=164062
   INFO: [coreName] webapp=/solr path=/select
  params={start=55478000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  

Re: Occasional Solr performance issues

2012-10-29 Thread Dotan Cohen
On Mon, Oct 29, 2012 at 7:04 AM, Shawn Heisey s...@elyograg.org wrote:
 They are indeed Java options.  The first two control the maximum and
 starting heap sizes.  NewRatio controls the relative size of the young and
 old generations, making the young generation considerably larger than it is
 by default.  The others are garbage collector options.  This seems to be a
 good summary:

 http://www.petefreitag.com/articles/gctuning/

 Here's the official Sun (Oracle) documentation on GC tuning:

 http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html


Thank you Shawn! Those are exactly the documents that I need. Google
should hire you to fill in the pages when someone searches for Java
garbage collection. Interestingly, I just checked and bing.com does
list the Oracle page on the first page of results. I shudder to think
that I might have to switch search engines!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-28 Thread Dotan Cohen
On Fri, Oct 26, 2012 at 11:04 PM, Shawn Heisey s...@elyograg.org wrote:
 Warming doesn't seem to be a problem here -- all your warm times are zero,
 so I am going to take a guess that it may be a heap/GC issue.  I would
 recommend starting with the following additional arguments to your JVM.
 Since I have no idea how solr gets started on your server, I don't know
 where you would add these:

 -Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled


Thanks. I've added those flags to the Solr line that I use to start
Solr. Those are Java flags, not Solr, correct? I'm googling the flags
now, but I find it interesting that I cannot find a canonical
reference for them.
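
[Archive editor's note: a hedged example of where such flags typically go. With the stock Solr 4 example layout that ships with Jetty, they are passed straight to the java command that launches Solr, roughly:

    java -Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -jar start.jar

If Solr is started from an init script or a servlet container such as Tomcat, the flags belong in whatever variable that script uses for JVM options instead.]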


 This allocates 4GB of RAM to java, sets up a larger than normal Eden space
 in the heap, and uses garbage collection options that usually fare better in
  a server environment than the default. Java memory management options are
 like religion to some people ... I may start a flamewar with these
 recommendations. ;)  The best I can tell you about these choices: They made
 a big difference for me.


Thanks. I will experiment with them empirically. The first step is to
learn to read the debug info, though. I've been googling for days, but
I must be missing something. Where is the information that I pasted in
pastebin documented?


 I would also recommend switching to a Sun/Oracle jvm.  I have heard that
 previous versions of Solr were not happy on variants like OpenJDK, I have no
 idea whether that might still be the case with 4.0.  If you choose to do
 this, you probably have package choices in Ubuntu.  I know that in Debian,
 the package is called sun-java6-jre ... Ubuntu is probably something
 similar. Debian has a CLI command 'update-java-alternatives' that will
 quickly switch between different java implementations that are installed.
 Hopefully Ubuntu also has this.  If not, you might need the following
 command instead to switch the main java executable:

 update-alternatives --config java


Thanks, I will take a look at the current Oracle JVM.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-28 Thread Shawn Heisey

On 10/28/2012 2:28 PM, Dotan Cohen wrote:

On Fri, Oct 26, 2012 at 11:04 PM, Shawn Heisey s...@elyograg.org wrote:

Warming doesn't seem to be a problem here -- all your warm times are zero,
so I am going to take a guess that it may be a heap/GC issue.  I would
recommend starting with the following additional arguments to your JVM.
Since I have no idea how solr gets started on your server, I don't know
where you would add these:

-Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

Thanks. I've added those flags to the Solr line that I use to start
Solr. Those are Java flags, not Solr, correct? I'm googling the flags
now, but I find it interesting that I cannot find a canonical
reference for them.


They are indeed Java options.  The first two control the maximum and 
starting heap sizes.  NewRatio controls the relative size of the young 
and old generations, making the young generation considerably larger 
than it is by default.  The others are garbage collector options.  This 
seems to be a good summary:


http://www.petefreitag.com/articles/gctuning/

Here's the official Sun (Oracle) documentation on GC tuning:

http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Wed, Oct 24, 2012 at 4:33 PM, Walter Underwood wun...@wunderwood.org wrote:
 Please consider never running optimize. That should be called force merge.


Thanks. I have been letting the system run for about two days already
without an optimize. I will let it run a week, then merge to see the
effect.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
I spoke too soon! Whereas three days ago when the index was new 500
records could be written to it in 3 seconds, now that operation is
taking a minute and a half, sometimes longer. I ran optimize() but
that did not help the writes. What can I do to improve the write
performance?

Even opening the Logging tab of the Solr instance is taking quite a
long time. In fact, I just left it for 20 minutes and it still hasn't
come back with anything. I do have an SSH window open on the server
hosting Solr and it doesn't look overloaded at all:

$ date && du -sh data/ && uptime && free -m
Fri Oct 26 13:15:59 UTC 2012
578M    data/
 13:15:59 up 4 days, 17:59,  1 user,  load average: 0.06, 0.12, 0.22
             total       used       free     shared    buffers     cached
Mem:         14980       3237      11743          0        284
-/+ buffers/cache:         729      14250
Swap:            0          0          0


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey

On 10/26/2012 7:16 AM, Dotan Cohen wrote:

I spoke too soon! Wereas three days ago when the index was new 500
records could be written to it in 3 seconds, now that operation is
taking a minute and a half, sometimes longer. I ran optimize() but
that did not help the writes. What can I do to improve the write
performance?

Even opening the Logging tab of the Solr instance is taking quite a
long time. In fact, I just left it for 20 minutes and it still hasn't
come back with anything. I do have an SSH window open on the server
hosting Solr and it doesn't look overloaded at all:

$ date  du -sh data/  uptime  free -m
Fri Oct 26 13:15:59 UTC 2012
578Mdata/
  13:15:59 up 4 days, 17:59,  1 user,  load average: 0.06, 0.12, 0.22
  total   used   free sharedbuffers cached
Mem: 14980   3237  11743  0284   
-/+ buffers/cache:729  14250
Swap:0  0  0


Taking all the information I've seen so far, my bet is on either cache 
warming or heap/GC trouble as the source of your problem.  It's now 
specific information gathering time.  Can you gather all the following 
information and put it into a web paste page, such as pastie.org, and 
reply with the link?  I have gathered the same information from my test 
server and created a pastie example. http://pastie.org/5118979


On the dashboard of the GUI, it lists all the jvm arguments. Include those.

Click Java Properties and gather the java.runtime.version and 
java.specification.vendor information.


After one of the long update times, pause/stop your indexing 
application.  Click on your core in the GUI, open Plugins/Stats, and 
paste the following bits with a header to indicate what each section is:

CACHE-filterCache
CACHE-queryResultCache
CORE-searcher

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Fri, Oct 26, 2012 at 4:02 PM, Shawn Heisey s...@elyograg.org wrote:

 Taking all the information I've seen so far, my bet is on either cache
 warming or heap/GC trouble as the source of your problem.  It's now specific
 information gathering time.  Can you gather all the following information
 and put it into a web paste page, such as pastie.org, and reply with the
 link?  I have gathered the same information from my test server and created
 a pastie example. http://pastie.org/5118979

 On the dashboard of the GUI, it lists all the jvm arguments. Include those.

 Click Java Properties and gather the java.runtime.version and
 java.specification.vendor information.

 After one of the long update times, pause/stop your indexing application.
 Click on your core in the GUI, open Plugins/Stats, and paste the following
 bits with a header to indicate what each section is:
 CACHE-filterCache
 CACHE-queryResultCache
 CORE-searcher

 Thanks,
 Shawn


Thank you Shawn. The information is here:
http://pastebin.com/aqEfeYVA

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey

On 10/26/2012 9:41 AM, Dotan Cohen wrote:

On the dashboard of the GUI, it lists all the jvm arguments. Include those.

Click Java Properties and gather the java.runtime.version and
java.specification.vendor information.

After one of the long update times, pause/stop your indexing application.
Click on your core in the GUI, open Plugins/Stats, and paste the following
bits with a header to indicate what each section is:
CACHE-filterCache
CACHE-queryResultCache
CORE-searcher

Thanks,
Shawn

Thank you Shawn. The information is here:
http://pastebin.com/aqEfeYVA



Warming doesn't seem to be a problem here -- all your warm times are 
zero, so I am going to take a guess that it may be a heap/GC issue.  I 
would recommend starting with the following additional arguments to your 
JVM.  Since I have no idea how solr gets started on your server, I don't 
know where you would add these:


-Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled


This allocates 4GB of RAM to java, sets up a larger than normal Eden 
space in the heap, and uses garbage collection options that usually fare 
better in a server environment than the default. Java memory management 
options are like religion to some people ... I may start a flamewar with 
these recommendations. ;)  The best I can tell you about these choices: 
They made a big difference for me.


I would also recommend switching to a Sun/Oracle jvm.  I have heard that 
previous versions of Solr were not happy on variants like OpenJDK, I 
have no idea whether that might still be the case with 4.0.  If you 
choose to do this, you probably have package choices in Ubuntu.  I know 
that in Debian, the package is called sun-java6-jre ... Ubuntu is 
probably something similar. Debian has a CLI command 
'update-java-alternatives' that will quickly switch between different 
java implementations that are installed.  Hopefully Ubuntu also has 
this.  If not, you might need the following command instead to switch 
the main java executable:


update-alternatives --config java

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-24 Thread Dotan Cohen
On Tue, Oct 23, 2012 at 3:07 PM, Erick Erickson erickerick...@gmail.com wrote:
 Maybe you've been looking at it but one thing that I didn't see on a fast
 scan was that maybe the commit bit is the problem. When you commit,
 eventually the segments will be merged and a new searcher will be opened
 (this is true even if you're NOT optimizing). So you're effectively committing
 every 1-2 seconds, creating many segments which get merged, but more
 importantly opening new searchers (which you are getting since you pasted
 the message: Overlapping onDeckSearchers=2).

 You could pinpoint this by NOT committing explicitly, just set your autocommit
 parameters (or specify commitWithin in your indexing program, which is
 preferred). Try setting it at a minute or so and see if your problem goes away
 perhaps?

 The NRT stuff happens on soft commits, so you have that option to have the
 documents immediately available for search.



Thanks, Erick. I'll play around with different configurations. So far
just removing the periodic optimize command worked wonders. I'll see
how much it helps or hurts to run it daily, or more or less frequently.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
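
[Archive editor's note: a minimal SolrJ sketch of the commitWithin approach Erick describes above, assuming the 4.x SolrJ client; the URL, field names and the 60-second window are illustrative.]

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitWithinExample {
        public static void main(String[] args) throws Exception {
            // Illustrative core URL and field names; adjust to your schema.
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 50; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("text", "example body " + i);
                batch.add(doc);
            }
            // Ask Solr to make the batch visible within 60 seconds instead of
            // issuing an explicit commit after every batch.
            solr.add(batch, 60000);
            solr.shutdown();
        }
    }

No explicit commit() call is needed; Solr handles making the documents searchable within the requested window.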


Re: Occasional Solr performance issues

2012-10-24 Thread Walter Underwood
Please consider never running optimize. That should be called force merge. 

wunder

On Oct 24, 2012, at 3:28 AM, Dotan Cohen wrote:

 On Tue, Oct 23, 2012 at 3:07 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 Maybe you've been looking at it but one thing that I didn't see on a fast
 scan was that maybe the commit bit is the problem. When you commit,
 eventually the segments will be merged and a new searcher will be opened
 (this is true even if you're NOT optimizing). So you're effectively 
 committing
 every 1-2 seconds, creating many segments which get merged, but more
 importantly opening new searchers (which you are getting since you pasted
 the message: Overlapping onDeckSearchers=2).
 
 You could pinpoint this by NOT committing explicitly, just set your 
 autocommit
 parameters (or specify commitWithin in your indexing program, which is
 preferred). Try setting it at a minute or so and see if your problem goes 
 away
 perhaps?
 
 The NRT stuff happens on soft commits, so you have that option to have the
 documents immediately available for search.
 
 
 
 Thanks, Erick. I'll play around with different configurations. So far
 just removing the periodic optimize command worked wonders. I'll see
 how much it helps or hurts to run that daily or more or less frequent.
 
 
 -- 
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com






Re: Occasional Solr performance issues

2012-10-23 Thread Erick Erickson
Maybe you've been looking at it but one thing that I didn't see on a fast
scan was that maybe the commit bit is the problem. When you commit,
eventually the segments will be merged and a new searcher will be opened
(this is true even if you're NOT optimizing). So you're effectively committing
every 1-2 seconds, creating many segments which get merged, but more
importantly opening new searchers (which you are getting since you pasted
the message: Overlapping onDeckSearchers=2).

You could pinpoint this by NOT committing explicitly, just set your autocommit
parameters (or specify commitWithin in your indexing program, which is
preferred). Try setting it at a minute or so and see if your problem goes away
perhaps?

The NRT stuff happens on soft commits, so you have that option to have the
documents immediately available for search.

Best
Erick

On Mon, Oct 22, 2012 at 10:44 AM, Dotan Cohen dotanco...@gmail.com wrote:
 I've got a script writing ~50 documents to Solr at a time, then
 commiting. Each of these documents is no longer than 1 KiB of text,
 some much less. Usually the write-and-commit will take 1-2 seconds or
 less, but sometimes it can go over 60 seconds.

 During a recent time of over-60-second write-and-commits, I saw that
 the server did not look overloaded:

 $ uptime
  14:36:46 up 19:20,  1 user,  load average: 1.08, 1.16, 1.16
 $ free -m
  total   used   free sharedbuffers cached
 Mem: 14980   2091  12889  0233   1243
 -/+ buffers/cache:613  14366
 Swap:0  0  0

 Other than Solr, nothing is running on this machine other than stock
 Ubuntu Server services (no Apache, no MySQL). The machine is running
 on an Extra Large Amazon EC2 instance, with a virtual 4-core 2.4 GHz
 Xeon processor and ~16 GiB of RAM. The solr home is on a mounted EBS
 volume.

 What might make some queries take so long, while others perform fine?

 Thanks.


 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com
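
[Archive editor's note: the autocommit parameters Erick mentions live in solrconfig.xml. As a hedged illustration, an update handler section along the lines of <autoCommit><maxTime>60000</maxTime><openSearcher>false</openSearcher></autoCommit> commits every minute without opening a new searcher, while commitWithin is passed per update request from the client; exact values depend on how quickly new documents must become searchable.]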


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
When Solr is slow, I'm seeing these in the logs:
[collection1] Error opening new searcher. exceeded limit of
maxWarmingSearchers=2, try again later.
[collection1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Googling, I found this in the FAQ:
Typically the way to avoid this error is to either reduce the
frequency of commits, or reduce the amount of warming a searcher does
while it's on deck (by reducing the work in newSearcher listeners,
and/or reducing the autowarmCount on your caches)
http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

I happen to know that the script will try to commit once every 60
seconds. How does one reduce the work in newSearcher listeners? What
effect will this have? What effect will reducing the autowarmCount on
caches have?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Rafał Kuć
Hello!

You can check if the long warming is causing the overlapping
searchers. Check Solr admin panel and look at cache statistics, there
should be warmupTime property.

Lowering the autowarmCount should lower the time needed to warm up,
however you can also look at your warming queries (if you have any)
and see how long they take.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 When Solr is slow, I'm seeing these in the logs:
 [collection1] Error opening new searcher. exceeded limit of
 maxWarmingSearchers=2,​ try again later.
 [collection1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

 Googling, I found this in the FAQ:
 Typically the way to avoid this error is to either reduce the
 frequency of commits, or reduce the amount of warming a searcher does
 while it's on deck (by reducing the work in newSearcher listeners,
 and/or reducing the autowarmCount on your caches)
 http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

 I happen to know that the script will try to commit once every 60
 seconds. How does one reduce the work in newSearcher listeners? What
 effect will this have? What effect will reducing the autowarmCount on
 caches have?

 Thanks.



Re: Occasional Solr performance issues

2012-10-22 Thread Mark Miller
Are you using Solr 3X? The occasional long commit should no longer
show up in Solr 4.

- Mark

On Mon, Oct 22, 2012 at 10:44 AM, Dotan Cohen dotanco...@gmail.com wrote:
 I've got a script writing ~50 documents to Solr at a time, then
 commiting. Each of these documents is no longer than 1 KiB of text,
 some much less. Usually the write-and-commit will take 1-2 seconds or
 less, but sometimes it can go over 60 seconds.

 During a recent time of over-60-second write-and-commits, I saw that
 the server did not look overloaded:

 $ uptime
  14:36:46 up 19:20,  1 user,  load average: 1.08, 1.16, 1.16
 $ free -m
  total   used   free sharedbuffers cached
 Mem: 14980   2091  12889  0233   1243
 -/+ buffers/cache:613  14366
 Swap:0  0  0

 Other than Solr, nothing is running on this machine other than stock
 Ubuntu Server services (no Apache, no MySQL). The machine is running
 on an Extra Large Amazon EC2 instance, with a virtual 4-core 2.4 GHz
 Xeon processor and ~16 GiB of RAM. The solr home is on a mounted EBS
 volume.

 What might make some queries take so long, while others perform fine?

 Thanks.


 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com



-- 
- Mark


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 5:02 PM, Rafał Kuć r@solr.pl wrote:
 Hello!

 You can check if the long warming is causing the overlapping
 searchers. Check Solr admin panel and look at cache statistics, there
 should be warmupTime property.


Thank you, I have gone over the Solr admin panel twice and I cannot
find the cache statistics. Where are they?


 Lowering the autowarmCount should lower the time needed to warm up,
 howere you can also look at your warming queries (if you have such)
 and see how long they take.


Thank you, I will look at that!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 5:27 PM, Mark Miller markrmil...@gmail.com wrote:
 Are you using Solr 3X? The occasional long commit should no longer
 show up in Solr 4.


Thank you Mark. In fact, this is the production release of Solr 4.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Shawn Heisey

On 10/22/2012 9:58 AM, Dotan Cohen wrote:
Thank you, I have gone over the Solr admin panel twice and I cannot 
find the cache statistics. Where are they?


If you are running Solr4, you can see individual cache autowarming times 
here, assuming your core is named collection1:


http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

The warmup time for the entire searcher can be found here:

http://server:port/solr/#/collection1/plugins/core?entry=searcher


If you are on an older Solr release, everything is in various sections 
of the stats page.  Do a page search for warmup multiple times to see 
them all:


http://server:port/solr/corename/admin/stats.jsp

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 7:29 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/22/2012 9:58 AM, Dotan Cohen wrote:

 Thank you, I have gone over the Solr admin panel twice and I cannot find
 the cache statistics. Where are they?


 If you are running Solr4, you can see individual cache autowarming times
 here, assuming your core is named collection1:

 http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
 http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

 The warmup time for the entire searcher can be found here:

 http://server:port/solr/#/collection1/plugins/core?entry=searcher



Thank you Shawn! I can see how I missed that data. I'm reviewing it
now. Solr has a low barrier to entry, but quite a learning curve. I'm
loving it!

I see that the server is using less than 2 GiB of memory, whereas it
is a dedicated Solr server with 16 GiB of memory. I understand that I
can increase the query and document caches to increase performance,
but I worry that this will increase the warm-up time to unacceptable
levels. What is a good strategy for increasing the caches yet
preserving performance after an optimize operation?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Mark Miller
Perhaps you can grab a snapshot of the stack traces when the 60 second
delay is occurring?

You can get the stack traces right in the admin ui, or you can use
another tool (jconsole, visualvm, jstack cmd line, etc)

- Mark

On Mon, Oct 22, 2012 at 1:47 PM, Dotan Cohen dotanco...@gmail.com wrote:
 On Mon, Oct 22, 2012 at 7:29 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/22/2012 9:58 AM, Dotan Cohen wrote:

 Thank you, I have gone over the Solr admin panel twice and I cannot find
 the cache statistics. Where are they?


 If you are running Solr4, you can see individual cache autowarming times
 here, assuming your core is named collection1:

 http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
 http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

 The warmup time for the entire searcher can be found here:

 http://server:port/solr/#/collection1/plugins/core?entry=searcher



 Thank you Shawn! I can see how I missed that data. I'm reviewing it
 now. Solr has a low barrier to entry, but quite a learning curve. I'm
 loving it!

 I see that the server is using less than 2 GiB of memory, whereas it
 is a dedicated Solr server with 16 GiB of memory. I understand that I
 can increase the query and document caches to increase performance,
 but I worry that this will increase the warm-up time to unacceptable
 levels. What is a good strategy for increasing the caches yet
 preserving performance after an optimize operation?

 Thanks.

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com



-- 
- Mark
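
[Archive editor's note: the command-line route Mark mentions is roughly 'jstack <solr-jvm-pid> > solr-threads.txt', run a few times while an update is stuck; the pid placeholder is whatever process id your Solr JVM has (jps or ps can tell you). The thread dump page in the admin UI shows the same information.]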


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?

 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)

Thanks. I've refactored so that the index is optimized once per hour,
instead of after each dump of commits. But when I need to increase
the optimize frequency in the future I will go through the stack
traces. Thanks!

In any case, the server has an extra 14 GiB of memory available, how
might I make the best use of that for Solr assuming both heavy reads
and writes?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Walter Underwood
First, stop optimizing. You do not need to manually force merges. The system 
does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
might be the cause of your problem.

Second, the OS will use the extra memory for file buffers, which really helps 
performance, so you might not need to do anything. This will work better after 
you stop forcing merges. A forced merge replaces every file, so the OS needs to 
reload everything into file buffers.

wunder

On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:

 On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?
 
 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)
 
 Thanks. I've refactored so that the index is optimized once per hour,
 instead after each dump of commits. But when I will need to increase
 the optmize frequency in the future I will go through the stack
 traces. Thanks!
 
 In any case, the server has an extra 14 GiB of memory available, how
 might I make the best use of that for Solr assuming both heavy reads
 and writes?
 
 Thanks.
 
 -- 
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com






Re: Occasional Solr performance issues

2012-10-22 Thread Michael Della Bitta
Has the Solr team considered renaming the optimize function to avoid
leading people down the path of this antipattern?

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Oct 22, 2012 at 4:01 PM, Walter Underwood wun...@wunderwood.org wrote:
 First, stop optimizing. You do not need to manually force merges. The system 
 does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
 might be the cause of your problem.

 Second, the OS will use the extra memory for file buffers, which really 
 helps performance, so you might not need to do anything. This will work 
 better after you stop forcing merges. A forced merge replaces every file, so 
 the OS needs to reload everything into file buffers.

 wunder

 On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:

 On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?

 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)

 Thanks. I've refactored so that the index is optimized once per hour,
 instead after each dump of commits. But when I will need to increase
 the optmize frequency in the future I will go through the stack
 traces. Thanks!

 In any case, the server has an extra 14 GiB of memory available, how
 might I make the best use of that for Solr assuming both heavy reads
 and writes?

 Thanks.

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com






Re: Occasional Solr performance issues

2012-10-22 Thread Walter Underwood
Lucene already did that:

https://issues.apache.org/jira/browse/LUCENE-3454

Here is the Solr issue:

https://issues.apache.org/jira/browse/SOLR-3141

People over-use this regardless of the name. In Ultraseek Server, it was called 
force merge and we had to tell people to stop doing that nearly every month.

wunder

On Oct 22, 2012, at 1:39 PM, Michael Della Bitta wrote:

 Has the Solr team considered renaming the optimize function to avoid
 leading people down the path of this antipattern?
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Mon, Oct 22, 2012 at 4:01 PM, Walter Underwood wun...@wunderwood.org 
 wrote:
 First, stop optimizing. You do not need to manually force merges. The system 
 does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO 
 and might be the cause of your problem.
 
 Second, the OS will use the extra memory for file buffers, which really 
 helps performance, so you might not need to do anything. This will work 
 better after you stop forcing merges. A forced merge replaces every file, so 
 the OS needs to reload everything into file buffers.
 
 wunder
 
 On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:
 
 On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?
 
 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)
 
 Thanks. I've refactored so that the index is optimized once per hour,
 instead after each dump of commits. But when I will need to increase
 the optmize frequency in the future I will go through the stack
 traces. Thanks!
 
 In any case, the server has an extra 14 GiB of memory available, how
 might I make the best use of that for Solr assuming both heavy reads
 and writes?
 
 Thanks.
 
 --
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Occasional Solr performance issues

2012-10-22 Thread Yonik Seeley
On Mon, Oct 22, 2012 at 4:39 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Has the Solr team considered renaming the optimize function to avoid
 leading people down the path of this antipattern?

If it were never the right thing to do, it could simply be removed.
The problem is that it's sometimes the right thing to do - but it
depends heavily on the use cases and trade-offs.  The best thing is to
simply document what it does and the cost of doing it.

-Yonik
http://lucidworks.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 10:01 PM, Walter Underwood
wun...@wunderwood.org wrote:
 First, stop optimizing. You do not need to manually force merges. The system 
 does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
 might be the cause of your problem.


Thanks. Looking at the index statistics, I see that within minutes
after running optimize the stats say the index needs to be
reoptimized. Though, the index still reads and writes fine even in
that state.


 Second, the OS will use the extra memory for file buffers, which really 
 helps performance, so you might not need to do anything. This will work 
 better after you stop forcing merges. A forced merge replaces every file, so 
 the OS needs to reload everything into file buffers.


I don't see that the memory is being used:

$ free -g
             total       used       free     shared    buffers     cached
Mem:            14          2         12          0          0          1
-/+ buffers/cache:           0         14
Swap:            0          0          0

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 10:44 PM, Walter Underwood
wun...@wunderwood.org wrote:
 Lucene already did that:

 https://issues.apache.org/jira/browse/LUCENE-3454

 Here is the Solr issue:

 https://issues.apache.org/jira/browse/SOLR-3141

 People over-use this regardless of the name. In Ultraseek Server, it was 
 called force merge and we had to tell people to stop doing that nearly 
 every month.


Thank you for those links. I commented on the Solr bug. There are some
very insightful comments in there.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Shawn Heisey

On 10/22/2012 3:11 PM, Dotan Cohen wrote:

On Mon, Oct 22, 2012 at 10:01 PM, Walter Underwood
wun...@wunderwood.org wrote:

First, stop optimizing. You do not need to manually force merges. The system 
does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
might be the cause of your problem.


Thanks. Looking at the index statistics, I see that within minutes
after running optimize that the stats say the index needs to be
reoptimized. Though, the index still reads and writes fine even in
that state.


As soon as you make any change at all to an index, it's no longer 
optimized.  Delete one document, add one document, anything.  Most of 
the time you will not see a performance increase from optimizing an 
index that consists of one large segment and a bunch of very tiny 
segments or deleted documents.



Second, the OS will use the extra memory for file buffers, which really helps 
performance, so you might not need to do anything. This will work better after you stop 
forcing merges. A forced merge replaces every file, so the OS needs to reload everything 
into file buffers.


I don't see that the memory is being used:

$ free -g
  total   used   free sharedbuffers cached
Mem:14  2 12  0  0  1
-/+ buffers/cache:  0 14
Swap:0  0  0


How big is your index, and did you run this right after a reboot?  If 
you did, then the cache will be fairly empty, and Solr has only read 
enough from the index files to open the searcher.  The number is probably 
too small to show up on a gigabyte scale.  As you issue queries, the 
cached amount will get bigger.  If your index is small enough to fit in 
the 14GB of free RAM that you have, you can manually populate the disk 
cache by going to your index directory and doing 'cat * > /dev/null' 
from the commandline or a script.  The first time you do it, it may go 
slowly, but if you immediately do it again, it will complete VERY fast 
-- the data will all be in RAM.


The 'free -m' command in your first email shows cache usage of 1243MB, 
which suggests that maybe your index is considerably smaller than your 
available RAM.  Having loads of free RAM is a good thing for just about 
any workload, but especially for Solr.  Try running the free command 
without the -g so you can see those numbers in kilobytes.


I have seen a tendency towards creating huge caches in Solr because 
people have lots of memory.  It's important to realize that the OS is 
far better at the overall job of caching the index files than Solr 
itself is.  Solr caches are meant to cache result sets from queries and 
filters, not large sections of the actual index contents.  Make the 
caches big enough that you see some benefit, but not big enough to suck 
up all your RAM.


If you are having warm time problems, make the autowarm counts low.  I 
have run into problems with warming on my filter cache, because we have 
filters that are extremely hairy and slow to run. I had to reduce my 
autowarm count on the filter cache to FOUR, with a cache size of 512.  
When it is 8 or higher, it can take over a minute to autowarm.


Thanks,
Shawn
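
[Archive editor's note: autowarmCount is set per cache in solrconfig.xml. As a hedged illustration, Shawn's numbers above would look roughly like <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="4"/>; keep whatever class and sizes your existing config already uses and change only the autowarmCount attribute to tune warm time.]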



Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Tue, Oct 23, 2012 at 3:52 AM, Shawn Heisey s...@elyograg.org wrote:
 As soon as you make any change at all to an index, it's no longer
 optimized.  Delete one document, add one document, anything.  Most of the
 time you will not see a performance increase from optimizing an index that
 consists of one large segment and a bunch of very tiny segments or deleted
 documents.


I've since realized that by experimentation. I've probably saved quite
a few minutes of reading time by investing hours of experiment time!


 How big is your index, and did you run this right after a reboot?  If you
 did, then the cache will be fairly empty, and Solr has only read enough from
 the index files to open the searcher.The number is probably too small to
 show up on a gigabyte scale.  As you issue queries, the cached amount will
 get bigger.  If your index is small enough to fit in the 14GB of free RAM
 that you have, you can manually populate the disk cache by going to your
 index directory and doing 'cat *  /dev/null' from the commandline or a
 script.  The first time you do it, it may go slowly, but if you immediately
 do it again, it will complete VERY fast -- the data will all be in RAM.


The cat trick to get the files in RAM is great. I would not have
thought that would work for binary files.

The index is small, much less than the available RAM, for the time
being. Therefore, there was nothing to fill it with, I now understand.
Both 'free' outputs were after the system had been running for some
time.


 The 'free -m' command in your first email shows cache usage of 1243MB, which
 suggests that maybe your index is considerably smaller than your available
 RAM.  Having loads of free RAM is a good thing for just about any workload,
 but especially for Solr.Try running the free command without the -g so you
 can see those numbers in kilobytes.

 I have seen a tendency towards creating huge caches in Solr because people
 have lots of memory.  It's important to realize that the OS is far better at
 the overall job of caching the index files than Solr itself is.  Solr caches
 are meant to cache result sets from queries and filters, not large sections
 of the actual index contents.  Make the caches big enough that you see some
 benefit, but not big enough to suck up all your RAM.


I see, thanks.


 If you are having warm time problems, make the autowarm counts low.  I have
 run into problems with warming on my filter cache, because we have filters
 that are extremely hairy and slow to run. I had to reduce my autowarm count
 on the filter cache to FOUR, with a cache size of 512.  When it is 8 or
 higher, it can take over a minute to autowarm.


I will have to experiment with the warming. Thank you for the tips.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Solr Performance Issues

2010-03-17 Thread Lance Norskog
?

 If you don't leave enough free memory for the OS, the OS won't have
 a
large
 enough disk cache, and you will be hitting the disk for lots of
   queries.

 You might want to monitor your Disk I/O using iostat and look at
 the
 iowait.

 If you are doing phrase queries and your *prx file is significantly
larger
 than the available memory then when a slow phrase query hits Solr,
  the
 contention for disk I/O with other queries could be slowing
  everything
 down.
 You might also want to look at the 90th and 99th percentile query
  times
in
 addition to the average. For our large indexes, we found at least
 an
order
 of magnitude difference between the average and 99th percentile
   queries.
 Again, if Solr gets hit with a few of those 99th percentile slow
   queries
 and
 your not hitting your caches, chances are you will see serious
   contention
 for disk I/O..

 Of course if you don't see any waiting on i/o, then your bottleneck
  is
 probably somewhere else:)

 See


   
  
 
 http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
 for more background on our experience.

 Tom Burton-West
 University of Michigan Library
 www.hathitrust.org



 
  On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel 
   siddhantg...@gmail.com
  wrote:
 
   Hi everyone,
  
   I have an index corresponding to ~2.5 million documents. The
  index
size
  is
   43GB. The configuration of the machine which is running Solr is
 -
Dual
   Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB
cache,
  8GB
   RAM, and 250 GB HDD.
  
   I'm observing a strange trend in the queries that I send to
 Solr.
   The
  query
   times for queries that I send earlier is much lesser than the
   queries
I
   send
   afterwards. For instance, if I write a script to query solr
 5000
times
   (with
   5000 distinct queries, most of them containing not more than
 3-5
words)
   with
   10 threads running in parallel, the average times for queries
  goes
from
   ~50ms in the beginning to ~6000ms. Is this expected or is there
  something
   wrong with my configuration. Currently I've configured the
  queryResultCache
   and the documentCache to contain 2048 entries (hit ratios for
  both
   is
  close
   to 50%).
  
   Apart from this, a general question that I want to ask is that
 is
such
 a
   hardware enough for this scenario? I'm aiming at achieving
 around
   20
   queries
   per second with the hardware mentioned above.
  
   Thanks,
  
   Regards,
  
   --
   - Siddhant
  
 



 --
 - Siddhant



 --
 View this message in context:

  http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
 Sent from the Solr - User mailing list archive at Nabble.com.


   
   
--
- Siddhant
   
  
 
 
 
  --
  - Siddhant
 




 --
 - Siddhant




-- 
Lance Norskog
goks...@gmail.com


Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk
caching.

I think that at any point in time, there can be at most [number of
threads] concurrent requests, which happens to make sense btw (does it?).

As I increase the number of threads, the load average shown by top goes up
to as high as 80%. But if I keep the number of threads low (~10), the load
average never goes beyond ~8. So probably that's the number of requests I
can expect Solr to serve concurrently on this index size with this hardware.

Can anyone give a general opinion as to how much hardware should be
sufficient for a Solr deployment with an index size of ~43GB, containing
around 2.5 million documents? I'm expecting it to serve at least 20 requests
per second. Any experiences?

Thanks

On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West tburtonw...@gmail.comwrote:


 How much of your memory are you allocating to the JVM and how much are you
 leaving free?

 If you don't leave enough free memory for the OS, the OS won't have a large
 enough disk cache, and you will be hitting the disk for lots of queries.

 You might want to monitor your Disk I/O using iostat and look at the
 iowait.

 If you are doing phrase queries and your *prx file is significantly larger
 than the available memory then when a slow phrase query hits Solr, the
 contention for disk I/O with other queries could be slowing everything
 down.
 You might also want to look at the 90th and 99th percentile query times in
 addition to the average. For our large indexes, we found at least an order
 of magnitude difference between the average and 99th percentile queries.
 Again, if Solr gets hit with a few of those 99th percentile slow queries
 and
 your not hitting your caches, chances are you will see serious contention
 for disk I/O..

 Of course if you don't see any waiting on i/o, then your bottleneck is
 probably somewhere else:)

 See

 http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
 for more background on our experience.

 Tom Burton-West
 University of Michigan Library
 www.hathitrust.org



 
  On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel siddhantg...@gmail.com
  wrote:
 
   Hi everyone,
  
   I have an index corresponding to ~2.5 million documents. The index size
  is
   43GB. The configuration of the machine which is running Solr is - Dual
   Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache,
  8GB
   RAM, and 250 GB HDD.
  
   I'm observing a strange trend in the queries that I send to Solr. The
  query
   times for queries that I send earlier is much lesser than the queries I
   send
   afterwards. For instance, if I write a script to query solr 5000 times
   (with
   5000 distinct queries, most of them containing not more than 3-5 words)
   with
   10 threads running in parallel, the average times for queries goes from
   ~50ms in the beginning to ~6000ms. Is this expected or is there
  something
   wrong with my configuration. Currently I've configured the
  queryResultCache
   and the documentCache to contain 2048 entries (hit ratios for both is
  close
   to 50%).
  
   Apart from this, a general question that I want to ask is that is such
 a
   hardware enough for this scenario? I'm aiming at achieving around 20
   queries
   per second with the hardware mentioned above.
  
   Thanks,
  
   Regards,
  
   --
   - Siddhant
  
 



 --
 - Siddhant



 --
 View this message in context:
 http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
- Siddhant
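
[Archive editor's note: for the iostat suggestion quoted above, something like 'iostat -x 5' prints extended per-device statistics every five seconds, and the %iowait column of its CPU line (or top/vmstat) shows how much time the box spends waiting on disk; iostat ships in the sysstat package on most Linux distributions.]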


Re: Solr Performance Issues

2010-03-12 Thread Erick Erickson
You've probably already looked at this, but here goes anyway. The
first question probably should have been what are you measuring?
I've been fooled before by looking at, say, average response time
and extrapolating. You're getting 20 qps if your response time is
1 second, but you have 20 threads running simultaneously, ditto
if you're getting 2 second response time and 40 threads. So

And what is response time? It would clarify things a lot if you
broke out which parts of the operation are taking the time. Going
from memory, debugQuery=on will let you know how much time
was spent in various operations in SOLR. It's important to know
whether it was the searching, assembling the response, or
transmitting the data back to the client. If your timings are
all just how long it takes the response to get back to the
client, you could even be hammered by network latency.

How many threads does it take to peg the CPU? And what
response times are you getting when your number of threads is
around 10?

Erick

On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel siddhantg...@gmail.comwrote:

 I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk
 caching.

 I think that at any point of time, there can be a maximum of number of
 threads concurrent requests, which happens to make sense btw (does it?).

 As I increase the number of threads, the load average shown by top goes up
 to as high as 80%. But if I keep the number of threads low (~10), the load
 average never goes beyond ~8). So probably thats the number of requests I
 can expect Solr to serve concurrently on this index size with this
 hardware.

 Can anyone give a general opinion as to how much hardware should be
 sufficient for a Solr deployment with an index size of ~43GB, containing
 around 2.5 million documents? I'm expecting it to serve at least 20
 requests
 per second. Any experiences?

 Thanks

 On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West tburtonw...@gmail.com
 wrote:

 
  How much of your memory are you allocating to the JVM and how much are
 you
  leaving free?
 
  If you don't leave enough free memory for the OS, the OS won't have a
 large
  enough disk cache, and you will be hitting the disk for lots of queries.
 
  You might want to monitor your Disk I/O using iostat and look at the
  iowait.
 
  If you are doing phrase queries and your *prx file is significantly
 larger
  than the available memory then when a slow phrase query hits Solr, the
  contention for disk I/O with other queries could be slowing everything
  down.
  You might also want to look at the 90th and 99th percentile query times
 in
  addition to the average. For our large indexes, we found at least an
 order
  of magnitude difference between the average and 99th percentile queries.
  Again, if Solr gets hit with a few of those 99th percentile slow queries
  and
  your not hitting your caches, chances are you will see serious contention
  for disk I/O..
 
  Of course if you don't see any waiting on i/o, then your bottleneck is
  probably somewhere else:)
 
  See
 
 
 http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
  for more background on our experience.
 
  Tom Burton-West
  University of Michigan Library
  www.hathitrust.org
 
 
 
  
   On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel siddhantg...@gmail.com
   wrote:
  
Hi everyone,
   
I have an index corresponding to ~2.5 million documents. The index
 size
   is
43GB. The configuration of the machine which is running Solr is -
 Dual
Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB
 cache,
   8GB
RAM, and 250 GB HDD.
   
I'm observing a strange trend in the queries that I send to Solr. The
   query
times for queries that I send earlier is much lesser than the queries
 I
send
afterwards. For instance, if I write a script to query solr 5000
 times
(with
5000 distinct queries, most of them containing not more than 3-5
 words)
with
10 threads running in parallel, the average times for queries goes
 from
~50ms in the beginning to ~6000ms. Is this expected or is there
   something
wrong with my configuration. Currently I've configured the
   queryResultCache
and the documentCache to contain 2048 entries (hit ratios for both is
   close
to 50%).
   
Apart from this, a general question that I want to ask is that is
 such
  a
hardware enough for this scenario? I'm aiming at achieving around 20
queries
per second with the hardware mentioned above.
   
Thanks,
   
Regards,
   
--
- Siddhant
   
  
 
 
 
  --
  - Siddhant
 
 
 
  --
  View this message in context:
  http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 


 --
 - Siddhant



Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
)
 with
 10 threads running in parallel, the average times for queries goes
  from
 ~50ms in the beginning to ~6000ms. Is this expected or is there
something
 wrong with my configuration. Currently I've configured the
queryResultCache
 and the documentCache to contain 2048 entries (hit ratios for both
 is
close
 to 50%).

 Apart from this, a general question that I want to ask is that is
  such
   a
 hardware enough for this scenario? I'm aiming at achieving around
 20
 queries
 per second with the hardware mentioned above.

 Thanks,

 Regards,

 --
 - Siddhant

   
  
  
  
   --
   - Siddhant
  
  
  
   --
   View this message in context:
   http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
 
 
  --
  - Siddhant
 




-- 
- Siddhant


Solr Performance Issues

2010-03-11 Thread Siddhant Goel
Hi everyone,

I have an index corresponding to ~2.5 million documents. The index size is
43GB. The configuration of the machine which is running Solr is - Dual
Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
RAM, and 250 GB HDD.

I'm observing a strange trend in the queries that I send to Solr. The query
times for queries that I send earlier are much lower than for the queries I
send afterwards. For instance, if I write a script to query Solr 5000 times
(with 5000 distinct queries, most of them containing not more than 3-5 words)
with 10 threads running in parallel, the average time for queries goes from
~50ms in the beginning to ~6000ms. Is this expected, or is there something
wrong with my configuration? Currently I've configured the queryResultCache
and the documentCache to contain 2048 entries (hit ratios for both are close
to 50%).

Apart from this, a general question that I want to ask is: is such hardware
enough for this scenario? I'm aiming at achieving around 20 queries per
second with the hardware mentioned above.

Thanks,

Regards,

-- 
- Siddhant


Re: Solr Performance Issues

2010-03-11 Thread Erick Erickson
How many outstanding queries do you have at a time? Is it possible
that when you start, you have only a few queries executing concurrently
but as your test runs you have hundreds?

This really is a question of how your load test is structured. You might
get a better sense of how it works if your tester had a limited number
of threads running so the max concurrent requests SOLR was serving
at once were capped (30, 50, whatever).

But no, I wouldn't expect SOLR to bog down the way you're describing
just because it was running for a while.

HTH
Erick

On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel siddhantg...@gmail.comwrote:

 Hi everyone,

 I have an index corresponding to ~2.5 million documents. The index size is
 43GB. The configuration of the machine which is running Solr is - Dual
 Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
 RAM, and 250 GB HDD.

 I'm observing a strange trend in the queries that I send to Solr. The query
 times for queries that I send earlier is much lesser than the queries I
 send
 afterwards. For instance, if I write a script to query solr 5000 times
 (with
 5000 distinct queries, most of them containing not more than 3-5 words)
 with
 10 threads running in parallel, the average times for queries goes from
 ~50ms in the beginning to ~6000ms. Is this expected or is there something
 wrong with my configuration. Currently I've configured the queryResultCache
 and the documentCache to contain 2048 entries (hit ratios for both is close
 to 50%).

 Apart from this, a general question that I want to ask is that is such a
 hardware enough for this scenario? I'm aiming at achieving around 20
 queries
 per second with the hardware mentioned above.

 Thanks,

 Regards,

 --
 - Siddhant



Re: Solr Performance Issues

2010-03-11 Thread Siddhant Goel
Hi Erick,

The way the load test works is that it picks up 5000 queries, splits them
according to the number of threads (so if we have 10 threads, it schedules
10 threads - each one sending 500 queries). So it might be possible that the
number of queries at a point later in time is greater than the number of
queries earlier in time. I'm not very sure about that though. It's a simple
Ruby script that starts up threads, calls the search function in each
thread, and then waits for each of them to exit.

How many queries per second can we expect Solr to serve, given this kind of
hardware? If what you suggest is true, then is it possible that while Solr
is serving a query, another query hits it, which increases the response time
even further? I'm not sure about it. But yes I can observe the query times
going up as I increase the number of threads.

Thanks,

Regards,

On Thu, Mar 11, 2010 at 8:30 PM, Erick Erickson erickerick...@gmail.comwrote:

 How many outstanding queries do you have at a time? Is it possible
 that when you start, you have only a few queries executing concurrently
 but as your test runs you have hundreds?

 This really is a question of how your load test is structured. You might
 get a better sense of how it works if your tester had a limited number
 of threads running so the max concurrent requests SOLR was serving
 at once were capped (30, 50, whatever).

 But no, I wouldn't expect SOLR to bog down the way you're describing
 just because it was running for a while.

 HTH
 Erick

 On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel siddhantg...@gmail.com
 wrote:

  Hi everyone,
 
  I have an index corresponding to ~2.5 million documents. The index size
 is
  43GB. The configuration of the machine which is running Solr is - Dual
  Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache,
 8GB
  RAM, and 250 GB HDD.
 
  I'm observing a strange trend in the queries that I send to Solr. The
 query
  times for queries that I send earlier is much lesser than the queries I
  send
  afterwards. For instance, if I write a script to query solr 5000 times
  (with
  5000 distinct queries, most of them containing not more than 3-5 words)
  with
  10 threads running in parallel, the average times for queries goes from
  ~50ms in the beginning to ~6000ms. Is this expected or is there something
  wrong with my configuration. Currently I've configured the
 queryResultCache
  and the documentCache to contain 2048 entries (hit ratios for both is
 close
  to 50%).
 
  Apart from this, a general question that I want to ask is that is such a
  hardware enough for this scenario? I'm aiming at achieving around 20
  queries
  per second with the hardware mentioned above.
 
  Thanks,
 
  Regards,
 
  --
  - Siddhant
 




-- 
- Siddhant
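
[Archive editor's note: a minimal sketch of the capped-concurrency load test Erick describes, written against the SolrJ client rather than Ruby; the URL, query list and thread count are illustrative. The fixed-size pool guarantees that no more than N requests are in flight at once, so per-query latency can be read off directly.]

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CappedLoadTest {
        public static void main(String[] args) throws Exception {
            // Illustrative Solr URL; HttpSolrServer is safe to share across threads.
            final HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
            final ConcurrentLinkedQueue<String> work = new ConcurrentLinkedQueue<String>();
            // Load the 5000 test queries here; a single placeholder keeps the sketch short.
            work.add("example query");
            final ConcurrentLinkedQueue<Long> timings = new ConcurrentLinkedQueue<Long>();
            int threads = 10;                        // hard cap on concurrent requests
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                pool.submit(new Runnable() {
                    public void run() {
                        String q;
                        while ((q = work.poll()) != null) {
                            long t0 = System.currentTimeMillis();
                            try {
                                solr.query(new SolrQuery(q));
                            } catch (Exception e) {
                                e.printStackTrace();
                            }
                            timings.add(System.currentTimeMillis() - t0);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            long sum = 0, max = 0;
            for (long t : timings) {
                sum += t;
                max = Math.max(max, t);
            }
            System.out.println("queries=" + timings.size()
                    + " avg=" + (timings.isEmpty() ? 0 : sum / timings.size()) + "ms"
                    + " max=" + max + "ms");
        }
    }

Watching the average and maximum while varying the thread cap makes it clear whether response time degrades with concurrency or with run time.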


Re: Solr Performance Issues

2010-03-11 Thread Mike Malloy

I don't mean to turn this into a sales pitch, but there is a tool for Java app
performance management that you may find helpful. It's called New Relic
(www.newrelic.com) and the tool can be installed in 2 minutes. It can give
you very deep visibility inside Solr and other Java apps. (Full disclosure: I
work at New Relic.)
Mike

Siddhant Goel wrote:
 
 Hi everyone,
 
 I have an index corresponding to ~2.5 million documents. The index size is
 43GB. The configuration of the machine which is running Solr is - Dual
 Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
 RAM, and 250 GB HDD.
 
 I'm observing a strange trend in the queries that I send to Solr. The query
 times for queries that I send earlier are much lower than for the queries I
 send afterwards. For instance, if I write a script to query Solr 5000 times
 (with 5000 distinct queries, most of them containing not more than 3-5
 words) with 10 threads running in parallel, the average time for queries
 goes from ~50ms in the beginning to ~6000ms. Is this expected or is there
 something wrong with my configuration? Currently I've configured the
 queryResultCache and the documentCache to contain 2048 entries (hit ratios
 for both are close to 50%).

 Apart from this, a general question: is this hardware enough for such a
 scenario? I'm aiming at around 20 queries per second with the hardware
 mentioned above.
 
 Thanks,
 
 Regards,
 
 -- 
 - Siddhant
 
 




Re: Solr performance issues

2008-06-20 Thread Erik Hatcher


On Jun 19, 2008, at 6:28 PM, Yonik Seeley wrote:
2. I use acts_as_solr and by default it only makes POST requests, even
for /select. With that setup the response time for most queries, simple or
complex ones, ranged from 150ms to 600ms, with an average of 250ms. I
changed the select request to use GET requests instead and now the
response time is down to 10ms to 60ms. Has anyone seen that before? Why is
it doing that?


Are the get requests being cached by the ruby stuff?


No, I'm sure that the results aren't being cached by Ruby's library,  
solr-ruby, or acts_as_solr.



But even with no caching, I've seen differences with get/post on Linux
with the python client when persistent HTTP connections were in use.
I tracked it down to the POST being written in two parts, triggering
Nagle's algorithm in the networking stack.
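
A rough illustration of that effect, not Solr-specific (the host, port, and path
here are assumed): building the whole request in one buffer and writing it once,
or setting TCP_NODELAY, sidesteps the delay that can occur when headers and body
go out as two separate writes on a persistent connection.

require 'socket'

sock = TCPSocket.new('localhost', 8983)                       # assumed Solr host/port
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)  # disable Nagle's algorithm

body    = 'q=*:*&rows=0'
request = "POST /solr/select HTTP/1.1\r\n" \
          "Host: localhost:8983\r\n" \
          "Content-Type: application/x-www-form-urlencoded\r\n" \
          "Content-Length: #{body.bytesize}\r\n" \
          "Connection: close\r\n" \
          "\r\n" + body
sock.write(request)      # one write: headers and body leave together
puts sock.read           # read the response until the server closes the connection
sock.close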


There was another post I found that mentioned this a couple of years ago:

http://markmail.org/message/45qflvwnakhripqp

I would welcome patches with tests that allow solr-ruby to send most  
requests with GET, and the ones that are actually sending a body  
beyond just parameters (delete, update, commit) as POST.
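
A sketch of what that split might look like (the base URL and helper names are
assumptions, not the actual solr-ruby patch): read-only /select calls go out as
GET with URL-encoded parameters, while requests that carry a body, such as
updates, stay on POST.

require 'net/http'
require 'uri'

SOLR = 'http://localhost:8983/solr'                        # assumed base URL

def solr_select(params)
  uri = URI("#{SOLR}/select")
  uri.query = URI.encode_www_form(params)
  Net::HTTP.get_response(uri).body                         # GET: parameters only, no body
end

def solr_update(xml)
  uri = URI("#{SOLR}/update")
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.post(uri.path, xml, 'Content-Type' => 'text/xml') # POST: the XML payload is the body
  end
end

puts solr_select(:q => 'title:solr', :rows => 10)          # example read-only call over GET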


Erik



Re: Solr performance issues

2008-06-20 Thread Sébastien Rainville
On Fri, Jun 20, 2008 at 8:32 AM, Erik Hatcher [EMAIL PROTECTED]
wrote:


 On Jun 19, 2008, at 6:28 PM, Yonik Seeley wrote:

 2. I use acts_as_solr and by default it only makes POST requests, even
 for /select. With that setup the response time for most queries, simple or
 complex ones, ranged from 150ms to 600ms, with an average of 250ms. I
 changed the select request to use GET requests instead and now the
 response time is down to 10ms to 60ms. Has anyone seen that before? Why is
 it doing that?


 Are the get requests being cached by the ruby stuff?


 No, I'm sure that the results aren't being cached by Ruby's library,
 solr-ruby, or acts_as_solr.


I confirm that the results are not cached by Ruby's library.


But even with no caching, I've seen differences with get/post on Linux
 with the python client when persistent HTTP connections were in use.
 I tracked it down to the POST being written in two parts, triggering
 Nagle's algorithm in the networking stack.


 There was another post I found that mentioned this a couple of years ago:

 http://markmail.org/message/45qflvwnakhripqp

 I would welcome patches with tests that allow solr-ruby to send most
 requests with GET, and the ones that are actually sending a body beyond just
 parameters (delete, update, commit) as POST.

Erik


I made a few modifications, but it still needs more testing...

Sebastien


Solr performance issues

2008-06-19 Thread Sébastien Rainville
Hi,

I've been using Solr for a little while without worrying too much about how it
works, but now it's becoming a bottleneck in my application. I have a couple of
issues with it:

1. My index always gets slower and slower when committing/optimizing, for some
obscure reason. It goes from 1 second with a new index to 45 seconds with an
index holding the same amount of data but used for a few days. Restarting Solr
doesn't fix it. The only way I've found to fix it is to delete the whole
index by deleting the index folder. When I then rebuild the index,
everything goes back to normal and fast... and then performance slowly
deteriorates again. So the amount of data is not a factor, because
rebuilding the index from scratch fixes the problem, and I am sending
optimize once in a while... maybe even too often.

2. I use acts_as_solr and by default it only makes POST requests, even
for /select. With that setup the response time for most queries, simple or
complex ones, ranged from 150ms to 600ms, with an average of 250ms. I
changed the select request to use GET requests instead and now the
response time is down to 10ms to 60ms. Has anyone seen that before? Why is
it doing that?

Thanks in advance,
Sebastien