Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks all.

I have the same index with a slightly different schema and 200M documents,
installed on 3 r3.xlarge instances (30GB RAM and 600GB General Purpose SSD each).
The size of the index is about 1.5TB; it receives many updates every 5 minutes,
and complex queries with faceting have a response time of about 100ms, which is
acceptable for us.

Toke Eskildsen,

Is the index updated while you are searching? *No*
Do you do any faceting or other heavy processing as part of a search? *No*
How many hits does a search typically have and how many documents are
returned? *The test measured QTime only, with no documents returned; the number
of hits varies from 50,000 to 50,000,000.*
How many concurrent searches do you need to support? How fast should the
response time be? *Maybe 100 concurrent searches, with 100ms response time
including facets.*

Would splitting the shard into two shards on the same node, so that every shard
sits on a single EBS volume, be better than using LVM?

Thanks

On Mon, Dec 29, 2014 at 2:00 AM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
  We've installed a cluster of one collection of 350M documents on 3
  r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
  about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
  General purpose (1x1TB + 1x500GB) on each instance. Then we create
 logical
  volume using LVM of 1.5TB to fit our index.

 Your search speed will be limited by the slowest storage in your group,
 which would be your 500GB EBS. The General Purpose SSD option means (as far
 as I can read at http://aws.amazon.com/ebs/details/#piops) that your
 baseline of 3 IOPS/GB = 1500 IOPS, with bursts of 3000 IOPS. Unfortunately
 they do not say anything about latency.

 For comparison, I checked the system logs from a local test with our 21TB
 / 7 billion documents index. It used ~27,000 IOPS during the test, with
 mean search time a bit below 1 second. That was with ~100GB RAM for disk
 cache, which is about ½% of index size. The test was with simple term
 queries (1-3 terms) and some faceting. Back of the envelope: 27,000 IOPS
 for 21TB is ~1300 IOPS/TB. Your indexes are 1.1TB, so 1.1*1300 IOPS ~= 1400
 IOPS.

 All else being equal (which is never the case), getting 1-3 second
 response times for a 1.1TB index, when one link in the storage chain is
 capped at a few thousand IOPS, you are using networked storage and you have
 little RAM for caching, does not seem unrealistic. If possible, you could
 try temporarily boosting performance of the EBS, to see if raw IO is the
 bottleneck.

  The response time is about 1 and 3 seconds for simple queries (1 token).

 Is the index updated while you are searching?
 Do you do any faceting or other heavy processing as part of a search?
 How many hits does a search typically have and how many documents are
 returned?
 How many concurrent searches do you need to support? How fast should the
 response time be?

 - Toke Eskildsen



Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
 I've the same index with a bit different schema and 200M documents,
 installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
 of index is about 1.5TB, have many updates every 5 minutes, complex queries
 and faceting with response time of 100ms that is acceptable for us.
 
 Toke Eskildsen,
 
 Is the index updated while you are searching? *No*
 Do you do any faceting or other heavy processing as part of a search? *No*
 How many hits does a search typically have and how many documents are
 returned? *The test for QTime only with no documents returned and No. of
 hits varying from 50,000 to 50,000,000.*
 How many concurrent searches do you need to support? How fast should the
 response time be? *May be 100 concurrent searches with 100ms with facets.*
 
 Does splitting the shard to two shards on the same node so every shard will
 be on a single EBS Volume better than using LVM?

The basic problem is simply that the system has so little memory that it
must read large amounts of data from the disk when it does a query.
There is not enough RAM to cache the important parts of the index.  RAM
is much faster than disk, even SSD.

Typical consumer-grade DDR3-1600 memory has a data transfer rate of
about 12800 megabytes per second.  If it's ECC memory (which I would say
is a requirement) then the transfer rate is probably a little bit slower
than that.  Figuring 9 bits for every byte gets us about 11377 MB/s.
That's only an estimate, and it could be wrong in either direction, but
I'll go ahead and use it.

http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules

If your SSD is SATA, the transfer rate will be limited to approximately
600MB/s -- the 6 gigabit per second transfer rate of the newest SATA
standard.  That makes memory about 18 times as fast as SATA SSD.  I saw
one PCI express SSD that claimed a transfer rate of 2900 MB/s.  Even
that is only about one fourth of the estimated speed of DDR3-1600 with
ECC.  I don't know what interface technology Amazon uses for their SSD
volumes, but I would bet on it being the cheaper version, which would
mean SATA.  The networking between the EC2 instance and the EBS storage
is unknown to me and may be a further bottleneck.

http://ocz.com/enterprise/z-drive-4500/specifications
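
To put those figures side by side, here is the arithmetic as a tiny, self-contained
sketch (my own illustration of the numbers above -- nominal peak rates, not
measured throughput):

// Rough bandwidth comparison using the figures discussed above.
public class BandwidthEstimate {
    public static void main(String[] args) {
        double ddr3MBps = 12800.0;              // DDR3-1600 peak transfer rate
        double eccMBps = ddr3MBps * 8.0 / 9.0;  // ~11377 MB/s if 1 bit in 9 goes to ECC
        double sataMBps = 600.0;                // SATA 6 Gbit/s limit
        double pcieSsdMBps = 2900.0;            // the PCIe SSD mentioned above

        System.out.printf("ECC DDR3-1600: ~%.0f MB/s%n", eccMBps);
        System.out.printf("RAM vs SATA SSD: ~%.0fx faster%n", eccMBps / sataMBps);
        System.out.printf("RAM vs PCIe SSD: ~%.1fx faster%n", eccMBps / pcieSsdMBps);
    }
}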

Bottom line -- you need a lot more memory.  Speeding up the disk may
*help* ... but it will not replace that simple requirement.  With EC2 as
the platform, you may need more instances and more shards.

Your 200 million document index that works well with only 90GB of total
memory ... that's surprising to me.  That means that the important parts
of that index *do* fit in memory ... but if the index gets much larger,
performance is likely to drop off sharply.

Thanks,
Shawn



Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks Shawn.

What do you mean by "important parts of the index", and how do I calculate
their size?

Thanks,
Mahmoud

Sent from my iPhone

 On Dec 29, 2014, at 8:19 PM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
 I've the same index with a bit different schema and 200M documents,
 installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
 of index is about 1.5TB, have many updates every 5 minutes, complex queries
 and faceting with response time of 100ms that is acceptable for us.
 
 Toke Eskildsen,
 
 Is the index updated while you are searching? *No*
 Do you do any faceting or other heavy processing as part of a search? *No*
 How many hits does a search typically have and how many documents are
 returned? *The test for QTime only with no documents returned and No. of
 hits varying from 50,000 to 50,000,000.*
 How many concurrent searches do you need to support? How fast should the
 response time be? *May be 100 concurrent searches with 100ms with facets.*
 
 Does splitting the shard to two shards on the same node so every shard will
 be on a single EBS Volume better than using LVM?
 
 The basic problem is simply that the system has so little memory that it
 must read large amounts of data from the disk when it does a query.
 There is not enough RAM to cache the important parts of the index.  RAM
 is much faster than disk, even SSD.
 
 Typical consumer-grade DDR3-1600 memory has a data transfer rate of
 about 12800 megabytes per second.  If it's ECC memory (which I would say
 is a requirement) then the transfer rate is probably a little bit slower
 than that.  Figuring 9 bits for every byte gets us about 11377 MB/s.
 That's only an estimate, and it could be wrong in either direction, but
 I'll go ahead and use it.
 
 http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules
 
 If your SSD is SATA, the transfer rate will be limited to approximately
 600MB/s -- the 6 gigabit per second transfer rate of the newest SATA
 standard.  That makes memory about 18 times as fast as SATA SSD.  I saw
 one PCI express SSD that claimed a transfer rate of 2900 MB/s.  Even
 that is only about one fourth of the estimated speed of DDR3-1600 with
 ECC.  I don't know what interface technology Amazon uses for their SSD
 volumes, but I would bet on it being the cheaper version, which would
 mean SATA.  The networking between the EC2 instance and the EBS storage
 is unknown to me and may be a further bottleneck.
 
 http://ocz.com/enterprise/z-drive-4500/specifications
 
 Bottom line -- you need a lot more memory.  Speeding up the disk may
 *help* ... but it will not replace that simple requirement.  With EC2 as
 the platform, you may need more instances and more shards.
 
 Your 200 million document index that works well with only 90GB of total
 memory ... that's surprising to me.  That means that the important parts
 of that index *do* fit in memory ... but if the index gets much larger,
 performance is likely to drop off sharply.
 
 Thanks,
 Shawn
 


Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 12:07 PM, Mahmoud Almokadem wrote:
 What do you mean with important parts of index? and how to calculate their 
 size?

I have no formal education in what's important when it comes to doing a
query, but I can make some educated guesses.

Starting with this as a reference:

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/codecs/lucene410/package-summary.html#file-names

I would guess that the segment info (.si) files and the term index
(*.tip) files would be supremely important to *always* have in memory,
and they are fairly small.  Next would be the term dictionary (*.tim)
files.  The term dictionary is pretty big, and would be very important
for fast queries.

Frequencies, positions, and norms may also be important, depending on
exactly what kind of query you have.  Frequencies and positions are
quite large.  Frequencies are critical for relevance ranking (the
default sort by score), and positions are important for phrase queries.
 Position data may also be used by relevance ranking, but I am not
familiar enough with it to say for sure.

If you have docvalues defined, then *.dvm and *.dvd files would be used
for facets and sorting on those specific fields.  The *.dvd files can be
very big, depending on your schema.

The *.fdx and *.fdt files become important when actually retrieving
results after the matching documents have been determined.  The stored
data is compressed, so additional CPU power is required to uncompress
that data before it is sent to the client.  Stored data may be large or
small, depending on your schema.  Stored data does not directly affect
search speed, but if memory space is limited, every block of stored data
that gets retrieved will result in some other part of the index being
removed from the OS disk cache, which means that it might need to be
re-read from the disk on the next query.
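
If you want a rough number for how big those parts are on your system, one simple
approach is to sum the on-disk size of each file extension in a core's index
directory. A minimal sketch (my own illustration -- it assumes you can point it at
the data/index directory of a core; it is not an official Solr tool):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Sums index file sizes per extension (.si, .tip, .tim, .doc, .pos, .dvd, .fdt, ...)
// so you can see roughly how much disk cache each part of the index would need.
public class IndexSizeByExtension {
    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args[0]);   // e.g. /path/to/solr/coreName/data/index
        Map<String, Long> sizeByExt = new TreeMap<>();
        try (Stream<Path> files = Files.list(indexDir)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                String name = p.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot >= 0 ? name.substring(dot) : name;
                try {
                    sizeByExt.merge(ext, Files.size(p), Long::sum);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        sizeByExt.forEach((ext, bytes) ->
                System.out.printf("%-8s %,15d bytes%n", ext, bytes));
    }
}

The totals for .si, .tip and .tim (plus .doc/.pos for the query types described
above) give a rough lower bound on the disk cache you would want to keep free.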

Thanks,
Shawn



RE: Solr performance issues

2014-12-29 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
 I've the same index with a bit different schema and 200M documents,
 installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
 of index is about 1.5TB, have many updates every 5 minutes, complex queries
 and faceting with response time of 100ms that is acceptable for us.

So you have
Setup 1: 3 * (30GB RAM + 600GB SSD) for a 1.5TB index with 200M docs.
Acceptable performance.
Setup 2: 3 * (60GB RAM + 1TB SSD + 500GB SSD) for a 3.3TB index with 350M docs.
Poor performance.

The only real difference, besides doubling everything, is the LVM? I understand 
why you find that to be the culprit, but from what I can read, the overhead 
should not be anywhere near enough to result in the performance drop you are 
describing. Could it be that some snapshotting or backup was running when you 
tested?

Splitting your shards and doubling the number of machines, as you suggest,
would result in
Setup 3: 6 * (60GB RAM + 600GB SSD) for a 3.3TB index with 350M docs.
which would be remarkably similar to your setup 1. I think that would be the
next logical step, unless you can easily do a temporary boost of your IOPS.

BTW: You are getting dangerously close to your storage limits here - it seems 
that a single large merge could make you run out of space.

- Toke Eskildsen


Re: Solr performance issues

2014-12-28 Thread Shawn Heisey
On 12/26/2014 7:17 AM, Mahmoud Almokadem wrote:
 We've installed a cluster of one collection of 350M documents on 3
 r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
 about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
 General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
 volume using LVM of 1.5TB to fit our index.
 
 The response time is about 1 and 3 seconds for simple queries (1 token).
 
 Is the LVM become a bottleneck for our index?

SSD is very fast, but its speed is very slow when compared to RAM.  The
problem here is that Solr must read data off the disk in order to do a
query, and even at SSD speeds, that is slow.  LVM is not the problem
here, though it's possible that it may be a contributing factor.  You
need more RAM.

For Solr to be fast, a large percentage (ideally 100%, but smaller
fractions can often be enough) of the index must be loaded into unused
RAM by the operating system.  Your information seems to indicate that
the index is about 3 terabytes.  If that's the index size, I would guess
that you would need somewhere between 1 and 2 terabytes of total RAM for
speed to be acceptable.  Because RAM is *very* expensive on Amazon and
is not available in sizes like 256GB-1TB, that typically means a lot of
their virtual machines, with a lot of shards in SolrCloud.  You may find
that real hardware is less expensive for very large Solr indexes in the
long term than cloud hardware.

Thanks,
Shawn



RE: Solr performance issues

2014-12-28 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
 We've installed a cluster of one collection of 350M documents on 3
 r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
 about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
 General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
 volume using LVM of 1.5TB to fit our index.

Your search speed will be limited by the slowest storage in your group, which 
would be your 500GB EBS. The General Purpose SSD option means (as far as I can 
read at http://aws.amazon.com/ebs/details/#piops) that your baseline of 3 
IOPS/GB = 1500 IOPS, with bursts of 3000 IOPS. Unfortunately they do not say 
anything about latency.

For comparison, I checked the system logs from a local test with our 21TB / 7 
billion documents index. It used ~27,000 IOPS during the test, with mean search 
time a bit below 1 second. That was with ~100GB RAM for disk cache, which is 
about ½% of index size. The test was with simple term queries (1-3 terms) and 
some faceting. Back of the envelope: 27,000 IOPS for 21TB is ~1300 IOPS/TB. 
Your indexes are 1.1TB, so 1.1*1300 IOPS ~= 1400 IOPS.
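
Restated as plain arithmetic (just the numbers from the paragraph above, nothing
measured on your system):

// Back-of-the-envelope IOPS estimate, scaled down from the 21TB test index.
public class IopsEstimate {
    public static void main(String[] args) {
        double observedIops = 27_000;               // measured on the 21TB / 7 billion doc test
        double iopsPerTB = observedIops / 21.0;     // ~1300 IOPS per TB of index
        double neededPerShard = 1.1 * iopsPerTB;    // ~1400 IOPS for a 1.1TB shard

        double gp2Baseline500GB = 3 * 500;          // 3 IOPS/GB baseline on the 500GB volume
        System.out.printf("Estimated need: ~%.0f IOPS; 500GB gp2 baseline: ~%.0f IOPS%n",
                neededPerShard, gp2Baseline500GB);
    }
}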

All else being equal (which is never the case), getting 1-3 second response 
times for a 1.1TB index, when one link in the storage chain is capped at a few 
thousand IOPS, you are using networked storage and you have little RAM for 
caching, does not seem unrealistic. If possible, you could try temporarily 
boosting performance of the EBS, to see if raw IO is the bottleneck.

 The response time is about 1 and 3 seconds for simple queries (1 token).

Is the index updated while you are searching?
Do you do any faceting or other heavy processing as part of a search?
How many hits does a search typically have and how many documents are returned?
How many concurrent searches do you need to support? How fast should the 
response time be?

- Toke Eskildsen


Solr performance issues

2014-12-26 Thread Mahmoud Almokadem
Dears,

We've installed a cluster with one collection of 350M documents on 3
r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is
about 1.1TB, and the maximum EBS volume size on Amazon is 1TB, so we added 2
General Purpose SSD EBS volumes (1x1TB + 1x500GB) to each instance. Then we
created a 1.5TB logical volume using LVM to fit our index.

The response time is between 1 and 3 seconds for simple queries (1 token).

Has the LVM become a bottleneck for our index?

Thanks for help.


Re: Solr performance issues

2014-12-26 Thread Otis Gospodnetic
Likely lots of disk + network IO, yes. Put SPM for Solr on your nodes to double 
check.

 Otis

 On Dec 26, 2014, at 09:17, Mahmoud Almokadem prog.mahm...@gmail.com wrote:
 
 Dears,
 
 We've installed a cluster of one collection of 350M documents on 3
 r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
 about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
 General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
 volume using LVM of 1.5TB to fit our index.
 
 The response time is about 1 and 3 seconds for simple queries (1 token).
 
 Is the LVM become a bottleneck for our index?
 
 Thanks for help.


Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Abhishek Sanoujam
We have a Solr core with about 115 million documents. We are trying to
migrate data by running a simple query with a *:* query and the start
and rows params.
The performance is becoming too slow in Solr; it's taking almost 2 minutes
to get 4000 rows, and the migration is just too slow. Log snippet below:


INFO: [coreName] webapp=/solr path=/select params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=168308
INFO: [coreName] webapp=/solr path=/select params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=122771
INFO: [coreName] webapp=/solr path=/select params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=137615
INFO: [coreName] webapp=/solr path=/select params={start=5545&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141223
INFO: [coreName] webapp=/solr path=/select params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=97474
INFO: [coreName] webapp=/solr path=/select params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=98115
INFO: [coreName] webapp=/solr path=/select params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=143822
INFO: [coreName] webapp=/solr path=/select params={start=55474000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118066
INFO: [coreName] webapp=/solr path=/select params={start=5547&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=121498
INFO: [coreName] webapp=/solr path=/select params={start=55482000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=164062
INFO: [coreName] webapp=/solr path=/select params={start=55478000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=165518
INFO: [coreName] webapp=/solr path=/select params={start=55486000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118163
INFO: [coreName] webapp=/solr path=/select params={start=55494000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141642
INFO: [coreName] webapp=/solr path=/select params={start=5549&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=145037



I've taken some thread dumps on the Solr server, and most of the time the
threads seem to be busy in the following stacks:
Is there anything that can be done to improve the performance? Is it a
known issue? It's very surprising that querying for just a few thousand rows
starting at certain offsets takes on the order of minutes.



395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a runnable [0x7f42865dd000]

   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
    at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
    at org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
    at org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)


1154127582@qtp-162198005-3 prio=10 tid=0x7f4aa0613800 nid=0x2956 runnable [0x7f42869e1000]

   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
    at org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:210)
    at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:62)
    at org.apache.lucene.search.Scorer.score(Scorer.java:64)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:605)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1491)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Jan Høydahl
Hi,

How many shards do you have? This is a known issue with deep paging in
multi-shard setups, see https://issues.apache.org/jira/browse/SOLR-1726

You may be more successful in going to each shard, one at a time (with 
distrib=false) to avoid this issue.
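
A minimal SolrJ sketch of that approach (assuming SolrJ 4.x; the core URLs and
"coreName" below are placeholders for your own setup, and note that each request
still pays the deep-paging cost within that single core):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

// Page through one shard/core at a time with distrib=false, so no
// cross-shard merge of huge result windows is needed.
public class PerShardExport {
    public static void main(String[] args) throws Exception {
        String[] coreUrls = {                          // hypothetical core URLs
                "http://host1:8983/solr/coreName",
                "http://host2:8983/solr/coreName"
        };
        int rows = 4000;
        for (String url : coreUrls) {
            HttpSolrServer core = new HttpSolrServer(url);
            long start = 0;
            while (true) {
                SolrQuery q = new SolrQuery("*:*");
                q.set("distrib", "false");             // query this core only
                q.setStart((int) start);
                q.setRows(rows);
                QueryResponse rsp = core.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    // ... migrate doc ...
                }
                start += rows;
                if (start >= rsp.getResults().getNumFound()) {
                    break;
                }
            }
        }
    }
}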

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com:

 We have a solr core with about 115 million documents. We are trying to 
 migrate data and running a simple query with *:* query and with start and 
 rows param.
 The performance is becoming too slow in solr, its taking almost 2 mins to get 
 4000 rows and migration is being just too slow. Logs snippet below:
 
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55438000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=168308
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55446000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=122771
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55454000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=137615
 INFO: [coreName] webapp=/solr path=/select 
 params={start=5545q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=141223
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55462000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=97474
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55458000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=98115
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55466000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=143822
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55474000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=118066
 INFO: [coreName] webapp=/solr path=/select 
 params={start=5547q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=121498
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55482000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=164062
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55478000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=165518
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55486000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=118163
 INFO: [coreName] webapp=/solr path=/select 
 params={start=55494000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=141642
 INFO: [coreName] webapp=/solr path=/select 
 params={start=5549q=*:*wt=javabinversion=2rows=4000} hits=115760479 
 status=0 QTime=145037
 
 
 I've taken some thread dumps in the solr server and most of the time the 
 threads seem to be busy in the following stacks mostly:
 Is there anything that can be done to improve the performance? Is it a known 
 issue? Its very surprising that querying for some just rows starting at some 
 points is taking in order of minutes.
 
 
 395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a 
 runnable [0x7f42865dd000]
   java.lang.Thread.State: RUNNABLE
at 
 org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
at 
 org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
at 
 org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
at 
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
at 
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
at 
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
 
 
 1154127582@qtp-162198005-3 prio=10 tid=0x7f4aa0613800 nid=0x2956 
 runnable [0x7f42869e1000]
   java.lang.Thread.State: RUNNABLE
at 
 org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
at 
 org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:210)
at 
 org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:62)
at org.apache.lucene.search.Scorer.score(Scorer.java:64)
at 
 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Jan,

Would the same distrib=false help for distributed faceting? We are running
into a similar issue with facet paging.

Dmitry



On Mon, Apr 29, 2013 at 11:58 AM, Jan Høydahl jan@cominvent.com wrote:

 Hi,

 How many shards do you have? This is a known issue with deep paging with
 multi shard, see https://issues.apache.org/jira/browse/SOLR-1726

 You may be more successful in going to each shard, one at a time (with
 distrib=false) to avoid this issue.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com:

  We have a solr core with about 115 million documents. We are trying to
 migrate data and running a simple query with *:* query and with start and
 rows param.
  The performance is becoming too slow in solr, its taking almost 2 mins
 to get 4000 rows and migration is being just too slow. Logs snippet below:
 
  INFO: [coreName] webapp=/solr path=/select
 params={start=55438000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=168308
  INFO: [coreName] webapp=/solr path=/select
 params={start=55446000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=122771
  INFO: [coreName] webapp=/solr path=/select
 params={start=55454000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=137615
  INFO: [coreName] webapp=/solr path=/select
 params={start=5545q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=141223
  INFO: [coreName] webapp=/solr path=/select
 params={start=55462000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=97474
  INFO: [coreName] webapp=/solr path=/select
 params={start=55458000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=98115
  INFO: [coreName] webapp=/solr path=/select
 params={start=55466000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=143822
  INFO: [coreName] webapp=/solr path=/select
 params={start=55474000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=118066
  INFO: [coreName] webapp=/solr path=/select
 params={start=5547q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=121498
  INFO: [coreName] webapp=/solr path=/select
 params={start=55482000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=164062
  INFO: [coreName] webapp=/solr path=/select
 params={start=55478000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=165518
  INFO: [coreName] webapp=/solr path=/select
 params={start=55486000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=118163
  INFO: [coreName] webapp=/solr path=/select
 params={start=55494000q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=141642
  INFO: [coreName] webapp=/solr path=/select
 params={start=5549q=*:*wt=javabinversion=2rows=4000} hits=115760479
 status=0 QTime=145037
 
 
  I've taken some thread dumps in the solr server and most of the time the
 threads seem to be busy in the following stacks mostly:
  Is there anything that can be done to improve the performance? Is it a
 known issue? Its very surprising that querying for some just rows starting
 at some points is taking in order of minutes.
 
 
  395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a
 runnable [0x7f42865dd000]
java.lang.Thread.State: RUNNABLE
 at
 org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
 at
 org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
 at
 org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
 at
 org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
 at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
 at
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
 
 
  1154127582@qtp-162198005-3 prio=10 tid=0x7f4aa0613800 nid=0x2956
 runnable [0x7f42869e1000]
java.lang.Thread.State: RUNNABLE
 at
 org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
 at
 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Abhishek Sanoujam
We have a single shard, and all the data is in a single box only.
It definitely looks like deep paging is the problem.


Just to understand: is the searcher looping over the result set every
time and skipping the first 'start' documents? That will definitely
take a toll when we reach higher start values.
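
In other words, the suspicion is that for start=N and rows=M the collector has to
keep the best N+M hits in a bounded priority queue while it scores every matching
document, and only then slice out the M rows requested -- which matches the
PriorityQueue frames in the dumps above. A simplified, self-contained illustration
(my own sketch with tiny numbers, not the actual Lucene code):

import java.util.PriorityQueue;
import java.util.Random;

// Miniature of what a top-docs collector does for start=N & rows=M:
// keep the best N+M hits, then discard everything before rank N.
// With start in the tens of millions, that queue is why QTime grows
// with the offset even though only 4000 rows are returned.
public class DeepPagingSketch {
    public static void main(String[] args) {
        int start = 100_000, rows = 10;        // tiny stand-ins for 55,438,000 and 4,000
        int numHits = start + rows;

        PriorityQueue<double[]> heap =         // [score, docId], lowest score at the head
                new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        Random rnd = new Random(42);
        for (int doc = 0; doc < 1_000_000; doc++) {          // "every matching document"
            double score = rnd.nextDouble();
            if (heap.size() < numHits) {
                heap.add(new double[] {score, doc});
            } else if (score > heap.peek()[0]) {             // beats the worst kept hit
                heap.poll();
                heap.add(new double[] {score, doc});
            }
        }

        double[][] ranked = new double[numHits][];           // best hit first
        for (int i = numHits - 1; i >= 0; i--) ranked[i] = heap.poll();
        for (int i = start; i < start + rows; i++) {         // the page actually returned
            System.out.printf("rank %d: doc %.0f score %.6f%n", i, ranked[i][1], ranked[i][0]);
        }
    }
}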




On 4/29/13 2:28 PM, Jan Høydahl wrote:

Hi,

How many shards do you have? This is a known issue with deep paging with multi 
shard, see https://issues.apache.org/jira/browse/SOLR-1726

You may be more successful in going to each shard, one at a time (with 
distrib=false) to avoid this issue.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com:


We have a solr core with about 115 million documents. We are trying to migrate 
data and running a simple query with *:* query and with start and rows param.
The performance is becoming too slow in solr, its taking almost 2 mins to get 
4000 rows and migration is being just too slow. Logs snippet below:

INFO: [coreName] webapp=/solr path=/select 
params={start=55438000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=168308
INFO: [coreName] webapp=/solr path=/select 
params={start=55446000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=122771
INFO: [coreName] webapp=/solr path=/select 
params={start=55454000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=137615
INFO: [coreName] webapp=/solr path=/select 
params={start=5545q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=141223
INFO: [coreName] webapp=/solr path=/select 
params={start=55462000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=97474
INFO: [coreName] webapp=/solr path=/select 
params={start=55458000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=98115
INFO: [coreName] webapp=/solr path=/select 
params={start=55466000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=143822
INFO: [coreName] webapp=/solr path=/select 
params={start=55474000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=118066
INFO: [coreName] webapp=/solr path=/select 
params={start=5547q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=121498
INFO: [coreName] webapp=/solr path=/select 
params={start=55482000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=164062
INFO: [coreName] webapp=/solr path=/select 
params={start=55478000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=165518
INFO: [coreName] webapp=/solr path=/select 
params={start=55486000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=118163
INFO: [coreName] webapp=/solr path=/select 
params={start=55494000q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=141642
INFO: [coreName] webapp=/solr path=/select 
params={start=5549q=*:*wt=javabinversion=2rows=4000} hits=115760479 
status=0 QTime=145037


I've taken some thread dumps in the solr server and most of the time the 
threads seem to be busy in the following stacks mostly:
Is there anything that can be done to improve the performance? Is it a known 
issue? Its very surprising that querying for some just rows starting at some 
points is taking in order of minutes.


395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a runnable 
[0x7f42865dd000]
   java.lang.Thread.State: RUNNABLE
at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184)
at 
org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61)
at 
org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)


1154127582@qtp-162198005-3 prio=10 tid=0x7f4aa0613800 nid=0x2956 runnable 
[0x7f42869e1000]
   java.lang.Thread.State: RUNNABLE
at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252)
at 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Abhishek,

There is a wiki regarding this:

http://wiki.apache.org/solr/CommonQueryParameters

Search for pageDoc and pageScore on that page.


On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
abhi.sanou...@gmail.comwrote:

 We have a single shard, and all the data is in a single box only.
 Definitely looks like deep-paging is having problems.

 Just to understand, is the searcher looping over the result set everytime
 and skipping the first start count? This will definitely take a toll when
 we reach higher start values.




 On 4/29/13 2:28 PM, Jan Høydahl wrote:

 Hi,

 How many shards do you have? This is a known issue with deep paging with
 multi shard, see 
 https://issues.apache.org/**jira/browse/SOLR-1726https://issues.apache.org/jira/browse/SOLR-1726

 You may be more successful in going to each shard, one at a time (with
 distrib=false) to avoid this issue.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com
 :

  We have a solr core with about 115 million documents. We are trying to
 migrate data and running a simple query with *:* query and with start and
 rows param.
 The performance is becoming too slow in solr, its taking almost 2 mins
 to get 4000 rows and migration is being just too slow. Logs snippet below:

 INFO: [coreName] webapp=/solr path=/select params={start=55438000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=168308
 INFO: [coreName] webapp=/solr path=/select params={start=55446000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=122771
 INFO: [coreName] webapp=/solr path=/select params={start=55454000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=137615
 INFO: [coreName] webapp=/solr path=/select params={start=5545q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=141223
 INFO: [coreName] webapp=/solr path=/select params={start=55462000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=97474
 INFO: [coreName] webapp=/solr path=/select params={start=55458000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=98115
 INFO: [coreName] webapp=/solr path=/select params={start=55466000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=143822
 INFO: [coreName] webapp=/solr path=/select params={start=55474000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=118066
 INFO: [coreName] webapp=/solr path=/select params={start=5547q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=121498
 INFO: [coreName] webapp=/solr path=/select params={start=55482000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=164062
 INFO: [coreName] webapp=/solr path=/select params={start=55478000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=165518
 INFO: [coreName] webapp=/solr path=/select params={start=55486000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=118163
 INFO: [coreName] webapp=/solr path=/select params={start=55494000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=141642
 INFO: [coreName] webapp=/solr path=/select params={start=5549q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=145037


 I've taken some thread dumps in the solr server and most of the time the
 threads seem to be busy in the following stacks mostly:
 Is there anything that can be done to improve the performance? Is it a
 known issue? Its very surprising that querying for some just rows starting
 at some points is taking in order of minutes.


 395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a
 runnable [0x7f42865dd000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.**PriorityQueue.downHeap(**
 PriorityQueue.java:252)
 at org.apache.lucene.util.**PriorityQueue.pop(**
 PriorityQueue.java:184)
 at org.apache.lucene.search.**TopDocsCollector.**
 populateResults(**TopDocsCollector.java:61)
 at org.apache.lucene.search.**TopDocsCollector.topDocs(**
 TopDocsCollector.java:156)
 at org.apache.solr.search.**SolrIndexSearcher.**getDocListNC(**
 SolrIndexSearcher.java:1499)
 at org.apache.solr.search.**SolrIndexSearcher.getDocListC(**
 SolrIndexSearcher.java:1366)
 at org.apache.solr.search.**SolrIndexSearcher.search(**
 SolrIndexSearcher.java:457)
 at org.apache.solr.handler.**component.QueryComponent.**
 process(QueryComponent.java:**410)
 at org.apache.solr.handler.**component.SearchHandler.**
 handleRequestBody(**SearchHandler.java:208)
 at org.apache.solr.handler.**RequestHandlerBase.**handleRequest(
 **RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.**execute(SolrCore.java:1817)
 at org.apache.solr.servlet.**SolrDispatchFilter.execute(**
 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Michael Della Bitta
We've found that you can do a lot for yourself by using a filter query
to page through your data, if it has a natural range to page on, instead
of using start and rows.
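
For example, something like the following SolrJ sketch (hypothetical field and
core names -- it assumes a numeric field such as a sequential id with a known
upper bound, so every request is a cheap "first page" for Solr):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

// Walk the index in fixed-size ranges of a natural key instead of using
// start/rows, so Solr never has to collect tens of millions of hits per request.
public class RangePagingExport {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/coreName");
        long rangeSize = 100_000;                  // tune to your data density
        long maxId = 115_760_479;                  // upper bound of the key field

        for (long lower = 0; lower <= maxId; lower += rangeSize) {
            long upper = lower + rangeSize - 1;
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("id:[" + lower + " TO " + upper + "]"); // hypothetical numeric id field
            q.setRows((int) rangeSize);            // never more than one range per request
            QueryResponse rsp = solr.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // ... migrate doc ...
            }
        }
    }
}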

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Abhishek,

 There is a wiki regarding this:

 http://wiki.apache.org/solr/CommonQueryParameters

 search pageDoc and pageScore.


 On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
 abhi.sanou...@gmail.comwrote:

 We have a single shard, and all the data is in a single box only.
 Definitely looks like deep-paging is having problems.

 Just to understand, is the searcher looping over the result set everytime
 and skipping the first start count? This will definitely take a toll when
 we reach higher start values.




 On 4/29/13 2:28 PM, Jan Høydahl wrote:

 Hi,

 How many shards do you have? This is a known issue with deep paging with
 multi shard, see 
 https://issues.apache.org/**jira/browse/SOLR-1726https://issues.apache.org/jira/browse/SOLR-1726

 You may be more successful in going to each shard, one at a time (with
 distrib=false) to avoid this issue.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam abhi.sanou...@gmail.com
 :

  We have a solr core with about 115 million documents. We are trying to
 migrate data and running a simple query with *:* query and with start and
 rows param.
 The performance is becoming too slow in solr, its taking almost 2 mins
 to get 4000 rows and migration is being just too slow. Logs snippet below:

 INFO: [coreName] webapp=/solr path=/select params={start=55438000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=168308
 INFO: [coreName] webapp=/solr path=/select params={start=55446000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=122771
 INFO: [coreName] webapp=/solr path=/select params={start=55454000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=137615
 INFO: [coreName] webapp=/solr path=/select params={start=5545q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=141223
 INFO: [coreName] webapp=/solr path=/select params={start=55462000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=97474
 INFO: [coreName] webapp=/solr path=/select params={start=55458000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=98115
 INFO: [coreName] webapp=/solr path=/select params={start=55466000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=143822
 INFO: [coreName] webapp=/solr path=/select params={start=55474000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=118066
 INFO: [coreName] webapp=/solr path=/select params={start=5547q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=121498
 INFO: [coreName] webapp=/solr path=/select params={start=55482000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=164062
 INFO: [coreName] webapp=/solr path=/select params={start=55478000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=165518
 INFO: [coreName] webapp=/solr path=/select params={start=55486000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=118163
 INFO: [coreName] webapp=/solr path=/select params={start=55494000q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=141642
 INFO: [coreName] webapp=/solr path=/select params={start=5549q=*:*
 **wt=javabinversion=2rows=**4000} hits=115760479 status=0 QTime=145037


 I've taken some thread dumps in the solr server and most of the time the
 threads seem to be busy in the following stacks mostly:
 Is there anything that can be done to improve the performance? Is it a
 known issue? Its very surprising that querying for some just rows starting
 at some points is taking in order of minutes.


 395883378@qtp-162198005-7 prio=10 tid=0x7f4aa0636000 nid=0x295a
 runnable [0x7f42865dd000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.**PriorityQueue.downHeap(**
 PriorityQueue.java:252)
 at org.apache.lucene.util.**PriorityQueue.pop(**
 PriorityQueue.java:184)
 at org.apache.lucene.search.**TopDocsCollector.**
 populateResults(**TopDocsCollector.java:61)
 at org.apache.lucene.search.**TopDocsCollector.topDocs(**
 TopDocsCollector.java:156)
 at org.apache.solr.search.**SolrIndexSearcher.**getDocListNC(**
 SolrIndexSearcher.java:1499)
 at org.apache.solr.search.**SolrIndexSearcher.getDocListC(**
 SolrIndexSearcher.java:1366)
 at org.apache.solr.search.**SolrIndexSearcher.search(**
 SolrIndexSearcher.java:457)
 at 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Michael Della Bitta
I guess so, you'd have to use a filter query to page through the set
of documents you were faceting against and sum them all at the end.
It's not quite the same operation as paging through results, because
facets are aggregate statistics, but if you're willing to go through
the trouble, I bet it would also help performance.
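
A rough sketch of that idea (my own illustration with hypothetical field names):
slice the data with range filter queries as above, facet within each slice with
rows=0, and merge the counts client-side.

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

// Facet one range slice at a time and sum the counts at the end, instead of
// asking Solr to page through one enormous facet result.
public class SlicedFacetSum {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/coreName");
        Map<String, Long> totals = new HashMap<>();
        long rangeSize = 1_000_000, maxId = 115_760_479;     // hypothetical numeric key

        for (long lower = 0; lower <= maxId; lower += rangeSize) {
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("id:[" + lower + " TO " + (lower + rangeSize - 1) + "]");
            q.setRows(0);                                    // only the facet counts are needed
            q.setFacet(true);
            q.addFacetField("category");                     // hypothetical facet field
            q.setFacetLimit(-1);                             // all values within the slice
            QueryResponse rsp = solr.query(q);
            FacetField ff = rsp.getFacetField("category");
            if (ff != null && ff.getValues() != null) {
                for (FacetField.Count c : ff.getValues()) {
                    totals.merge(c.getName(), c.getCount(), Long::sum);
                }
            }
        }
        totals.forEach((value, count) -> System.out.println(value + ": " + count));
    }
}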

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Apr 29, 2013 at 9:06 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Michael,

 Interesting! Do (Can) you apply this to facet searches as well?

 Dmitry


 On Mon, Apr 29, 2013 at 4:02 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:

 We've found that you can do a lot for yourself by using a filter query
 to page through your data if it has a natural range to do so instead
 of start and rows.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan solrexp...@gmail.com wrote:
  Abhishek,
 
  There is a wiki regarding this:
 
  http://wiki.apache.org/solr/CommonQueryParameters
 
  search pageDoc and pageScore.
 
 
  On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
  abhi.sanou...@gmail.comwrote:
 
  We have a single shard, and all the data is in a single box only.
  Definitely looks like deep-paging is having problems.
 
  Just to understand, is the searcher looping over the result set
 everytime
  and skipping the first start count? This will definitely take a toll
 when
  we reach higher start values.
 
 
 
 
  On 4/29/13 2:28 PM, Jan Høydahl wrote:
 
  Hi,
 
  How many shards do you have? This is a known issue with deep paging
 with
  multi shard, see https://issues.apache.org/**jira/browse/SOLR-1726
 https://issues.apache.org/jira/browse/SOLR-1726
 
  You may be more successful in going to each shard, one at a time (with
  distrib=false) to avoid this issue.
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
  29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam 
 abhi.sanou...@gmail.com
  :
 
   We have a solr core with about 115 million documents. We are trying to
  migrate data and running a simple query with *:* query and with start
 and
  rows param.
  The performance is becoming too slow in solr, its taking almost 2 mins
  to get 4000 rows and migration is being just too slow. Logs snippet
 below:
 
  INFO: [coreName] webapp=/solr path=/select
 params={start=55438000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=168308
  INFO: [coreName] webapp=/solr path=/select
 params={start=55446000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=122771
  INFO: [coreName] webapp=/solr path=/select
 params={start=55454000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=137615
  INFO: [coreName] webapp=/solr path=/select
 params={start=5545q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=141223
  INFO: [coreName] webapp=/solr path=/select
 params={start=55462000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=97474
  INFO: [coreName] webapp=/solr path=/select
 params={start=55458000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=98115
  INFO: [coreName] webapp=/solr path=/select
 params={start=55466000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=143822
  INFO: [coreName] webapp=/solr path=/select
 params={start=55474000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=118066
  INFO: [coreName] webapp=/solr path=/select
 params={start=5547q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=121498
  INFO: [coreName] webapp=/solr path=/select
 params={start=55482000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=164062
  INFO: [coreName] webapp=/solr path=/select
 params={start=55478000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=165518
  INFO: [coreName] webapp=/solr path=/select
 params={start=55486000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=118163
  INFO: [coreName] webapp=/solr path=/select
 params={start=55494000q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=141642
  INFO: [coreName] webapp=/solr path=/select
 params={start=5549q=*:*
  **wt=javabinversion=2rows=**4000} hits=115760479 status=0
 QTime=145037
 
 
  I've taken some thread dumps in the solr server and most of the time
 the
  threads seem to be busy in the following stacks mostly:
  Is there anything that can be done to improve the performance? Is it a
  known issue? Its very surprising that querying for some just rows
 starting
  at some 

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Thanks.

The only question is how to smoothly transition to this model. Our facet
(string) fields contain timestamp prefixes that are reverse ordered,
starting from the freshest value. In theory, we could try computing the
filter queries for those, but before doing so we would need the matched
ids from Solr, so it becomes at least a two-pass algorithm?

The biggest concern we have with the paging in general is that the system
seems to pass far more data back and forth than is needed to compute
the values.


On Mon, Apr 29, 2013 at 4:14 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 I guess so, you'd have to use a filter query to page through the set
 of documents you were faceting against and sum them all at the end.
 It's not quite the same operation as paging through results, because
 facets are aggregate statistics, but if you're willing to go through
 the trouble, I bet it would also help performance.

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Mon, Apr 29, 2013 at 9:06 AM, Dmitry Kan solrexp...@gmail.com wrote:
  Michael,
 
  Interesting! Do (Can) you apply this to facet searches as well?
 
  Dmitry
 
 
  On Mon, Apr 29, 2013 at 4:02 PM, Michael Della Bitta 
  michael.della.bi...@appinions.com wrote:
 
  We've found that you can do a lot for yourself by using a filter query
  to page through your data if it has a natural range to do so instead
  of start and rows.
 
  Michael Della Bitta
 
  
  Appinions
  18 East 41st Street, 2nd Floor
  New York, NY 10017-6271
 
  www.appinions.com
 
  Where Influence Isn’t a Game
 
 
  On Mon, Apr 29, 2013 at 6:44 AM, Dmitry Kan solrexp...@gmail.com
 wrote:
   Abhishek,
  
   There is a wiki regarding this:
  
   http://wiki.apache.org/solr/CommonQueryParameters
  
   search pageDoc and pageScore.
  
  
   On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam
   abhi.sanou...@gmail.comwrote:
  
   We have a single shard, and all the data is in a single box only.
   Definitely looks like deep-paging is having problems.
  
   Just to understand, is the searcher looping over the result set
  everytime
   and skipping the first start count? This will definitely take a
 toll
  when
   we reach higher start values.
  
  
  
  
   On 4/29/13 2:28 PM, Jan Høydahl wrote:
  
   Hi,
  
   How many shards do you have? This is a known issue with deep paging
  with
   multi shard, see https://issues.apache.org/**jira/browse/SOLR-1726
  https://issues.apache.org/jira/browse/SOLR-1726
  
   You may be more successful in going to each shard, one at a time
 (with
   distrib=false) to avoid this issue.
  
   --
   Jan Høydahl, search solution architect
   Cominvent AS - www.cominvent.com
   Solr Training - www.solrtraining.com
  
   29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam 
  abhi.sanou...@gmail.com
   :
  
We have a solr core with about 115 million documents. We are
 trying to
   migrate data and running a simple query with *:* query and with
 start
  and
   rows param.
   The performance is becoming too slow in solr, its taking almost 2
 mins
   to get 4000 rows and migration is being just too slow. Logs snippet
  below:
  
   INFO: [coreName] webapp=/solr path=/select
  params={start=55438000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=168308
   INFO: [coreName] webapp=/solr path=/select
  params={start=55446000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=122771
   INFO: [coreName] webapp=/solr path=/select
  params={start=55454000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=137615
   INFO: [coreName] webapp=/solr path=/select
  params={start=5545q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=141223
   INFO: [coreName] webapp=/solr path=/select
  params={start=55462000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=97474
   INFO: [coreName] webapp=/solr path=/select
  params={start=55458000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=98115
   INFO: [coreName] webapp=/solr path=/select
  params={start=55466000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=143822
   INFO: [coreName] webapp=/solr path=/select
  params={start=55474000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=118066
   INFO: [coreName] webapp=/solr path=/select
  params={start=5547q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=121498
   INFO: [coreName] webapp=/solr path=/select
  params={start=55482000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  QTime=164062
   INFO: [coreName] webapp=/solr path=/select
  params={start=55478000q=*:*
   **wt=javabinversion=2rows=**4000} hits=115760479 status=0
  

Re: Occasional Solr performance issues

2012-10-29 Thread Dotan Cohen
On Mon, Oct 29, 2012 at 7:04 AM, Shawn Heisey s...@elyograg.org wrote:
 They are indeed Java options.  The first two control the maximum and
 starting heap sizes.  NewRatio controls the relative size of the young and
 old generations, making the young generation considerably larger than it is
 by default.  The others are garbage collector options.  This seems to be a
 good summary:

 http://www.petefreitag.com/articles/gctuning/

 Here's the official Sun (Oracle) documentation on GC tuning:

 http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html


Thank you Shawn! Those are exactly the documents that I need. Google
should hire you to fill in the pages when someone searches for Java
garbage collection. Interestingly, I just checked and bing.com does
list the Oracle page on the first page of results. I shudder to think
that I might have to switch search engines!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-28 Thread Dotan Cohen
On Fri, Oct 26, 2012 at 11:04 PM, Shawn Heisey s...@elyograg.org wrote:
 Warming doesn't seem to be a problem here -- all your warm times are zero,
 so I am going to take a guess that it may be a heap/GC issue.  I would
 recommend starting with the following additional arguments to your JVM.
 Since I have no idea how solr gets started on your server, I don't know
 where you would add these:

 -Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 -XX:+CMSParallelRemarkEnabled


Thanks. I've added those flags to the Solr line that I use to start
Solr. Those are Java flags, not Solr, correct? I'm googling the flags
now, but I find it interesting that I cannot find a canonical
reference for them.
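
[Archive editor's note: a hedged example of where such flags typically go. With the stock Solr 4 example layout that ships with Jetty, they are passed straight to the java command that launches Solr, roughly:

    java -Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -jar start.jar

If Solr is started from an init script or a servlet container such as Tomcat, the flags belong in whatever variable that script uses for JVM options instead.]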


 This allocates 4GB of RAM to java, sets up a larger than normal Eden space
 in the heap, and uses garbage collection options that usually fare better in
  a server environment than the default. Java memory management options are
 like religion to some people ... I may start a flamewar with these
 recommendations. ;)  The best I can tell you about these choices: They made
 a big difference for me.


Thanks. I will experiment with them empirically. The first step is to
learn to read the debug info, though. I've been googling for days, but
I must be missing something. Where is the information that I pasted in
pastebin documented?


 I would also recommend switching to a Sun/Oracle jvm.  I have heard that
 previous versions of Solr were not happy on variants like OpenJDK, I have no
 idea whether that might still be the case with 4.0.  If you choose to do
 this, you probably have package choices in Ubuntu.  I know that in Debian,
 the package is called sun-java6-jre ... Ubuntu is probably something
 similar. Debian has a CLI command 'update-java-alternatives' that will
 quickly switch between different java implementations that are installed.
 Hopefully Ubuntu also has this.  If not, you might need the following
 command instead to switch the main java executable:

 update-alternatives --config java


Thanks, I will take a look at the current Oracle JVM.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-28 Thread Shawn Heisey

On 10/28/2012 2:28 PM, Dotan Cohen wrote:

On Fri, Oct 26, 2012 at 11:04 PM, Shawn Heisey s...@elyograg.org wrote:

Warming doesn't seem to be a problem here -- all your warm times are zero,
so I am going to take a guess that it may be a heap/GC issue.  I would
recommend starting with the following additional arguments to your JVM.
Since I have no idea how solr gets started on your server, I don't know
where you would add these:

-Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

Thanks. I've added those flags to the Solr line that I use to start
Solr. Those are Java flags, not Solr, correct? I'm googling the flags
now, but I find it interesting that I cannot find a canonical
reference for them.


They are indeed Java options.  The first two control the maximum and 
starting heap sizes.  NewRatio controls the relative size of the young 
and old generations, making the young generation considerably larger 
than it is by default.  The others are garbage collector options.  This 
seems to be a good summary:


http://www.petefreitag.com/articles/gctuning/

Here's the official Sun (Oracle) documentation on GC tuning:

http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Wed, Oct 24, 2012 at 4:33 PM, Walter Underwood wun...@wunderwood.org wrote:
 Please consider never running optimize. That should be called force merge.


Thanks. I have been letting the system run for about two days already
without an optimize. I will let it run a week, then merge to see the
effect.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
I spoke too soon! Whereas three days ago when the index was new 500
records could be written to it in 3 seconds, now that operation is
taking a minute and a half, sometimes longer. I ran optimize() but
that did not help the writes. What can I do to improve the write
performance?

Even opening the Logging tab of the Solr instance is taking quite a
long time. In fact, I just left it for 20 minutes and it still hasn't
come back with anything. I do have an SSH window open on the server
hosting Solr and it doesn't look overloaded at all:

$ date && du -sh data/ && uptime && free -m
Fri Oct 26 13:15:59 UTC 2012
578M    data/
 13:15:59 up 4 days, 17:59,  1 user,  load average: 0.06, 0.12, 0.22
             total       used       free     shared    buffers     cached
Mem:         14980       3237      11743          0        284
-/+ buffers/cache:         729      14250
Swap:            0          0          0


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey

On 10/26/2012 7:16 AM, Dotan Cohen wrote:

I spoke too soon! Wereas three days ago when the index was new 500
records could be written to it in 3 seconds, now that operation is
taking a minute and a half, sometimes longer. I ran optimize() but
that did not help the writes. What can I do to improve the write
performance?

Even opening the Logging tab of the Solr instance is taking quite a
long time. In fact, I just left it for 20 minutes and it still hasn't
come back with anything. I do have an SSH window open on the server
hosting Solr and it doesn't look overloaded at all:

$ date  du -sh data/  uptime  free -m
Fri Oct 26 13:15:59 UTC 2012
578Mdata/
  13:15:59 up 4 days, 17:59,  1 user,  load average: 0.06, 0.12, 0.22
  total   used   free sharedbuffers cached
Mem: 14980   3237  11743  0284   
-/+ buffers/cache:729  14250
Swap:0  0  0


Taking all the information I've seen so far, my bet is on either cache 
warming or heap/GC trouble as the source of your problem.  It's now 
specific information gathering time.  Can you gather all the following 
information and put it into a web paste page, such as pastie.org, and 
reply with the link?  I have gathered the same information from my test 
server and created a pastie example. http://pastie.org/5118979


On the dashboard of the GUI, it lists all the jvm arguments. Include those.

Click Java Properties and gather the java.runtime.version and 
java.specification.vendor information.


After one of the long update times, pause/stop your indexing 
application.  Click on your core in the GUI, open Plugins/Stats, and 
paste the following bits with a header to indicate what each section is:

CACHE-filterCache
CACHE-queryResultCache
CORE-searcher

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Fri, Oct 26, 2012 at 4:02 PM, Shawn Heisey s...@elyograg.org wrote:

 Taking all the information I've seen so far, my bet is on either cache
 warming or heap/GC trouble as the source of your problem.  It's now specific
 information gathering time.  Can you gather all the following information
 and put it into a web paste page, such as pastie.org, and reply with the
 link?  I have gathered the same information from my test server and created
 a pastie example. http://pastie.org/5118979

 On the dashboard of the GUI, it lists all the jvm arguments. Include those.

 Click Java Properties and gather the java.runtime.version and
 java.specification.vendor information.

 After one of the long update times, pause/stop your indexing application.
 Click on your core in the GUI, open Plugins/Stats, and paste the following
 bits with a header to indicate what each section is:
 CACHE-filterCache
 CACHE-queryResultCache
 CORE-searcher

 Thanks,
 Shawn


Thank you Shawn. The information is here:
http://pastebin.com/aqEfeYVA

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey

On 10/26/2012 9:41 AM, Dotan Cohen wrote:

On the dashboard of the GUI, it lists all the jvm arguments. Include those.

Click Java Properties and gather the java.runtime.version and
java.specification.vendor information.

After one of the long update times, pause/stop your indexing application.
Click on your core in the GUI, open Plugins/Stats, and paste the following
bits with a header to indicate what each section is:
CACHE-filterCache
CACHE-queryResultCache
CORE-searcher

Thanks,
Shawn

Thank you Shawn. The information is here:
http://pastebin.com/aqEfeYVA



Warming doesn't seem to be a problem here -- all your warm times are 
zero, so I am going to take a guess that it may be a heap/GC issue.  I 
would recommend starting with the following additional arguments to your 
JVM.  Since I have no idea how solr gets started on your server, I don't 
know where you would add these:


-Xmx4096M -Xms4096M -XX:NewRatio=1 -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled


This allocates 4GB of RAM to java, sets up a larger than normal Eden 
space in the heap, and uses garbage collection options that usually fare 
better in a server environment than the default. Java memory management 
options are like religion to some people ... I may start a flamewar with 
these recommendations. ;)  The best I can tell you about these choices: 
They made a big difference for me.


I would also recommend switching to a Sun/Oracle jvm.  I have heard that 
previous versions of Solr were not happy on variants like OpenJDK, I 
have no idea whether that might still be the case with 4.0.  If you 
choose to do this, you probably have package choices in Ubuntu.  I know 
that in Debian, the package is called sun-java6-jre ... Ubuntu is 
probably something similar. Debian has a CLI command 
'update-java-alternatives' that will quickly switch between different 
java implementations that are installed.  Hopefully Ubuntu also has 
this.  If not, you might need the following command instead to switch 
the main java executable:


update-alternatives --config java

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-24 Thread Dotan Cohen
On Tue, Oct 23, 2012 at 3:07 PM, Erick Erickson erickerick...@gmail.com wrote:
 Maybe you've been looking at it but one thing that I didn't see on a fast
 scan was that maybe the commit bit is the problem. When you commit,
 eventually the segments will be merged and a new searcher will be opened
 (this is true even if you're NOT optimizing). So you're effectively committing
 every 1-2 seconds, creating many segments which get merged, but more
 importantly opening new searchers (which you are getting since you pasted
 the message: Overlapping onDeckSearchers=2).

 You could pinpoint this by NOT committing explicitly, just set your autocommit
 parameters (or specify commitWithin in your indexing program, which is
 preferred). Try setting it at a minute or so and see if your problem goes away
 perhaps?

 The NRT stuff happens on soft commits, so you have that option to have the
 documents immediately available for search.



Thanks, Erick. I'll play around with different configurations. So far
just removing the periodic optimize command worked wonders. I'll see
how much it helps or hurts to run it daily, or more or less frequently.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
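
[Archive editor's note: a minimal SolrJ sketch of the commitWithin approach Erick describes above, assuming the 4.x SolrJ client; the URL, field names and the 60-second window are illustrative.]

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitWithinExample {
        public static void main(String[] args) throws Exception {
            // Illustrative core URL and field names; adjust to your schema.
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 50; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("text", "example body " + i);
                batch.add(doc);
            }
            // Ask Solr to make the batch visible within 60 seconds instead of
            // issuing an explicit commit after every batch.
            solr.add(batch, 60000);
            solr.shutdown();
        }
    }

No explicit commit() call is needed; Solr handles making the documents searchable within the requested window.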


Re: Occasional Solr performance issues

2012-10-24 Thread Walter Underwood
Please consider never running optimize. That should be called force merge. 

wunder

On Oct 24, 2012, at 3:28 AM, Dotan Cohen wrote:

 On Tue, Oct 23, 2012 at 3:07 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 Maybe you've been looking at it but one thing that I didn't see on a fast
 scan was that maybe the commit bit is the problem. When you commit,
 eventually the segments will be merged and a new searcher will be opened
 (this is true even if you're NOT optimizing). So you're effectively 
 committing
 every 1-2 seconds, creating many segments which get merged, but more
 importantly opening new searchers (which you are getting since you pasted
 the message: Overlapping onDeckSearchers=2).
 
 You could pinpoint this by NOT committing explicitly, just set your 
 autocommit
 parameters (or specify commitWithin in your indexing program, which is
 preferred). Try setting it at a minute or so and see if your problem goes 
 away
 perhaps?
 
 The NRT stuff happens on soft commits, so you have that option to have the
 documents immediately available for search.
 
 
 
 Thanks, Erick. I'll play around with different configurations. So far
 just removing the periodic optimize command worked wonders. I'll see
 how much it helps or hurts to run that daily or more or less frequent.
 
 
 -- 
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com






Re: Occasional Solr performance issues

2012-10-23 Thread Erick Erickson
Maybe you've been looking at it but one thing that I didn't see on a fast
scan was that maybe the commit bit is the problem. When you commit,
eventually the segments will be merged and a new searcher will be opened
(this is true even if you're NOT optimizing). So you're effectively committing
every 1-2 seconds, creating many segments which get merged, but more
importantly opening new searchers (which you are getting since you pasted
the message: Overlapping onDeckSearchers=2).

You could pinpoint this by NOT committing explicitly, just set your autocommit
parameters (or specify commitWithin in your indexing program, which is
preferred). Try setting it at a minute or so and see if your problem goes away
perhaps?

The NRT stuff happens on soft commits, so you have that option to have the
documents immediately available for search.

Best
Erick

On Mon, Oct 22, 2012 at 10:44 AM, Dotan Cohen dotanco...@gmail.com wrote:
 I've got a script writing ~50 documents to Solr at a time, then
 commiting. Each of these documents is no longer than 1 KiB of text,
 some much less. Usually the write-and-commit will take 1-2 seconds or
 less, but sometimes it can go over 60 seconds.

 During a recent time of over-60-second write-and-commits, I saw that
 the server did not look overloaded:

 $ uptime
  14:36:46 up 19:20,  1 user,  load average: 1.08, 1.16, 1.16
 $ free -m
  total   used   free sharedbuffers cached
 Mem: 14980   2091  12889  0233   1243
 -/+ buffers/cache:613  14366
 Swap:0  0  0

 Other than Solr, nothing is running on this machine other than stock
 Ubuntu Server services (no Apache, no MySQL). The machine is running
 on an Extra Large Amazon EC2 instance, with a virtual 4-core 2.4 GHz
 Xeon processor and ~16 GiB of RAM. The solr home is on a mounted EBS
 volume.

 What might make some queries take so long, while others perform fine?

 Thanks.


 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com
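
[Archive editor's note: the autocommit parameters Erick mentions live in solrconfig.xml. As a hedged illustration, an update handler section along the lines of <autoCommit><maxTime>60000</maxTime><openSearcher>false</openSearcher></autoCommit> commits every minute without opening a new searcher, while commitWithin is passed per update request from the client; exact values depend on how quickly new documents must become searchable.]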


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
When Solr is slow, I'm seeing these in the logs:
[collection1] Error opening new searcher. exceeded limit of
maxWarmingSearchers=2, try again later.
[collection1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

Googling, I found this in the FAQ:
Typically the way to avoid this error is to either reduce the
frequency of commits, or reduce the amount of warming a searcher does
while it's on deck (by reducing the work in newSearcher listeners,
and/or reducing the autowarmCount on your caches)
http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

I happen to know that the script will try to commit once every 60
seconds. How does one reduce the work in newSearcher listeners? What
effect will this have? What effect will reducing the autowarmCount on
caches have?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Rafał Kuć
Hello!

You can check if the long warming is causing the overlapping
searchers. Check Solr admin panel and look at cache statistics, there
should be warmupTime property.

Lowering the autowarmCount should lower the time needed to warm up,
however you can also look at your warming queries (if you have any)
and see how long they take.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 When Solr is slow, I'm seeing these in the logs:
 [collection1] Error opening new searcher. exceeded limit of
 maxWarmingSearchers=2,​ try again later.
 [collection1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

 Googling, I found this in the FAQ:
 Typically the way to avoid this error is to either reduce the
 frequency of commits, or reduce the amount of warming a searcher does
 while it's on deck (by reducing the work in newSearcher listeners,
 and/or reducing the autowarmCount on your caches)
 http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F

 I happen to know that the script will try to commit once every 60
 seconds. How does one reduce the work in newSearcher listeners? What
 effect will this have? What effect will reducing the autowarmCount on
 caches have?

 Thanks.



Re: Occasional Solr performance issues

2012-10-22 Thread Mark Miller
Are you using Solr 3X? The occasional long commit should no longer
show up in Solr 4.

- Mark

On Mon, Oct 22, 2012 at 10:44 AM, Dotan Cohen dotanco...@gmail.com wrote:
 I've got a script writing ~50 documents to Solr at a time, then
 commiting. Each of these documents is no longer than 1 KiB of text,
 some much less. Usually the write-and-commit will take 1-2 seconds or
 less, but sometimes it can go over 60 seconds.

 During a recent time of over-60-second write-and-commits, I saw that
 the server did not look overloaded:

 $ uptime
  14:36:46 up 19:20,  1 user,  load average: 1.08, 1.16, 1.16
 $ free -m
  total   used   free sharedbuffers cached
 Mem: 14980   2091  12889  0233   1243
 -/+ buffers/cache:613  14366
 Swap:0  0  0

 Other than Solr, nothing is running on this machine other than stock
 Ubuntu Server services (no Apache, no MySQL). The machine is running
 on an Extra Large Amazon EC2 instance, with a virtual 4-core 2.4 GHz
 Xeon processor and ~16 GiB of RAM. The solr home is on a mounted EBS
 volume.

 What might make some queries take so long, while others perform fine?

 Thanks.


 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com



-- 
- Mark


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 5:02 PM, Rafał Kuć r@solr.pl wrote:
 Hello!

 You can check if the long warming is causing the overlapping
 searchers. Check Solr admin panel and look at cache statistics, there
 should be warmupTime property.


Thank you, I have gone over the Solr admin panel twice and I cannot
find the cache statistics. Where are they?


 Lowering the autowarmCount should lower the time needed to warm up,
 howere you can also look at your warming queries (if you have such)
 and see how long they take.


Thank you, I will look at that!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 5:27 PM, Mark Miller markrmil...@gmail.com wrote:
 Are you using Solr 3X? The occasional long commit should no longer
 show up in Solr 4.


Thank you Mark. In fact, this is the production release of Solr 4.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Shawn Heisey

On 10/22/2012 9:58 AM, Dotan Cohen wrote:
Thank you, I have gone over the Solr admin panel twice and I cannot 
find the cache statistics. Where are they?


If you are running Solr4, you can see individual cache autowarming times 
here, assuming your core is named collection1:


http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

The warmup time for the entire searcher can be found here:

http://server:port/solr/#/collection1/plugins/core?entry=searcher


If you are on an older Solr release, everything is in various sections 
of the stats page.  Do a page search for warmup multiple times to see 
them all:


http://server:port/solr/corename/admin/stats.jsp

Thanks,
Shawn



Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 7:29 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/22/2012 9:58 AM, Dotan Cohen wrote:

 Thank you, I have gone over the Solr admin panel twice and I cannot find
 the cache statistics. Where are they?


 If you are running Solr4, you can see individual cache autowarming times
 here, assuming your core is named collection1:

 http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
 http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

 The warmup time for the entire searcher can be found here:

 http://server:port/solr/#/collection1/plugins/core?entry=searcher



Thank you Shawn! I can see how I missed that data. I'm reviewing it
now. Solr has a low barrier to entry, but quite a learning curve. I'm
loving it!

I see that the server is using less than 2 GiB of memory, whereas it
is a dedicated Solr server with 16 GiB of memory. I understand that I
can increase the query and document caches to increase performance,
but I worry that this will increase the warm-up time to unacceptable
levels. What is a good strategy for increasing the caches yet
preserving performance after an optimize operation?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Mark Miller
Perhaps you can grab a snapshot of the stack traces when the 60 second
delay is occurring?

You can get the stack traces right in the admin ui, or you can use
another tool (jconsole, visualvm, jstack cmd line, etc)

- Mark

On Mon, Oct 22, 2012 at 1:47 PM, Dotan Cohen dotanco...@gmail.com wrote:
 On Mon, Oct 22, 2012 at 7:29 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/22/2012 9:58 AM, Dotan Cohen wrote:

 Thank you, I have gone over the Solr admin panel twice and I cannot find
 the cache statistics. Where are they?


 If you are running Solr4, you can see individual cache autowarming times
 here, assuming your core is named collection1:

 http://server:port/solr/#/collection1/plugins/cache?entry=queryResultCache
 http://server:port/solr/#/collection1/plugins/cache?entry=filterCache

 The warmup time for the entire searcher can be found here:

 http://server:port/solr/#/collection1/plugins/core?entry=searcher



 Thank you Shawn! I can see how I missed that data. I'm reviewing it
 now. Solr has a low barrier to entry, but quite a learning curve. I'm
 loving it!

 I see that the server is using less than 2 GiB of memory, whereas it
 is a dedicated Solr server with 16 GiB of memory. I understand that I
 can increase the query and document caches to increase performance,
 but I worry that this will increase the warm-up time to unacceptable
 levels. What is a good strategy for increasing the caches yet
 preserving performance after an optimize operation?

 Thanks.

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com



-- 
- Mark
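
[Archive editor's note: the command-line route Mark mentions is roughly 'jstack <solr-jvm-pid> > solr-threads.txt', run a few times while an update is stuck; the pid placeholder is whatever process id your Solr JVM has (jps or ps can tell you). The thread dump page in the admin UI shows the same information.]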


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?

 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)

Thanks. I've refactored so that the index is optimized once per hour,
instead of after each dump of commits. But when I need to increase
the optimize frequency in the future I will go through the stack
traces. Thanks!

In any case, the server has an extra 14 GiB of memory available, how
might I make the best use of that for Solr assuming both heavy reads
and writes?

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Walter Underwood
First, stop optimizing. You do not need to manually force merges. The system 
does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
might be the cause of your problem.

Second, the OS will use the extra memory for file buffers, which really helps 
performance, so you might not need to do anything. This will work better after 
you stop forcing merges. A forced merge replaces every file, so the OS needs to 
reload everything into file buffers.

wunder

On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:

 On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?
 
 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)
 
 Thanks. I've refactored so that the index is optimized once per hour,
 instead after each dump of commits. But when I will need to increase
 the optmize frequency in the future I will go through the stack
 traces. Thanks!
 
 In any case, the server has an extra 14 GiB of memory available, how
 might I make the best use of that for Solr assuming both heavy reads
 and writes?
 
 Thanks.
 
 -- 
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com






Re: Occasional Solr performance issues

2012-10-22 Thread Michael Della Bitta
Has the Solr team considered renaming the optimize function to avoid
leading people down the path of this antipattern?

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Oct 22, 2012 at 4:01 PM, Walter Underwood wun...@wunderwood.org wrote:
 First, stop optimizing. You do not need to manually force merges. The system 
 does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
 might be the cause of your problem.

 Second, the OS will use the extra memory for file buffers, which really 
 helps performance, so you might not need to do anything. This will work 
 better after you stop forcing merges. A forced merge replaces every file, so 
 the OS needs to reload everything into file buffers.

 wunder

 On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:

 On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?

 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)

 Thanks. I've refactored so that the index is optimized once per hour,
 instead after each dump of commits. But when I will need to increase
 the optmize frequency in the future I will go through the stack
 traces. Thanks!

 In any case, the server has an extra 14 GiB of memory available, how
 might I make the best use of that for Solr assuming both heavy reads
 and writes?

 Thanks.

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com






Re: Occasional Solr performance issues

2012-10-22 Thread Walter Underwood
Lucene already did that:

https://issues.apache.org/jira/browse/LUCENE-3454

Here is the Solr issue:

https://issues.apache.org/jira/browse/SOLR-3141

People over-use this regardless of the name. In Ultraseek Server, it was called 
force merge and we had to tell people to stop doing that nearly every month.

wunder

On Oct 22, 2012, at 1:39 PM, Michael Della Bitta wrote:

 Has the Solr team considered renaming the optimize function to avoid
 leading people down the path of this antipattern?
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Mon, Oct 22, 2012 at 4:01 PM, Walter Underwood wun...@wunderwood.org 
 wrote:
 First, stop optimizing. You do not need to manually force merges. The system 
 does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO 
 and might be the cause of your problem.
 
 Second, the OS will use the extra memory for file buffers, which really 
 helps performance, so you might not need to do anything. This will work 
 better after you stop forcing merges. A forced merge replaces every file, so 
 the OS needs to reload everything into file buffers.
 
 wunder
 
 On Oct 22, 2012, at 12:55 PM, Dotan Cohen wrote:
 
 On Mon, Oct 22, 2012 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
 Perhaps you can grab a snapshot of the stack traces when the 60 second
 delay is occurring?
 
 You can get the stack traces right in the admin ui, or you can use
 another tool (jconsole, visualvm, jstack cmd line, etc)
 
 Thanks. I've refactored so that the index is optimized once per hour,
 instead after each dump of commits. But when I will need to increase
 the optmize frequency in the future I will go through the stack
 traces. Thanks!
 
 In any case, the server has an extra 14 GiB of memory available, how
 might I make the best use of that for Solr assuming both heavy reads
 and writes?
 
 Thanks.
 
 --
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Occasional Solr performance issues

2012-10-22 Thread Yonik Seeley
On Mon, Oct 22, 2012 at 4:39 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Has the Solr team considered renaming the optimize function to avoid
 leading people down the path of this antipattern?

If it were never the right thing to do, it could simply be removed.
The problem is that it's sometimes the right thing to do - but it
depends heavily on the use cases and trade-offs.  The best thing is to
simply document what it does and the cost of doing it.

-Yonik
http://lucidworks.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 10:01 PM, Walter Underwood
wun...@wunderwood.org wrote:
 First, stop optimizing. You do not need to manually force merges. The system 
 does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
 might be the cause of your problem.


Thanks. Looking at the index statistics, I see that within minutes
after running optimize the stats say the index needs to be
reoptimized. Though, the index still reads and writes fine even in
that state.


 Second, the OS will use the extra memory for file buffers, which really 
 helps performance, so you might not need to do anything. This will work 
 better after you stop forcing merges. A forced merge replaces every file, so 
 the OS needs to reload everything into file buffers.


I don't see that the memory is being used:

$ free -g
             total       used       free     shared    buffers     cached
Mem:            14          2         12          0          0          1
-/+ buffers/cache:           0         14
Swap:            0          0          0

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Mon, Oct 22, 2012 at 10:44 PM, Walter Underwood
wun...@wunderwood.org wrote:
 Lucene already did that:

 https://issues.apache.org/jira/browse/LUCENE-3454

 Here is the Solr issue:

 https://issues.apache.org/jira/browse/SOLR-3141

 People over-use this regardless of the name. In Ultraseek Server, it was 
 called force merge and we had to tell people to stop doing that nearly 
 every month.


Thank you for those links. I commented on the Solr bug. There are some
very insightful comments in there.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Occasional Solr performance issues

2012-10-22 Thread Shawn Heisey

On 10/22/2012 3:11 PM, Dotan Cohen wrote:

On Mon, Oct 22, 2012 at 10:01 PM, Walter Underwood
wun...@wunderwood.org wrote:

First, stop optimizing. You do not need to manually force merges. The system 
does a great job. Forcing merges (optimize) uses a lot of CPU and disk IO and 
might be the cause of your problem.


Thanks. Looking at the index statistics, I see that within minutes
after running optimize that the stats say the index needs to be
reoptimized. Though, the index still reads and writes fine even in
that state.


As soon as you make any change at all to an index, it's no longer 
optimized.  Delete one document, add one document, anything.  Most of 
the time you will not see a performance increase from optimizing an 
index that consists of one large segment and a bunch of very tiny 
segments or deleted documents.



Second, the OS will use the extra memory for file buffers, which really helps 
performance, so you might not need to do anything. This will work better after you stop 
forcing merges. A forced merge replaces every file, so the OS needs to reload everything 
into file buffers.


I don't see that the memory is being used:

$ free -g
  total   used   free sharedbuffers cached
Mem:14  2 12  0  0  1
-/+ buffers/cache:  0 14
Swap:0  0  0


How big is your index, and did you run this right after a reboot?  If 
you did, then the cache will be fairly empty, and Solr has only read 
enough from the index files to open the searcher.  The number is probably 
too small to show up on a gigabyte scale.  As you issue queries, the 
cached amount will get bigger.  If your index is small enough to fit in 
the 14GB of free RAM that you have, you can manually populate the disk 
cache by going to your index directory and doing 'cat * > /dev/null' 
from the commandline or a script.  The first time you do it, it may go 
slowly, but if you immediately do it again, it will complete VERY fast 
-- the data will all be in RAM.


The 'free -m' command in your first email shows cache usage of 1243MB, 
which suggests that maybe your index is considerably smaller than your 
available RAM.  Having loads of free RAM is a good thing for just about 
any workload, but especially for Solr.  Try running the free command 
without the -g so you can see those numbers in kilobytes.


I have seen a tendency towards creating huge caches in Solr because 
people have lots of memory.  It's important to realize that the OS is 
far better at the overall job of caching the index files than Solr 
itself is.  Solr caches are meant to cache result sets from queries and 
filters, not large sections of the actual index contents.  Make the 
caches big enough that you see some benefit, but not big enough to suck 
up all your RAM.


If you are having warm time problems, make the autowarm counts low.  I 
have run into problems with warming on my filter cache, because we have 
filters that are extremely hairy and slow to run. I had to reduce my 
autowarm count on the filter cache to FOUR, with a cache size of 512.  
When it is 8 or higher, it can take over a minute to autowarm.


Thanks,
Shawn
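
[Archive editor's note: autowarmCount is set per cache in solrconfig.xml. As a hedged illustration, Shawn's numbers above would look roughly like <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="4"/>; keep whatever class and sizes your existing config already uses and change only the autowarmCount attribute to tune warm time.]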



Re: Occasional Solr performance issues

2012-10-22 Thread Dotan Cohen
On Tue, Oct 23, 2012 at 3:52 AM, Shawn Heisey s...@elyograg.org wrote:
 As soon as you make any change at all to an index, it's no longer
 optimized.  Delete one document, add one document, anything.  Most of the
 time you will not see a performance increase from optimizing an index that
 consists of one large segment and a bunch of very tiny segments or deleted
 documents.


I've since realized that by experimentation. I've probably saved quite
a few minutes of reading time by investing hours of experiment time!


 How big is your index, and did you run this right after a reboot?  If you
 did, then the cache will be fairly empty, and Solr has only read enough from
 the index files to open the searcher.The number is probably too small to
 show up on a gigabyte scale.  As you issue queries, the cached amount will
 get bigger.  If your index is small enough to fit in the 14GB of free RAM
 that you have, you can manually populate the disk cache by going to your
 index directory and doing 'cat *  /dev/null' from the commandline or a
 script.  The first time you do it, it may go slowly, but if you immediately
 do it again, it will complete VERY fast -- the data will all be in RAM.


The cat trick to get the files in RAM is great. I would not have
thought that would work for binary files.

The index is small, much less than the available RAM, for the time
being. Therefore, there was nothing to fill it with, I now understand.
Both 'free' outputs were after the system had been running for some
time.


 The 'free -m' command in your first email shows cache usage of 1243MB, which
 suggests that maybe your index is considerably smaller than your available
 RAM.  Having loads of free RAM is a good thing for just about any workload,
 but especially for Solr.Try running the free command without the -g so you
 can see those numbers in kilobytes.

 I have seen a tendency towards creating huge caches in Solr because people
 have lots of memory.  It's important to realize that the OS is far better at
 the overall job of caching the index files than Solr itself is.  Solr caches
 are meant to cache result sets from queries and filters, not large sections
 of the actual index contents.  Make the caches big enough that you see some
 benefit, but not big enough to suck up all your RAM.


I see, thanks.


 If you are having warm time problems, make the autowarm counts low.  I have
 run into problems with warming on my filter cache, because we have filters
 that are extremely hairy and slow to run. I had to reduce my autowarm count
 on the filter cache to FOUR, with a cache size of 512.  When it is 8 or
 higher, it can take over a minute to autowarm.


I will have to experiment with the warming. Thank you for the tips.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Solr Performance Issues

2010-03-17 Thread Lance Norskog
?

 If you don't leave enough free memory for the OS, the OS won't have
 a
large
 enough disk cache, and you will be hitting the disk for lots of
   queries.

 You might want to monitor your Disk I/O using iostat and look at
 the
 iowait.

 If you are doing phrase queries and your *prx file is significantly
larger
 than the available memory then when a slow phrase query hits Solr,
  the
 contention for disk I/O with other queries could be slowing
  everything
 down.
 You might also want to look at the 90th and 99th percentile query
  times
in
 addition to the average. For our large indexes, we found at least
 an
order
 of magnitude difference between the average and 99th percentile
   queries.
 Again, if Solr gets hit with a few of those 99th percentile slow
   queries
 and
 your not hitting your caches, chances are you will see serious
   contention
 for disk I/O..

 Of course if you don't see any waiting on i/o, then your bottleneck
  is
 probably somewhere else:)

 See


   
  
 
 http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
 for more background on our experience.

 Tom Burton-West
 University of Michigan Library
 www.hathitrust.org



 
  On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel 
   siddhantg...@gmail.com
  wrote:
 
   Hi everyone,
  
   I have an index corresponding to ~2.5 million documents. The
  index
size
  is
   43GB. The configuration of the machine which is running Solr is
 -
Dual
   Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB
cache,
  8GB
   RAM, and 250 GB HDD.
  
   I'm observing a strange trend in the queries that I send to
 Solr.
   The
  query
   times for queries that I send earlier is much lesser than the
   queries
I
   send
   afterwards. For instance, if I write a script to query solr
 5000
times
   (with
   5000 distinct queries, most of them containing not more than
 3-5
words)
   with
   10 threads running in parallel, the average times for queries
  goes
from
   ~50ms in the beginning to ~6000ms. Is this expected or is there
  something
   wrong with my configuration. Currently I've configured the
  queryResultCache
   and the documentCache to contain 2048 entries (hit ratios for
  both
   is
  close
   to 50%).
  
   Apart from this, a general question that I want to ask is that
 is
such
 a
   hardware enough for this scenario? I'm aiming at achieving
 around
   20
   queries
   per second with the hardware mentioned above.
  
   Thanks,
  
   Regards,
  
   --
   - Siddhant
  
 



 --
 - Siddhant



 --
 View this message in context:

  http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
 Sent from the Solr - User mailing list archive at Nabble.com.


   
   
--
- Siddhant
   
  
 
 
 
  --
  - Siddhant
 




 --
 - Siddhant




-- 
Lance Norskog
goks...@gmail.com


Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk
caching.

I think that at any point in time, there can be at most [number of
threads] concurrent requests, which happens to make sense btw (does it?).

As I increase the number of threads, the load average shown by top goes up
to as high as 80%. But if I keep the number of threads low (~10), the load
average never goes beyond ~8. So probably that's the number of requests I
can expect Solr to serve concurrently on this index size with this hardware.

Can anyone give a general opinion as to how much hardware should be
sufficient for a Solr deployment with an index size of ~43GB, containing
around 2.5 million documents? I'm expecting it to serve at least 20 requests
per second. Any experiences?

Thanks

On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West tburtonw...@gmail.comwrote:


 How much of your memory are you allocating to the JVM and how much are you
 leaving free?

 If you don't leave enough free memory for the OS, the OS won't have a large
 enough disk cache, and you will be hitting the disk for lots of queries.

 You might want to monitor your Disk I/O using iostat and look at the
 iowait.

 If you are doing phrase queries and your *prx file is significantly larger
 than the available memory then when a slow phrase query hits Solr, the
 contention for disk I/O with other queries could be slowing everything
 down.
 You might also want to look at the 90th and 99th percentile query times in
 addition to the average. For our large indexes, we found at least an order
 of magnitude difference between the average and 99th percentile queries.
 Again, if Solr gets hit with a few of those 99th percentile slow queries
 and
 your not hitting your caches, chances are you will see serious contention
 for disk I/O..

 Of course if you don't see any waiting on i/o, then your bottleneck is
 probably somewhere else:)

 See

 http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
 for more background on our experience.

 Tom Burton-West
 University of Michigan Library
 www.hathitrust.org



 
  On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel siddhantg...@gmail.com
  wrote:
 
   Hi everyone,
  
   I have an index corresponding to ~2.5 million documents. The index size
  is
   43GB. The configuration of the machine which is running Solr is - Dual
   Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache,
  8GB
   RAM, and 250 GB HDD.
  
   I'm observing a strange trend in the queries that I send to Solr. The
  query
   times for queries that I send earlier is much lesser than the queries I
   send
   afterwards. For instance, if I write a script to query solr 5000 times
   (with
   5000 distinct queries, most of them containing not more than 3-5 words)
   with
   10 threads running in parallel, the average times for queries goes from
   ~50ms in the beginning to ~6000ms. Is this expected or is there
  something
   wrong with my configuration. Currently I've configured the
  queryResultCache
   and the documentCache to contain 2048 entries (hit ratios for both is
  close
   to 50%).
  
   Apart from this, a general question that I want to ask is that is such
 a
   hardware enough for this scenario? I'm aiming at achieving around 20
   queries
   per second with the hardware mentioned above.
  
   Thanks,
  
   Regards,
  
   --
   - Siddhant
  
 



 --
 - Siddhant



 --
 View this message in context:
 http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
- Siddhant
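
[Archive editor's note: for the iostat suggestion quoted above, something like 'iostat -x 5' prints extended per-device statistics every five seconds, and the %iowait column of its CPU line (or top/vmstat) shows how much time the box spends waiting on disk; iostat ships in the sysstat package on most Linux distributions.]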


Re: Solr Performance Issues

2010-03-12 Thread Erick Erickson
You've probably already looked at this, but here goes anyway. The
first question probably should have been what are you measuring?
I've been fooled before by looking at, say, average response time
and extrapolating. You're getting 20 qps if your response time is
1 second, but you have 20 threads running simultaneously, ditto
if you're getting 2 second response time and 40 threads. So

And what is response time? It would clarify things a lot if you
broke out which parts of the operation are taking the time. Going
from memory, debugQuery=on will let you know how much time
was spent in various operations in SOLR. It's important to know
whether it was the searching, assembling the response, or
transmitting the data back to the client. If your timings are
all just how long it takes the response to get back to the
client, you could even be hammered by network latency.

How many threads does it take to peg the CPU? And what
response times are you getting when your number of threads is
around 10?

Erick

On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel siddhantg...@gmail.comwrote:

 I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk
 caching.

 I think that at any point of time, there can be a maximum of number of
 threads concurrent requests, which happens to make sense btw (does it?).

 As I increase the number of threads, the load average shown by top goes up
 to as high as 80%. But if I keep the number of threads low (~10), the load
 average never goes beyond ~8). So probably thats the number of requests I
 can expect Solr to serve concurrently on this index size with this
 hardware.

 Can anyone give a general opinion as to how much hardware should be
 sufficient for a Solr deployment with an index size of ~43GB, containing
 around 2.5 million documents? I'm expecting it to serve at least 20
 requests
 per second. Any experiences?

 Thanks

 On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West tburtonw...@gmail.com
 wrote:

 
  How much of your memory are you allocating to the JVM and how much are
 you
  leaving free?
 
  If you don't leave enough free memory for the OS, the OS won't have a
 large
  enough disk cache, and you will be hitting the disk for lots of queries.
 
  You might want to monitor your Disk I/O using iostat and look at the
  iowait.
 
  If you are doing phrase queries and your *prx file is significantly
 larger
  than the available memory then when a slow phrase query hits Solr, the
  contention for disk I/O with other queries could be slowing everything
  down.
  You might also want to look at the 90th and 99th percentile query times
 in
  addition to the average. For our large indexes, we found at least an
 order
  of magnitude difference between the average and 99th percentile queries.
  Again, if Solr gets hit with a few of those 99th percentile slow queries
  and
  your not hitting your caches, chances are you will see serious contention
  for disk I/O..
 
  Of course if you don't see any waiting on i/o, then your bottleneck is
  probably somewhere else:)
 
  See
 
 
 http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
  for more background on our experience.
 
  Tom Burton-West
  University of Michigan Library
  www.hathitrust.org
 
 
 
  
   On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel siddhantg...@gmail.com
   wrote:
  
Hi everyone,
   
I have an index corresponding to ~2.5 million documents. The index
 size
   is
43GB. The configuration of the machine which is running Solr is -
 Dual
Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB
 cache,
   8GB
RAM, and 250 GB HDD.
   
I'm observing a strange trend in the queries that I send to Solr. The
   query
times for queries that I send earlier is much lesser than the queries
 I
send
afterwards. For instance, if I write a script to query solr 5000
 times
(with
5000 distinct queries, most of them containing not more than 3-5
 words)
with
10 threads running in parallel, the average times for queries goes
 from
~50ms in the beginning to ~6000ms. Is this expected or is there
   something
wrong with my configuration. Currently I've configured the
   queryResultCache
and the documentCache to contain 2048 entries (hit ratios for both is
   close
to 50%).
   
Apart from this, a general question that I want to ask is that is
 such
  a
hardware enough for this scenario? I'm aiming at achieving around 20
queries
per second with the hardware mentioned above.
   
Thanks,
   
Regards,
   
--
- Siddhant
   
  
 
 
 
  --
  - Siddhant
 
 
 
  --
  View this message in context:
  http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 


 --
 - Siddhant



Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
)
 with
 10 threads running in parallel, the average times for queries goes
  from
 ~50ms in the beginning to ~6000ms. Is this expected or is there
something
 wrong with my configuration. Currently I've configured the
queryResultCache
 and the documentCache to contain 2048 entries (hit ratios for both
 is
close
 to 50%).

 Apart from this, a general question that I want to ask is that is
  such
   a
 hardware enough for this scenario? I'm aiming at achieving around
 20
 queries
 per second with the hardware mentioned above.

 Thanks,

 Regards,

 --
 - Siddhant

   
  
  
  
   --
   - Siddhant
  
  
  
   --
   View this message in context:
   http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
 
 
  --
  - Siddhant
 




-- 
- Siddhant


Solr Performance Issues

2010-03-11 Thread Siddhant Goel
Hi everyone,

I have an index corresponding to ~2.5 million documents. The index size is
43GB. The configuration of the machine which is running Solr is - Dual
Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
RAM, and 250 GB HDD.

I'm observing a strange trend in the queries that I send to Solr. The query
times for queries that I send earlier are much lower than for the queries I
send afterwards. For instance, if I write a script to query Solr 5000 times
(with 5000 distinct queries, most of them containing not more than 3-5 words)
with 10 threads running in parallel, the average time for queries goes from
~50ms in the beginning to ~6000ms. Is this expected, or is there something
wrong with my configuration? Currently I've configured the queryResultCache
and the documentCache to contain 2048 entries (hit ratios for both are close
to 50%).

Apart from this, a general question that I want to ask is: is such hardware
enough for this scenario? I'm aiming at achieving around 20 queries per
second with the hardware mentioned above.

Thanks,

Regards,

-- 
- Siddhant


Re: Solr Performance Issues

2010-03-11 Thread Erick Erickson
How many outstanding queries do you have at a time? Is it possible
that when you start, you have only a few queries executing concurrently
but as your test runs you have hundreds?

This really is a question of how your load test is structured. You might
get a better sense of how it works if your tester had a limited number
of threads running so the max concurrent requests SOLR was serving
at once were capped (30, 50, whatever).

But no, I wouldn't expect SOLR to bog down the way you're describing
just because it was running for a while.

HTH
Erick

On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel siddhantg...@gmail.comwrote:

 Hi everyone,

 I have an index corresponding to ~2.5 million documents. The index size is
 43GB. The configuration of the machine which is running Solr is - Dual
 Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
 RAM, and 250 GB HDD.

 I'm observing a strange trend in the queries that I send to Solr. The query
 times for queries that I send earlier is much lesser than the queries I
 send
 afterwards. For instance, if I write a script to query solr 5000 times
 (with
 5000 distinct queries, most of them containing not more than 3-5 words)
 with
 10 threads running in parallel, the average times for queries goes from
 ~50ms in the beginning to ~6000ms. Is this expected or is there something
 wrong with my configuration. Currently I've configured the queryResultCache
 and the documentCache to contain 2048 entries (hit ratios for both is close
 to 50%).

 Apart from this, a general question that I want to ask is that is such a
 hardware enough for this scenario? I'm aiming at achieving around 20
 queries
 per second with the hardware mentioned above.

 Thanks,

 Regards,

 --
 - Siddhant



Re: Solr Performance Issues

2010-03-11 Thread Siddhant Goel
Hi Erick,

The way the load test works is that it picks up 5000 queries, splits them
according to the number of threads (so if we have 10 threads, it schedules
10 threads - each one sending 500 queries). So it might be possible that the
number of queries at a point later in time is greater than the number of
queries earlier in time. I'm not very sure about that though. It's a simple
Ruby script that starts up threads, calls the search function in each
thread, and then waits for each of them to exit.

How many queries per second can we expect Solr to serve, given this kind of
hardware? If what you suggest is true, then is it possible that while Solr
is serving a query, another query hits it, which increases the response time
even further? I'm not sure about it. But yes I can observe the query times
going up as I increase the number of threads.

Thanks,

Regards,

On Thu, Mar 11, 2010 at 8:30 PM, Erick Erickson erickerick...@gmail.comwrote:

 How many outstanding queries do you have at a time? Is it possible
 that when you start, you have only a few queries executing concurrently
 but as your test runs you have hundreds?

 This really is a question of how your load test is structured. You might
 get a better sense of how it works if your tester had a limited number
 of threads running so the max concurrent requests SOLR was serving
 at once were capped (30, 50, whatever).

 But no, I wouldn't expect SOLR to bog down the way you're describing
 just because it was running for a while.

 HTH
 Erick

 On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel siddhantg...@gmail.com
 wrote:

  Hi everyone,
 
  I have an index corresponding to ~2.5 million documents. The index size
 is
  43GB. The configuration of the machine which is running Solr is - Dual
  Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache,
 8GB
  RAM, and 250 GB HDD.
 
  I'm observing a strange trend in the queries that I send to Solr. The
 query
  times for queries that I send earlier is much lesser than the queries I
  send
  afterwards. For instance, if I write a script to query solr 5000 times
  (with
  5000 distinct queries, most of them containing not more than 3-5 words)
  with
  10 threads running in parallel, the average times for queries goes from
  ~50ms in the beginning to ~6000ms. Is this expected or is there something
  wrong with my configuration. Currently I've configured the
 queryResultCache
  and the documentCache to contain 2048 entries (hit ratios for both is
 close
  to 50%).
 
  Apart from this, a general question that I want to ask is that is such a
  hardware enough for this scenario? I'm aiming at achieving around 20
  queries
  per second with the hardware mentioned above.
 
  Thanks,
 
  Regards,
 
  --
  - Siddhant
 




-- 
- Siddhant
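
[Archive editor's note: a minimal sketch of the capped-concurrency load test Erick describes, written against the SolrJ client rather than Ruby; the URL, query list and thread count are illustrative. The fixed-size pool guarantees that no more than N requests are in flight at once, so per-query latency can be read off directly.]

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CappedLoadTest {
        public static void main(String[] args) throws Exception {
            // Illustrative Solr URL; HttpSolrServer is safe to share across threads.
            final HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
            final ConcurrentLinkedQueue<String> work = new ConcurrentLinkedQueue<String>();
            // Load the 5000 test queries here; a single placeholder keeps the sketch short.
            work.add("example query");
            final ConcurrentLinkedQueue<Long> timings = new ConcurrentLinkedQueue<Long>();
            int threads = 10;                        // hard cap on concurrent requests
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                pool.submit(new Runnable() {
                    public void run() {
                        String q;
                        while ((q = work.poll()) != null) {
                            long t0 = System.currentTimeMillis();
                            try {
                                solr.query(new SolrQuery(q));
                            } catch (Exception e) {
                                e.printStackTrace();
                            }
                            timings.add(System.currentTimeMillis() - t0);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            long sum = 0, max = 0;
            for (long t : timings) {
                sum += t;
                max = Math.max(max, t);
            }
            System.out.println("queries=" + timings.size()
                    + " avg=" + (timings.isEmpty() ? 0 : sum / timings.size()) + "ms"
                    + " max=" + max + "ms");
        }
    }

Watching the average and maximum while varying the thread cap makes it clear whether response time degrades with concurrency or with run time.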


Re: Solr Performance Issues

2010-03-11 Thread Mike Malloy

I don't mean to turn this into a sales pitch, but there is a tool for Java app
performance management that you may find helpful. It's called New Relic
(www.newrelic.com) and the tool can be installed in 2 minutes. It can give
you very deep visibility inside Solr and other Java apps. (Full disclosure: I
work at New Relic.)
Mike

Siddhant Goel wrote:
 
 Hi everyone,
 
 I have an index corresponding to ~2.5 million documents. The index size is
 43GB. The configuration of the machine which is running Solr is - Dual
 Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
 RAM, and 250 GB HDD.
 
 I'm observing a strange trend in the queries that I send to Solr. The query
 times for queries that I send earlier are much lower than for the queries I
 send afterwards. For instance, if I write a script to query Solr 5000 times
 (with 5000 distinct queries, most of them containing not more than 3-5
 words) with 10 threads running in parallel, the average time for queries
 goes from ~50ms in the beginning to ~6000ms. Is this expected or is there
 something wrong with my configuration? Currently I've configured the
 queryResultCache and the documentCache to contain 2048 entries (hit ratios
 for both are close to 50%).

 Apart from this, a general question: is this hardware enough for such a
 scenario? I'm aiming at around 20 queries per second with the hardware
 mentioned above.
 
 Thanks,
 
 Regards,
 
 -- 
 - Siddhant
 
 




Re: Solr performance issues

2008-06-20 Thread Erik Hatcher


On Jun 19, 2008, at 6:28 PM, Yonik Seeley wrote:
2. I use acts_as_solr and by default it only makes POST requests, even
for /select. With that setup the response time for most queries, simple or
complex ones, ranged from 150ms to 600ms, with an average of 250ms. I
changed the select request to use GET requests instead and now the
response time is down to 10ms to 60ms. Has anyone seen that before? Why is
it doing that?


Are the get requests being cached by the ruby stuff?


No, I'm sure that the results aren't being cached by Ruby's library,  
solr-ruby, or acts_as_solr.



But even with no caching, I've seen differences with get/post on Linux
with the python client when persistent HTTP connections were in use.
I tracked it down to the POST being written in two parts, triggering
Nagle's algorithm in the networking stack.
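
A rough illustration of that effect, not Solr-specific (the host, port, and path
here are assumed): building the whole request in one buffer and writing it once,
or setting TCP_NODELAY, sidesteps the delay that can occur when headers and body
go out as two separate writes on a persistent connection.

require 'socket'

sock = TCPSocket.new('localhost', 8983)                       # assumed Solr host/port
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)  # disable Nagle's algorithm

body    = 'q=*:*&rows=0'
request = "POST /solr/select HTTP/1.1\r\n" \
          "Host: localhost:8983\r\n" \
          "Content-Type: application/x-www-form-urlencoded\r\n" \
          "Content-Length: #{body.bytesize}\r\n" \
          "Connection: close\r\n" \
          "\r\n" + body
sock.write(request)      # one write: headers and body leave together
puts sock.read           # read the response until the server closes the connection
sock.close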


There was another post I found that mentioned this a couple of years ago:

http://markmail.org/message/45qflvwnakhripqp

I would welcome patches with tests that allow solr-ruby to send most  
requests with GET, and the ones that are actually sending a body  
beyond just parameters (delete, update, commit) as POST.
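
A sketch of what that split might look like (the base URL and helper names are
assumptions, not the actual solr-ruby patch): read-only /select calls go out as
GET with URL-encoded parameters, while requests that carry a body, such as
updates, stay on POST.

require 'net/http'
require 'uri'

SOLR = 'http://localhost:8983/solr'                        # assumed base URL

def solr_select(params)
  uri = URI("#{SOLR}/select")
  uri.query = URI.encode_www_form(params)
  Net::HTTP.get_response(uri).body                         # GET: parameters only, no body
end

def solr_update(xml)
  uri = URI("#{SOLR}/update")
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.post(uri.path, xml, 'Content-Type' => 'text/xml') # POST: the XML payload is the body
  end
end

puts solr_select(:q => 'title:solr', :rows => 10)          # example read-only call over GET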


Erik



Re: Solr performance issues

2008-06-20 Thread Sébastien Rainville
On Fri, Jun 20, 2008 at 8:32 AM, Erik Hatcher [EMAIL PROTECTED]
wrote:


 On Jun 19, 2008, at 6:28 PM, Yonik Seeley wrote:

 2. I use acts_as_solr and by default it only makes POST requests, even
 for /select. With that setup the response time for most queries, simple or
 complex ones, ranged from 150ms to 600ms, with an average of 250ms. I
 changed the select request to use GET requests instead and now the
 response time is down to 10ms to 60ms. Has anyone seen that before? Why is
 it doing that?


 Are the get requests being cached by the ruby stuff?


 No, I'm sure that the results aren't being cached by Ruby's library,
 solr-ruby, or acts_as_solr.


I confirm that the results are not cached by Ruby's library.


But even with no caching, I've seen differences with get/post on Linux
 with the python client when persistent HTTP connections were in use.
 I tracked it down to the POST being written in two parts, triggering
 Nagle's algorithm in the networking stack.


 There was another post I found that mentioned this a couple of years ago:

 http://markmail.org/message/45qflvwnakhripqp

 I would welcome patches with tests that allow solr-ruby to send most
 requests with GET, and the ones that are actually sending a body beyond just
 parameters (delete, update, commit) as POST.

Erik


I made a few modifications, but it still needs more testing...

Sebastien


Solr performance issues

2008-06-19 Thread Sébastien Rainville
Hi,

I've been using Solr for a little while without worrying too much about how it
works, but now it's becoming a bottleneck in my application. I have a couple of
issues with it:

1. My index always gets slower and slower when committing/optimizing, for some
obscure reason. It goes from 1 second with a new index to 45 seconds with an
index holding the same amount of data but used for a few days. Restarting Solr
doesn't fix it. The only way I've found to fix it is to delete the whole
index by deleting the index folder. When I then rebuild the index,
everything goes back to normal and fast... and then performance slowly
deteriorates again. So the amount of data is not a factor, because
rebuilding the index from scratch fixes the problem, and I am sending
optimize once in a while... maybe even too often.

2. I use acts_as_solr and by default it only makes POST requests, even
for /select. With that setup the response time for most queries, simple or
complex ones, ranged from 150ms to 600ms, with an average of 250ms. I
changed the select request to use GET requests instead and now the
response time is down to 10ms to 60ms. Has anyone seen that before? Why is
it doing that?

Thanks in advance,
Sebastien