Re: Solr performance issue

2018-02-15 Thread Shawn Heisey
On 2/15/2018 2:00 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml, and I'm using it for full-import only. At
> the beginning of my implementation I had written a delta-import query to
> index the modified records, but my requirement grew and I now have 17 child
> entities for a single parent entity. When doing delta-import on a large data
> set, the number of requests made to the data source (database) grew and CPU
> utilization hit 100% when concurrent users started modifying the data. So
> instead of calling delta-import, which imports based on the last index time,
> I now run full-import (with 'SortedMapBackedCache') based on the last index
> time.
>
> Though the parent entity query returns only the records that were modified,
> the child entity queries pull all the data from the database, and the
> indexing happens in memory, which drives the JVM out of memory.

Can you provide your DIH config file (with passwords redacted) and the
precise URL you are using to initiate dataimport?  Also, I would like to
know what field you have defined as your uniqueKey.  I may have more
questions about the data in your system, depending on what I see.

That cache implementation should only cache entries from the database
that are actually requested.  If your query is correctly defined, it
should not pull all records from the DB table.

> Is there a way, in full-import mode, to make the child entity query pull
> only the records related to the parent entity?

If I am understanding your question correctly, this is one of the fairly
basic things that DIH does.  Look at this config example in the
reference guide:

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#configuring-the-dih-configuration-file

In the entity named feature in that example config, the query string
uses ${item.ID} to reference the ID column from the parent entity, which
is item.

I should warn you that a cached entity does not always improve
performance.  This is particularly true if the lookup into the cache is
the information that goes to your uniqueKey field.  When the lookup is
by uniqueKey, every single row requested from the database will be used
exactly once, so there's not really any point to caching it.
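
For illustration, a minimal data-config.xml sketch along those lines might
look like this (the table, column and Solr field names here are hypothetical,
not taken from your setup):

  <document>
    <entity name="parent" pk="ID"
            query="SELECT ID, NAME, MODIFY_TS FROM PARENT
                   WHERE MODIFY_TS > '${dataimporter.last_index_time}'">
      <!-- Cached child: one bulk SELECT, rows then looked up by PARENT_ID -->
      <entity name="child1" processor="SqlEntityProcessor"
              cacheImpl="SortedMapBackedCache"
              cacheKey="PARENT_ID" cacheLookup="parent.ID"
              query="SELECT PARENT_ID, CHILD1_FIELD FROM CHILD1"/>
      <!-- Uncached child: one SELECT per parent row, fetching only related rows -->
      <entity name="child2" processor="SqlEntityProcessor"
              query="SELECT CHILD2_FIELD FROM CHILD2
                     WHERE PARENT_ID = '${parent.ID}'"/>
    </entity>
  </document>

Note that the cached form has no per-parent WHERE clause, so whatever its
query returns is what ends up in the in-memory cache, while the ${parent.ID}
form only ever asks the database for rows belonging to the parents being
imported, at the cost of one query per parent row.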

Thanks,
Shawn



Re: Solr performance issue

2018-02-15 Thread Erick Erickson
Srinivas:

Not an answer to your question, but when DIH starts getting this
complicated, I start to seriously think about SolrJ, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

In particular, it moves the heavy lifting of acquiring the data off of a
Solr node (which I'm assuming also has to index docs) to "some
client". It also lets you play some tricks with the code to make
things faster.
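
To make that concrete, a minimal SolrJ sketch along the lines of that article
might look like this (the JDBC URL, credentials, SQL, collection name and
field names below are placeholders, not your actual setup):

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;
  import java.util.ArrayList;
  import java.util.List;

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class DbIndexer {
    public static void main(String[] args) throws Exception {
      // Placeholder connection details -- adjust to your environment.
      try (Connection db = DriverManager.getConnection(
               "jdbc:mysql://dbhost/mydb", "user", "secret");
           SolrClient solr = new HttpSolrClient.Builder(
               "http://localhost:8983/solr/mycollection").build();
           Statement stmt = db.createStatement();
           ResultSet rs = stmt.executeQuery(
               "SELECT ID, NAME, DESCRIPTION FROM PARENT")) {

        List<SolrInputDocument> batch = new ArrayList<>();
        while (rs.next()) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", rs.getString("ID"));
          doc.addField("name", rs.getString("NAME"));
          doc.addField("description", rs.getString("DESCRIPTION"));
          batch.add(doc);

          if (batch.size() == 1000) {   // send in batches, not one doc at a time
            solr.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          solr.add(batch);
        }
        solr.commit();
      }
    }
  }

The point is that the JDBC work (and any joining/denormalizing of child rows)
runs in the client JVM, so the Solr nodes only ever see already-built documents.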

Best,
Erick

On Thu, Feb 15, 2018 at 1:00 AM, Srinivas Kashyap wrote:
> Hi,
>
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml, and I'm using it for full-import only. At
> the beginning of my implementation I had written a delta-import query to
> index the modified records, but my requirement grew and I now have 17 child
> entities for a single parent entity. When doing delta-import on a large data
> set, the number of requests made to the data source (database) grew and CPU
> utilization hit 100% when concurrent users started modifying the data. So
> instead of calling delta-import, which imports based on the last index time,
> I now run full-import (with 'SortedMapBackedCache') based on the last index
> time.
>
> Though the parent entity query returns only the records that were modified,
> the child entity queries pull all the data from the database, and the
> indexing happens in memory, which drives the JVM out of memory.
>
> Is there a way, in full-import mode, to make the child entity query pull
> only the records related to the parent entity?
>
> Thanks and Regards,
> Srinivas Kashyap
>


Solr performance issue

2018-02-15 Thread Srinivas Kashyap
Hi,

I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
child entities in data-config.xml, and I'm using it for full-import only. At
the beginning of my implementation I had written a delta-import query to
index the modified records, but my requirement grew and I now have 17 child
entities for a single parent entity. When doing delta-import on a large data
set, the number of requests made to the data source (database) grew and CPU
utilization hit 100% when concurrent users started modifying the data. So
instead of calling delta-import, which imports based on the last index time,
I now run full-import (with 'SortedMapBackedCache') based on the last index
time.

Though the parent entity query returns only the records that were modified,
the child entity queries pull all the data from the database, and the
indexing happens in memory, which drives the JVM out of memory.

Is there a way, in full-import mode, to make the child entity query pull
only the records related to the parent entity?

Thanks and Regards,
Srinivas Kashyap

DISCLAIMER: 
E-mails and attachments from TradeStone Software, Inc. are confidential.
If you are not the intended recipient, please notify the sender immediately by
replying to the e-mail, and then delete it without making copies or using it
in any way. No representation is made that this email or any attachments are
free of viruses. Virus scanning is recommended and is the responsibility of
the recipient.

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-30 Thread sasarun
Hi Erick, 

As suggested, I did try a non-HDFS Solr Cloud instance and its response looks
to be really better. On the configuration side too, I am mostly using the
default configuration, with block.cache.direct.memory.allocation set to
false. On analysis of the HDFS cache, evictions seem to be on the higher side.

Thanks, 
Arun





Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Emir Arnautović
Hi Arun,
It is hard to measure something without affecting it, but we can use the
debug results and combine them with the QTime without debug: if we ignore
merging of results, it seems that the majority of the time is spent
retrieving docs (~500 ms). You should consider reducing the number of rows if
you want better response time (you can ask for rows=0 to see the best
possible time). Also, as Erick suggested, reducing the number of shards (to 1
if you do not plan on many more docs) will trim some of the overhead of
merging results.
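
For example (the host below is a placeholder and the parameters are shown
unencoded for readability), comparing these two requests shows how much of
the QTime goes to matching/ranking versus fetching the 600 stored documents:

  http://localhost:8983/solr/GooglePatent/select?q=<the phrase list>&defType=edismax&qf=host title url customContent contentSpecificSearch&rows=0
  http://localhost:8983/solr/GooglePatent/select?q=<the phrase list>&defType=edismax&qf=host title url customContent contentSpecificSearch&fl=id,contentOntologyTagsCount&rows=600

The rows=0 request still matches and ranks everything but skips retrieving
the stored fields, so its QTime is roughly the floor you can expect for this
query.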

Thanks,
Emir

I noticed that you removed bq - is time with bq acceptable as well?
> On 27 Sep 2017, at 12:34, sasarun  wrote:
> 
> Hi Emir, 
> 
> Please find the response without the bq parameter and with debugQuery set
> to true. Also, it was noted that QTime comes down drastically, to about
> 700-800, without the debug parameter.
> 
> 
> [XML query response with its markup stripped by the mail archive; the full
> dump appears in the original message elsewhere in this thread. Recoverable
> details: status=0, QTime=3446; defType=edismax, the 15 quoted phrases from
> "hybrid electric powerplant" through "electric powerplant" as q,
> qf = host/title/url/customContent/contentSpecificSearch,
> fl = id,contentOntologyTagsCount, rows=600, q.op=OR, an fq on a UUID value,
> debugQuery=true. In the shard-request tracking, the GET_TOP_IDS phases
> report QTimes of 29-181 ms (elapsed 159-466 ms) against ~41,000 matches per
> shard, while the GET_FIELDS,GET_DEBUG phases report roughly 1.5-3 s each.
> The debug timing block reports time=10302 ms, prepare=2 ms and
> process=10288 ms, of which query=661 ms and debug=9627 ms. The parsed query
> expands each phrase into a DisjunctionMaxQuery across the five qf fields;
> the dump is truncated here.]

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread sasarun
Hi Emir, 

Please find the response without the bq parameter and with debugQuery set to
true. Also, it was noted that QTime comes down drastically, to about 700-800,
without the debug parameter.


[XML query response with its markup stripped by the mail archive. Recoverable
details: status=0, QTime=3446; params: defType=edismax, q = ("hybrid electric
powerplant" "hybrid electric powerplants" "Electric" "Electrical"
"Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid Electric
Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid" "hybrid
electric" "electric powerplant"), qf = host, title, url, customContent,
contentSpecificSearch, fl = id,contentOntologyTagsCount, rows=600, q.op=OR,
an fq on a UUID value, debugQuery=true. In the shard-request tracking, the
GET_TOP_IDS phases report QTimes of 29-181 ms (elapsed 159-466 ms) against
~41,000 matches per shard, while the GET_FIELDS,GET_DEBUG phases report
roughly 1.5-3 s each. The debug timing block reports time=10302 ms,
prepare=2 ms and process=10288 ms, of which query=661 ms and debug=9627 ms.
The parsed query expands each phrase into a DisjunctionMaxQuery across the
five qf fields; the dump is truncated here.]

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread sasarun
Hi Erick, 

QTime comes down with rows set to 1. Also, it was noted that QTime comes down
when the debug parameter is not added to the query; it comes down to about 900.

Thanks, 
Arun 





Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Toke Eskildsen
On Tue, 2017-09-26 at 07:43 -0700, sasarun wrote:
> Allocated heap size for the young generation is about 8 GB and for the old
> generation about 24 GB, and GC analysis showed that peak size utilisation
> is really low compared to these values.

That does not come as a surprise. Your collections would normally be
considered small, if not tiny, looking only at their size measured in
bytes. Again, if you expect them to grow significantly (more than 10x),
your allocation might make sense. If you do not expect such a growth in
the near future, you will be better off with a much smaller heap: The
peak heap utilization that you have logged (or twice that to err on the
cautious side) seems a good starting point.

And whatever you do, don't set Xmx to 32GB. Use <31GB or significantly
more than 32GB:
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
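
As a purely illustrative example (not a sizing recommendation), the heap for
a standard install is set in solr.in.sh, e.g.:

  SOLR_HEAP="8g"
  # or, equivalently:
  # SOLR_JAVA_MEM="-Xms8g -Xmx8g"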


Are you indexing while you search? If so, you need to set up auto-warming or
configure a few explicit warmup queries. If you do not, your measurements
will not be representative, as they will be of first searches, which are
always slower than warmed searches.
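
A sketch of an explicit warmup query in solrconfig.xml (the query and field
names are placeholders) would look something like this:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">"hybrid electric"</str>
        <str name="defType">edismax</str>
        <str name="qf">title customContent</str>
        <str name="rows">10</str>
      </lst>
    </arr>
  </listener>

The same block under event="firstSearcher" covers the very first searcher
after startup.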


- Toke Eskildsen, Royal Danish Library



Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Emir Arnautović
Hi Arun,
This is not the simplest query either - a dozen phrase queries on several
fields plus the same query again as bq. Can you provide debugQuery info?
I did not look much into the debug times and what includes what, but one
thing that is strange to me is that QTime is ~4 s while the query time in the
debug output is 1.3 s. Can you try running without bq? Can you include the
boost factors in the main query instead?

Thanks,
Emir

> On 26 Sep 2017, at 16:43, sasarun  wrote:
> 
> Hi All,
> I have been using Solr for some time now, but mostly in standalone mode. My
> current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml has
> the following configuration. In the prod environment the query performance
> seems to be really slow. Can anyone help me with a few pointers on how to
> improve it?
> 
> 
> <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>   <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
>   <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
>   <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
>   <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}</bool>
>   <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
>   <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
>   <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
>   <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
>   <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
>   <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
> </directoryFactory>
> <lockType>hdfs</lockType>
> It has 6 collections of the following sizes:
> Collection 1 --> 6.41 MB
> Collection 2 --> 634.51 KB
> Collection 3 --> 4.59 MB
> Collection 4 --> 1,020.56 MB
> Collection 5 --> 607.26 MB
> Collection 6 --> 102.4 KB
> Each collection has 5 shards. The allocated heap size for the young
> generation is about 8 GB and for the old generation about 24 GB, and GC
> analysis showed that peak utilisation is really low compared to these
> values.
> But querying Collection 4 and Collection 5 gives a really slow response
> even though we are not using any complex queries. Output of debug queries
> run with debug=timing is given below for reference. Can anyone suggest a
> way to improve the performance?
> 
> Response to query
> 
> 
> [XML query response with its markup stripped by the mail archive; the full
> dump appears in the original message elsewhere in this thread. Recoverable
> details: status=0, QTime=3962; defType=edismax, debug=timing, the 15 quoted
> phrases as q, qf = host/title/url/customContent/contentSpecificSearch,
> fl = id,contentTagsCount, rows=600, and a bq boosting the same phrases at
> ^100.0, ^50.0 and ^15.0. The debug timing block reports time=15374 ms,
> prepare=2 ms and process=15363 ms, of which query=1313 ms and
> debug=14048 ms.]
> 
> Thanks,
> Arun
> 
> 
> 



Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread Erick Erickson
Well, 15 second responses are not what I'd expect either. But two
things (just looked again)

1> note that the time to assemble the debug information is a large
majority of your total time (14 of 15.3 seconds).
2> you're specifying 600 rows which is quite a lot as each one
requires that a 16K block of data be read from disk and decompressed
to assemble the "fl" list.

so one quick test would be to set rows=1 or something. All that said,
the QTime value returned does _not_ include <1> or <2> above and even
4 seconds seems excessive.

Best,
Erick

On Tue, Sep 26, 2017 at 10:54 AM, sasarun  wrote:
> Hi Erick,
>
> Thank you for the quick response. Query time was relatively faster once the
> data was read from memory, but personally I always felt the response time
> could be far better. As suggested, we will try to set up a non-HDFS
> environment and report back on the results.
>
> Thanks,
> Arun
>
>
>


Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread sasarun
Hi Erick, 

Thank you for the quick response. Query time was relatively faster once the
data was read from memory, but personally I always felt the response time
could be far better. As suggested, we will try to set up a non-HDFS
environment and report back on the results.

Thanks, 
Arun





Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread Erick Erickson
Does the query time _stay_ low? Once the data is read from HDFS it
should pretty much stay in memory. So my question is whether, once
Solr warms up you see this kind of query response time.

Have you tried this on a non HDFS system? That would be useful to help
figure out where to look.

And given the sizes of your collections, unless you expect them to get
much larger, there's no reason to shard any of them. Sharding should
only really be used when the collections are too big for a single
shard as distributed searches inevitably have increased overhead. I
expect _at least_ 20M documents/shard, and have seen 200M docs/shard.
YMMV of course.

Best,
Erick

On Tue, Sep 26, 2017 at 7:43 AM, sasarun  wrote:
> Hi All,
> I have been using Solr for some time now, but mostly in standalone mode. My
> current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml has
> the following configuration. In the prod environment the query performance
> seems to be really slow. Can anyone help me with a few pointers on how to
> improve it?
>
> 
> <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>   <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
>   <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
>   <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
>   <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}</bool>
>   <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
>   <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
>   <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
>   <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
>   <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
>   <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
> </directoryFactory>
> <lockType>hdfs</lockType>
> It has 6 collections of the following sizes:
> Collection 1 --> 6.41 MB
> Collection 2 --> 634.51 KB
> Collection 3 --> 4.59 MB
> Collection 4 --> 1,020.56 MB
> Collection 5 --> 607.26 MB
> Collection 6 --> 102.4 KB
> Each collection has 5 shards. The allocated heap size for the young
> generation is about 8 GB and for the old generation about 24 GB, and GC
> analysis showed that peak utilisation is really low compared to these
> values.
> But querying Collection 4 and Collection 5 gives a really slow response
> even though we are not using any complex queries. Output of debug queries
> run with debug=timing is given below for reference. Can anyone suggest a
> way to improve the performance?
>
> Response to query
> 
> 
> [XML query response with its markup stripped by the mail archive; the full
> dump appears in the original message elsewhere in this thread. Recoverable
> details: status=0, QTime=3962; defType=edismax, debug=timing, the 15 quoted
> phrases as q, qf = host/title/url/customContent/contentSpecificSearch,
> fl = id,contentTagsCount, rows=600, and a bq boosting the same phrases at
> ^100.0, ^50.0 and ^15.0. The debug timing block reports time=15374 ms,
> prepare=2 ms and process=15363 ms, of which query=1313 ms and
> debug=14048 ms.]
>
>
> Thanks,
> Arun
>
>
>


Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread sasarun
Hi All,
I have been using Solr for some time now, but mostly in standalone mode. My
current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml has
the following configuration. In the prod environment the query performance
seems to be really slow. Can anyone help me with a few pointers on how to
improve it?


<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
  <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
  <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
  <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
</directoryFactory>
<lockType>hdfs</lockType>
It has 6 collections of the following sizes:
Collection 1 --> 6.41 MB
Collection 2 --> 634.51 KB
Collection 3 --> 4.59 MB
Collection 4 --> 1,020.56 MB
Collection 5 --> 607.26 MB
Collection 6 --> 102.4 KB
Each collection has 5 shards. The allocated heap size for the young
generation is about 8 GB and for the old generation about 24 GB, and GC
analysis showed that peak utilisation is really low compared to these values.
But querying Collection 4 and Collection 5 gives a really slow response even
though we are not using any complex queries. Output of debug queries run with
debug=timing is given below for reference. Can anyone suggest a way to
improve the performance?

Response to query


[XML query response with its markup stripped by the mail archive. Recoverable
details: status=0, QTime=3962; params: defType=edismax, debug=timing,
q = ("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant"), qf = host, title, url,
customContent, contentSpecificSearch, fl = id,contentTagsCount, rows=600,
q.op=OR, an fq on a UUID value, and a bq boosting the same phrases at ^100.0,
^50.0 and ^15.0. The debug timing block reports time=15374 ms, prepare=2 ms
and process=15363 ms, of which query=1313 ms and debug=14048 ms.]


Thanks,
Arun





RE: Solr performance issue on indexing

2017-04-04 Thread Allison, Timothy B.
>  Also we will try to decouple tika to solr.
+1


-Original Message-
From: tstusr [mailto:ulfrhe...@gmail.com] 
Sent: Friday, March 31, 2017 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr performance issue on indexing

Hi, thanks for the feedback.

Yes, it is about OOM; indeed, the Solr instance even becomes unavailable. As
I was saying, I can't find more relevant information in the logs.

We are able to increase the JVM memory, so that is the first thing we'll do.

As far as I know, all documents are bounded to that amount (14K); just the
processing could change. We are making some tests on indexing and it seems to
work without concurrent threads. We will also try to decouple Tika from Solr.

By the way, would making it available with SolrCloud improve performance? Or
would there be no perceptible improvement?






Re: Solr performance issue on indexing

2017-03-31 Thread Erick Erickson
If, by chance, the docs you're sending get routed to different Solr
nodes then all the processing is in parallel. I don't know if there's
a good way to insure that the docs get sent to different replicas on
different Solr instances. You could try addressing specific Solr
replicas, something like "blah
blah/solr/collection1_shard1_replica1/export" but I'm not totally sure
that'll do what you want either.

 But that still doesn't decouple Tika from the Solr instances running
those replicas. So if Tika has a problem it has the potential to bring
the Solr node down.

Best,
Erick

On Fri, Mar 31, 2017 at 1:31 PM, tstusr <ulfrhe...@gmail.com> wrote:
> Hi, thanks for the feedback.
>
> Yes, it is about OOM; indeed, the Solr instance even becomes unavailable.
> As I was saying, I can't find more relevant information in the logs.
>
> We are able to increase the JVM memory, so that is the first thing we'll do.
>
> As far as I know, all documents are bounded to that amount (14K); just the
> processing could change. We are making some tests on indexing and it seems
> to work without concurrent threads. We will also try to decouple Tika from
> Solr.
>
> By the way, would making it available with SolrCloud improve performance?
> Or would there be no perceptible improvement?
>
>
>
>


Re: Solr performance issue on indexing

2017-03-31 Thread tstusr
Hi, thanks for the feedback.

Yes, it is about OOM; indeed, the Solr instance even becomes unavailable. As
I was saying, I can't find more relevant information in the logs.

We are able to increase the JVM memory, so that is the first thing we'll do.

As far as I know, all documents are bounded to that amount (14K); just the
processing could change. We are making some tests on indexing and it seems to
work without concurrent threads. We will also try to decouple Tika from Solr.

By the way, would making it available with SolrCloud improve performance? Or
would there be no perceptible improvement?






Re: Solr performance issue on indexing

2017-03-31 Thread Erick Erickson
First, running multiple threads with PDF files against a Solr running 4G of
JVM heap is...ambitious. You say it crashes; how? OOMs?

Second, while the extracting request handler is a fine way to get up
and running, any problems with Tika will affect Solr. Tika does a
great job of extraction, but there are so many variants of so many
file formats that this scenario isn't recommended for production.
Consider extracting the PDF on a client and sending the docs to Solr.
Tika can also run as a server, so you aren't coupling Solr and Tika.

For a sample SolrJ program, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/
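
As a rough sketch of that client-side approach (the Solr URL, collection and
field names are placeholders and error handling is omitted), extraction with
Tika plus indexing with SolrJ could look like this:

  import java.io.InputStream;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.tika.metadata.Metadata;
  import org.apache.tika.parser.AutoDetectParser;
  import org.apache.tika.sax.BodyContentHandler;

  public class PdfIndexer {
    public static void main(String[] args) throws Exception {
      // Placeholder Solr URL and collection -- adjust to your environment.
      try (SolrClient solr = new HttpSolrClient.Builder(
               "http://localhost:8983/solr/mycollection").build()) {
        AutoDetectParser parser = new AutoDetectParser();

        for (String arg : args) {                 // each argument is a PDF path
          Path pdf = Paths.get(arg);
          BodyContentHandler text = new BodyContentHandler(-1);  // -1 = no write limit
          Metadata metadata = new Metadata();
          try (InputStream in = Files.newInputStream(pdf)) {
            parser.parse(in, text, metadata);     // Tika runs here, not inside Solr
          }

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", pdf.getFileName().toString());
          String title = metadata.get("title");
          if (title != null) {
            doc.addField("title", title);
          }
          doc.addField("content", text.toString());
          solr.add(doc);
        }
        solr.commit();
      }
    }
  }

If a parse blows up here, it takes down this client process, not the Solr node.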

Best,
Erick

On Fri, Mar 31, 2017 at 10:44 AM, tstusr <ulfrhe...@gmail.com> wrote:
> Hi there.
>
> We are currently indexing some PDF files; the main handler used for
> indexing is /extract, where we perform simple processing (extract relevant
> fields and store them in some fields).
>
> The PDF files are about 10M~100M in size and we need the extracted text to
> be available. Everything works correctly in the test stages, but when we
> try to index all 14K files (around 120 GB) from a client application that
> only sends HTTP requests (curl) through 3-4 concurrent threads to the
> /extract handler, it crashes. I can't find any relevant information in the
> Solr logs (we checked in server/logs and in core_dir/tlog).
>
> My question is about performance. I think this is a small amount of data to
> be processing; the deployment scenario is a docker container with 4 GB of
> JVM memory and ~50 GB of physical memory (reported through the dashboard),
> and we are using a single instance.
>
> I don't think it is normal behaviour for the handler to crash. So, what are
> some general tips for improving performance in this scenario?
>
>
>


Solr performance issue on indexing

2017-03-31 Thread tstusr
Hi there.

We are currently indexing some PDF files; the main handler used for indexing
is /extract, where we perform simple processing (extract relevant fields and
store them in some fields).

The PDF files are about 10M~100M in size and we need the extracted text to be
available. Everything works correctly in the test stages, but when we try to
index all 14K files (around 120 GB) from a client application that only sends
HTTP requests (curl) through 3-4 concurrent threads to the /extract handler,
it crashes. I can't find any relevant information in the Solr logs (we
checked in server/logs and in core_dir/tlog).

My question is about performance. I think this is a small amount of data to
be processing; the deployment scenario is a docker container with 4 GB of JVM
memory and ~50 GB of physical memory (reported through the dashboard), and we
are using a single instance.

I don't think it is normal behaviour for the handler to crash. So, what are
some general tips for improving performance in this scenario?





Re: solr performance issue

2016-02-09 Thread Zheng Lin Edwin Yeo
1 million documents isn't considered big for Solr. How much RAM does your
machine have?

Regards,
Edwin

On 8 February 2016 at 23:45, Susheel Kumar  wrote:

> 1 million document shouldn't have any issues at all.  Something else is
> wrong with your hw/system configuration.
>
> Thanks,
> Susheel
>
> On Mon, Feb 8, 2016 at 6:45 AM, sara hajili  wrote:
>
> > On Mon, Feb 8, 2016 at 3:04 AM, sara hajili 
> wrote:
> >
> > > sorry i made a mistake i have a bout 1000 K doc.
> > > i mean about 100 doc.
> > >
> > > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> > > emir.arnauto...@sematext.com> wrote:
> > >
> > >> Hi Sara,
> > >> Not sure if I am reading this right, but I read it as you have 1000
> doc
> > >> index and issues? Can you tell us bit more about your setup: number of
> > >> servers, hw, index size, number of shards, queries that you run, do
> you
> > >> index at the same time...
> > >>
> > >> It seems to me that you are running Solr on server with limited RAM
> and
> > >> probably small heap. Swapping for sure will slow things down and GC is
> > most
> > >> likely reason for high CPU.
> > >>
> > >> You can use http://sematext.com/spm to collect Solr and host metrics
> > and
> > >> see where the issue is.
> > >>
> > >> Thanks,
> > >> Emir
> > >>
> > >> --
> > >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > >> Solr & Elasticsearch Support * http://sematext.com/
> > >>
> > >>
> > >>
> > >> On 08.02.2016 10:27, sara hajili wrote:
> > >>
> > >>> hi all.
> > >>> i have a problem with my solr performance and usage hardware like a
> > >>> ram,cup...
> > >>> i have a lot of document and so indexed file about 1000 doc in solr
> > that
> > >>> every doc has about 8 field in average.
> > >>> and each field has about 60 char.
> > >>> i set my field as a storedfield = "false" except of  1 field. // i
> read
> > >>> that this help performance.
> > >>> i used copy field and dynamic field if it was necessary . // i read
> > that
> > >>> this help performance.
> > >>> and now my question is that when i run a lot of query on solr i faced
> > >>> with
> > >>> a problem solr use more cpu and ram and after that filled ,it use a
> lot
> > >>>   swapped storage and then use hard,but doesn't create a system file!
> > >>> solr
> > >>> fill hard until i forced to restart server to release hard disk.
> > >>> and now my question is why solr treat in this way? and how i can
> avoid
> > >>> solr
> > >>> to use huge cpu space?
> > >>> any config need?!
> > >>>
> > >>>
> > >>
> > >
> >
>


Re: solr performance issue

2016-02-08 Thread Susheel Kumar
1 million documents shouldn't cause any issues at all.  Something else is
wrong with your hw/system configuration.

Thanks,
Susheel

On Mon, Feb 8, 2016 at 6:45 AM, sara hajili  wrote:

> On Mon, Feb 8, 2016 at 3:04 AM, sara hajili  wrote:
>
> > sorry i made a mistake i have a bout 1000 K doc.
> > i mean about 100 doc.
> >
> > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Sara,
> >> Not sure if I am reading this right, but I read it as you have 1000 doc
> >> index and issues? Can you tell us bit more about your setup: number of
> >> servers, hw, index size, number of shards, queries that you run, do you
> >> index at the same time...
> >>
> >> It seems to me that you are running Solr on server with limited RAM and
> >> probably small heap. Swapping for sure will slow things down and GC is
> most
> >> likely reason for high CPU.
> >>
> >> You can use http://sematext.com/spm to collect Solr and host metrics
> and
> >> see where the issue is.
> >>
> >> Thanks,
> >> Emir
> >>
> >> --
> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >> Solr & Elasticsearch Support * http://sematext.com/
> >>
> >>
> >>
> >> On 08.02.2016 10:27, sara hajili wrote:
> >>
> >>> hi all.
> >>> i have a problem with my solr performance and usage hardware like a
> >>> ram,cup...
> >>> i have a lot of document and so indexed file about 1000 doc in solr
> that
> >>> every doc has about 8 field in average.
> >>> and each field has about 60 char.
> >>> i set my field as a storedfield = "false" except of  1 field. // i read
> >>> that this help performance.
> >>> i used copy field and dynamic field if it was necessary . // i read
> that
> >>> this help performance.
> >>> and now my question is that when i run a lot of query on solr i faced
> >>> with
> >>> a problem solr use more cpu and ram and after that filled ,it use a lot
> >>>   swapped storage and then use hard,but doesn't create a system file!
> >>> solr
> >>> fill hard until i forced to restart server to release hard disk.
> >>> and now my question is why solr treat in this way? and how i can avoid
> >>> solr
> >>> to use huge cpu space?
> >>> any config need?!
> >>>
> >>>
> >>
> >
>


solr performance issue

2016-02-08 Thread sara hajili
Hi all,
I have a problem with my Solr performance and hardware usage (RAM, CPU, ...).
I have a lot of documents indexed, about 1000 docs in Solr, and every doc has
about 8 fields on average, each field about 60 characters.
I set my fields to stored="false" except for 1 field. // I read that this
helps performance.
I used copyField and dynamic fields only where necessary. // I read that this
helps performance.
My question is that when I run a lot of queries against Solr, it uses more
and more CPU and RAM, and once that is filled it uses a lot of swap space and
then the hard disk, but it doesn't create a system file! Solr fills the hard
disk until I am forced to restart the server to release it.
Why does Solr behave this way, and how can I keep it from using so much CPU
and disk space? Is any configuration needed?


Re: solr performance issue

2016-02-08 Thread Emir Arnautovic

Hi Sara,
It is still considered to be a small index. Can you give us a bit more detail
about your setup?


Thanks,
Emir

On 08.02.2016 12:04, sara hajili wrote:

sorry i made a mistake i have a bout 1000 K doc.
i mean about 100 doc.

On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Sara,
Not sure if I am reading this right, but I read it as you have 1000 doc
index and issues? Can you tell us bit more about your setup: number of
servers, hw, index size, number of shards, queries that you run, do you
index at the same time...

It seems to me that you are running Solr on server with limited RAM and
probably small heap. Swapping for sure will slow things down and GC is most
likely reason for high CPU.

You can use http://sematext.com/spm to collect Solr and host metrics and
see where the issue is.

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 08.02.2016 10:27, sara hajili wrote:


hi all.
i have a problem with my solr performance and usage hardware like a
ram,cup...
i have a lot of document and so indexed file about 1000 doc in solr that
every doc has about 8 field in average.
and each field has about 60 char.
i set my field as a storedfield = "false" except of  1 field. // i read
that this help performance.
i used copy field and dynamic field if it was necessary . // i read that
this help performance.
and now my question is that when i run a lot of query on solr i faced with
a problem solr use more cpu and ram and after that filled ,it use a lot
   swapped storage and then use hard,but doesn't create a system file! solr
fill hard until i forced to restart server to release hard disk.
and now my question is why solr treat in this way? and how i can avoid
solr
to use huge cpu space?
any config need?!




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: solr performance issue

2016-02-08 Thread Emir Arnautovic

Hi Sara,
Not sure if I am reading this right, but I read it as you having a 1000-doc
index and issues? Can you tell us a bit more about your setup: number of
servers, hw, index size, number of shards, queries that you run, do you
index at the same time...

It seems to me that you are running Solr on a server with limited RAM and
probably a small heap. Swapping will for sure slow things down, and GC is
most likely the reason for the high CPU.


You can use http://sematext.com/spm to collect Solr and host metrics and 
see where the issue is.


Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 08.02.2016 10:27, sara hajili wrote:

hi all.
i have a problem with my solr performance and usage hardware like a
ram,cup...
i have a lot of document and so indexed file about 1000 doc in solr that
every doc has about 8 field in average.
and each field has about 60 char.
i set my field as a storedfield = "false" except of  1 field. // i read
that this help performance.
i used copy field and dynamic field if it was necessary . // i read that
this help performance.
and now my question is that when i run a lot of query on solr i faced with
a problem solr use more cpu and ram and after that filled ,it use a lot
  swapped storage and then use hard,but doesn't create a system file! solr
fill hard until i forced to restart server to release hard disk.
and now my question is why solr treat in this way? and how i can avoid solr
to use huge cpu space?
any config need?!





Re: solr performance issue

2016-02-08 Thread sara hajili
Sorry, I made a mistake, I have about 1000 K docs.
I mean about 100 doc.

On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Sara,
> Not sure if I am reading this right, but I read it as you have 1000 doc
> index and issues? Can you tell us bit more about your setup: number of
> servers, hw, index size, number of shards, queries that you run, do you
> index at the same time...
>
> It seems to me that you are running Solr on server with limited RAM and
> probably small heap. Swapping for sure will slow things down and GC is most
> likely reason for high CPU.
>
> You can use http://sematext.com/spm to collect Solr and host metrics and
> see where the issue is.
>
> Thanks,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 08.02.2016 10:27, sara hajili wrote:
>
>> hi all.
>> i have a problem with my solr performance and usage hardware like a
>> ram,cup...
>> i have a lot of document and so indexed file about 1000 doc in solr that
>> every doc has about 8 field in average.
>> and each field has about 60 char.
>> i set my field as a storedfield = "false" except of  1 field. // i read
>> that this help performance.
>> i used copy field and dynamic field if it was necessary . // i read that
>> this help performance.
>> and now my question is that when i run a lot of query on solr i faced with
>> a problem solr use more cpu and ram and after that filled ,it use a lot
>>   swapped storage and then use hard,but doesn't create a system file! solr
>> fill hard until i forced to restart server to release hard disk.
>> and now my question is why solr treat in this way? and how i can avoid
>> solr
>> to use huge cpu space?
>> any config need?!
>>
>>
>


Re: solr performance issue

2016-02-08 Thread sara hajili
On Mon, Feb 8, 2016 at 3:04 AM, sara hajili  wrote:

> sorry i made a mistake i have a bout 1000 K doc.
> i mean about 100 doc.
>
> On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Sara,
>> Not sure if I am reading this right, but I read it as you have 1000 doc
>> index and issues? Can you tell us bit more about your setup: number of
>> servers, hw, index size, number of shards, queries that you run, do you
>> index at the same time...
>>
>> It seems to me that you are running Solr on server with limited RAM and
>> probably small heap. Swapping for sure will slow things down and GC is most
>> likely reason for high CPU.
>>
>> You can use http://sematext.com/spm to collect Solr and host metrics and
>> see where the issue is.
>>
>> Thanks,
>> Emir
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>>
>> On 08.02.2016 10:27, sara hajili wrote:
>>
>>> hi all.
>>> i have a problem with my solr performance and usage hardware like a
>>> ram,cup...
>>> i have a lot of document and so indexed file about 1000 doc in solr that
>>> every doc has about 8 field in average.
>>> and each field has about 60 char.
>>> i set my field as a storedfield = "false" except of  1 field. // i read
>>> that this help performance.
>>> i used copy field and dynamic field if it was necessary . // i read that
>>> this help performance.
>>> and now my question is that when i run a lot of query on solr i faced
>>> with
>>> a problem solr use more cpu and ram and after that filled ,it use a lot
>>>   swapped storage and then use hard,but doesn't create a system file!
>>> solr
>>> fill hard until i forced to restart server to release hard disk.
>>> and now my question is why solr treat in this way? and how i can avoid
>>> solr
>>> to use huge cpu space?
>>> any config need?!
>>>
>>>
>>
>


Re: Solr Performance Issue

2013-12-05 Thread Furkan KAMACI
Hi;

Erick and Shawn have explained that we need more information about your
infrastructure. I should add that I had nearly as much test data in my
SolrCloud as you have, and I did not have any problems except when indexing
at a very high rate, which could be solved with tuning. You should optimize
your parameters according to your system, so you should give us more
information about it.

Thanks;
Furkan KAMACI

On Wednesday, 4 December 2013, Shawn Heisey s...@elyograg.org wrote:
 On 12/4/2013 6:31 AM, kumar wrote:
 I am having almost 5 to 6 crores of indexed documents in solr. And when
i am
 going to change anything in the configuration file solr server is going
 down.

 If you mean crore and not core, then you are talking about 50 to 60
 million documents.  That's a lot.  Solr is perfectly capable of handling
 that many documents, but you do need to have very good hardware.

 Even if they are small, your index is likely to be many gigabytes in
 size.  If the documents are large, that might be measured in terabytes.
  Large indexes require a lot of memory for good performance.  This will
 be discussed in more detail below.

 As a new user to solr i can't able to find the exact reason for going
server
 down.

 I am using cache's in the following way :

 <filterCache class="solr.FastLRUCache"
              size="16384"
              initialSize="4096"
              autowarmCount="4096"/>
 <queryResultCache class="solr.FastLRUCache"
                   size="16384"
                   initialSize="4096"
                   autowarmCount="1024"/>

 and i am not using any documentCache, fieldValueCahe's

 As Erick said, these cache sizes are HUGE.  In particular, your
 autowarmCount values are extremely high.

 Whether this can lead any performance issue means going server down.

 Another thing that Erick pointed out is that you haven't really told us
 what's happening.  When you say that the server goes down, what EXACTLY
 do you mean?

 And i am seeing logging in the server it is showing exception in the
 following way


 Servlet.service() for servlet [default] in context with path [/solr]
threw
 exception [java.lang.IllegalStateException: Cannot call sendError() after
 the response has been committed] with root cause

 This message comes from your servlet container, not Solr.  You're
 probably using Tomcat, not the included Jetty.  There is some indirect
 evidence that this can be fixed by increasing the servlet container's
 setting for the maximum number of request parameters.

 http://forums.adobe.com/message/4590864
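
 For Tomcat, that setting is the maxParameterCount attribute on the HTTP
 Connector in server.xml; a hypothetical sketch (the values shown are
 illustrative only):

   <Connector port="8080" protocol="HTTP/1.1"
              connectionTimeout="20000"
              maxParameterCount="10000" />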

 Here's what I can say without further information:

 You're likely having performance issues.  One potential problem is your
 insanely high autowarmCount values.  Your cache configuration tells Solr
 that every time you have a soft commit or a hard commit with
 openSearcher=true, you're going to execute up to 1024 queries and up to
 4096 filters from the old caches, in order to warm the new caches.  Even
 if you have an optimal setup, this takes a lot of time.  I suspect that
 you don't have an optimal setup.
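
 As an illustration only (not a tuned recommendation for this index), a far
 more conservative starting point would look like:

   <filterCache class="solr.FastLRUCache"
                size="512" initialSize="512" autowarmCount="32"/>
   <queryResultCache class="solr.LRUCache"
                     size="512" initialSize="512" autowarmCount="16"/>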

 Another potential problem is that you don't have enough memory for the
 size of your index.  A number of potential performance problems are
 discussed on this wiki page:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 A lot more details are required.  Here's some things that will be
 helpful, and more is always better:

 * Exact symptoms.
 * Excerpts from the Solr logfile that include entire stacktraces.
 * Operating system and version.
 * Total server index size on disk.
 * Total machine memory.
 * Java heap size for your servlet container.
 * Which servlet container you are using to run Solr.
 * Solr version.
 * Server hardware details.

 Thanks,
 Shawn




Re: Solr Performance Issue

2013-12-05 Thread Hien Luu
Hi Furkan,

Just curious, what index rate were you able to achieve?
 
Regards, 

Hien



On Thursday, December 5, 2013 3:06 PM, Furkan KAMACI furkankam...@gmail.com 
wrote:
 
Hi;

Erick and Shawn have explained that we need more information about your
infrastructure. I should add that I had nearly as much test data in my
SolrCloud as you have, and I did not have any problems except when indexing
at a very high rate, which could be solved with tuning. You should optimize
your parameters according to your system, so you should give us more
information about it.

Thanks;
Furkan KAMACI

On Wednesday, 4 December 2013, Shawn Heisey s...@elyograg.org wrote:

 On 12/4/2013 6:31 AM, kumar wrote:
 I am having almost 5 to 6 crores of indexed documents in solr. And when
i am
 going to change anything in the configuration file solr server is going
 down.

 If you mean crore and not core, then you are talking about 50 to 60
 million documents.  That's a lot.  Solr is perfectly capable of handling
 that many documents, but you do need to have very good hardware.

 Even if they are small, your index is likely to be many gigabytes in
 size.  If the documents are large, that might be measured in terabytes.
  Large indexes require a lot of memory for good performance.  This will
 be discussed in more detail below.

 As a new user to solr i can't able to find the exact reason for going
server
 down.

 I am using cache's in the following way :

 <filterCache class="solr.FastLRUCache"
              size="16384"
              initialSize="4096"
              autowarmCount="4096"/>
 <queryResultCache class="solr.FastLRUCache"
                   size="16384"
                   initialSize="4096"
                   autowarmCount="1024"/>

 and i am not using any documentCache, fieldValueCahe's

 As Erick said, these cache sizes are HUGE.  In particular, your
 autowarmCount values are extremely high.

 Whether this can lead any performance issue means going server down.

 Another thing that Erick pointed out is that you haven't really told us
 what's happening.  When you say that the server goes down, what EXACTLY
 do you mean?

 And i am seeing logging in the server it is showing exception in the
 following way


 Servlet.service() for servlet [default] in context with path [/solr]
threw
 exception [java.lang.IllegalStateException: Cannot call sendError() after
 the response has been committed] with root cause

 This message comes from your servlet container, not Solr.  You're
 probably using Tomcat, not the included Jetty.  There is some indirect
 evidence that this can be fixed by increasing the servlet container's
 setting for the maximum number of request parameters.

 http://forums.adobe.com/message/4590864

 Here's what I can say without further information:

 You're likely having performance issues.  One potential problem is your
 insanely high autowarmCount values.  Your cache configuration tells Solr
 that every time you have a soft commit or a hard commit with
 openSearcher=true, you're going to execute up to 1024 queries and up to
 4096 filters from the old caches, in order to warm the new caches.  Even
 if you have an optimal setup, this takes a lot of time.  I suspect that
 you don't have an optimal setup.

 Another potential problem is that you don't have enough memory for the
 size of your index.  A number of potential performance problems are
 discussed on this wiki page:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 A lot more details are required.  Here's some things that will be
 helpful, and more is always better:

 * Exact symptoms.
 * Excerpts from the Solr logfile that include entire stacktraces.
 * Operating system and version.
 * Total server index size on disk.
 * Total machine memory.
 * Java heap size for your servlet container.
 * Which servlet container you are using to run Solr.
 * Solr version.
 * Server hardware details.

 Thanks,
 Shawn



Re: Solr Performance Issue

2013-12-05 Thread Shawn Heisey

On 12/5/2013 4:08 PM, Hien Luu wrote:

Just curious what was the index rate that you were able to achieve?


What I've usually seen, based on my experience and what people have said 
here and on IRC, is that the data source is the bottleneck - Solr 
typically indexes VERY fast, as long as you have sized your hardware and 
configuration appropriately.


I import from MySQL.  By running dataimport handlers on all my shards at 
once and using two servers for the entire index, I can do a full 
re-index of 87 million documents on my production hardware in under 5 
hours.  On my single dev server, it takes about 8.5 hours.  I'm not 
using SolrCloud. I'm very confident that MySQL is the bottleneck here, 
not Solr.
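
(For illustration, with invented host and core names, kicking that off just
means hitting the dataimport handler on every shard core at roughly the same
time:

http://shard1host:8983/solr/shard1_core/dataimport?command=full-import
http://shard2host:8983/solr/shard2_core/dataimport?command=full-import

The exact handler path depends on how it is registered in solrconfig.xml.)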


Thanks,
Shawn



Re: Solr Performance Issue

2013-12-05 Thread Furkan KAMACI
Hi Hien;

Actually, a high index rate is a relative concept. I could index that kind of
data within a few hours. I aim to index much more data within the same time
soon, and I can share my test results when I do.

Thanks;
Furkan KAMACI

6 Aralık 2013 Cuma tarihinde Hien Luu h...@yahoo.com adlı kullanıcı şöyle
yazdı:
 Hi Furkan,

 Just curious what was the index rate that you were able to achieve?

 Regards,

 Hien



 On Thursday, December 5, 2013 3:06 PM, Furkan KAMACI 
furkankam...@gmail.com wrote:

 Hi;

 Erick and Shawn have explained that we need more information about your
 infrastructure. I should add that I had nearly as much test data in my
 SolrCloud as you do and did not have any problems, except when indexing at a
 very high rate, and that could be solved with tuning. You should optimize
 your parameters according to your system, so please give us more
 information about your system.

 Thanks;
 Furkan KAMACI

 4 Aralık 2013 Çarşamba tarihinde Shawn Heisey s...@elyograg.org adlı
 kullanıcı şöyle yazdı:

 On 12/4/2013 6:31 AM, kumar wrote:
 I am having almost 5 to 6 crores of indexed documents in solr. And when
 i am
 going to change anything in the configuration file solr server is going
 down.

 If you mean crore and not core, then you are talking about 50 to 60
 million documents.  That's a lot.  Solr is perfectly capable of handling
 that many documents, but you do need to have very good hardware.

 Even if they are small, your index is likely to be many gigabytes in
 size.  If the documents are large, that might be measured in terabytes.
  Large indexes require a lot of memory for good performance.  This will
 be discussed in more detail below.

 As a new user to solr i can't able to find the exact reason for going
 server
 down.

 I am using cache's in the following way :

 <filterCache class="solr.FastLRUCache"
              size="16384"
              initialSize="4096"
              autowarmCount="4096"/>
 <queryResultCache class="solr.FastLRUCache"
                   size="16384"
                   initialSize="4096"
                   autowarmCount="1024"/>

 and i am not using any documentCache, fieldValueCahe's

 As Erick said, these cache sizes are HUGE.  In particular, your
 autowarmCount values are extremely high.

 Whether this can lead any performance issue means going server down.

 Another thing that Erick pointed out is that you haven't really told us
 what's happening.  When you say that the server goes down, what EXACTLY
 do you mean?

 And i am seeing logging in the server it is showing exception in the
 following way


 Servlet.service() for servlet [default] in context with path [/solr]
 threw
 exception [java.lang.IllegalStateException: Cannot call sendError()
after
 the response has been committed] with root cause

 This message comes from your servlet container, not Solr.  You're
 probably using Tomcat, not the included Jetty.  There is some indirect
 evidence that this can be fixed by increasing the servlet container's
 setting for the maximum number of request parameters.

 http://forums.adobe.com/message/4590864

 Here's what I can say without further information:

 You're likely having performance issues.  One potential problem is your
 insanely high autowarmCount values.  Your cache configuration tells Solr
 that every time you have a soft commit or a hard commit with
 openSearcher=true, you're going to execute up to 1024 queries and up to
 4096 filters from the old caches, in order to warm the new caches.  Even
 if you have an optimal setup, this takes a lot of time.  I suspect that
 you don't have an optimal setup.

 Another potential problem is that you don't have enough memory for the
 size of your index.  A number of potential performance problems are
 discussed on this wiki page:




Re: Solr Performance Issue

2013-12-05 Thread Hien Luu
Thanks Furkan. Looking forward to seeing your test results.

Sent from Yahoo Mail on Android



Solr Performance Issue

2013-12-04 Thread kumar
I have almost 5 to 6 crores of indexed documents in Solr, and whenever I
change anything in the configuration file, the Solr server goes down.

As a new user to Solr, I am not able to find the exact reason for the server
going down.

I am using caches in the following way:

<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="4096"/>
<queryResultCache class="solr.FastLRUCache"
                  size="16384"
                  initialSize="4096"
                  autowarmCount="1024"/>

and I am not using any documentCache or fieldValueCache.

Could this kind of cache configuration lead to a performance issue that makes the server go down?

And in the server log I am seeing an exception of the following kind:


Servlet.service() for servlet [default] in context with path [/solr] threw
exception [java.lang.IllegalStateException: Cannot call sendError() after
the response has been committed] with root cause



Can anybody help me figure out how to solve this problem?

Kumar.









--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Performance-Issue-tp4104907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Performance Issue

2013-12-04 Thread Erick Erickson
You need to give us more of the exception trace,
the real cause is often buried down the stack with
some text like
Caused by...

But at a glance your cache sizes and autowarm counts
are far higher than they should be. Try reducing
particularly the autowarm count down to, say, 16 or so.
It's actually rare that you really need very many.

I'd actually go back to the defaults to start with to test
whether this is the problem.
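
As a rough sketch, a much more modest configuration would look something like
this (the size values are only illustrative starting points, not tuned for
your data):

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>
<queryResultCache class="solr.FastLRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="16"/>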

Further, we need to know exactly what you mean by
"change anything in the configuration file." Change
what? Details matter.

Of course the last thing you changed before you started
seeing this problem is the most likely culprit.

Best,
Erick


On Wed, Dec 4, 2013 at 8:31 AM, kumar pavan2...@gmail.com wrote:

 I am having almost 5 to 6 crores of indexed documents in solr. And when i
 am
 going to change anything in the configuration file solr server is going
 down.

 As a new user to solr i can't able to find the exact reason for going
 server
 down.

 I am using cache's in the following way :

 <filterCache class="solr.FastLRUCache"
              size="16384"
              initialSize="4096"
              autowarmCount="4096"/>
 <queryResultCache class="solr.FastLRUCache"
                   size="16384"
                   initialSize="4096"
                   autowarmCount="1024"/>

 and i am not using any documentCache, fieldValueCahe's

 Whether this can lead any performance issue means going server down.

 And i am seeing logging in the server it is showing exception in the
 following way


 Servlet.service() for servlet [default] in context with path [/solr] threw
 exception [java.lang.IllegalStateException: Cannot call sendError() after
 the response has been committed] with root cause



 Can anybody help me how can i solve this problem.

 Kumar.









 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Performance-Issue-tp4104907.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Performance Issue

2013-12-04 Thread Shawn Heisey
On 12/4/2013 6:31 AM, kumar wrote:
 I am having almost 5 to 6 crores of indexed documents in solr. And when i am
 going to change anything in the configuration file solr server is going
 down.

If you mean "crore" and not "core", then you are talking about 50 to 60
million documents.  That's a lot.  Solr is perfectly capable of handling
that many documents, but you do need to have very good hardware.

Even if they are small, your index is likely to be many gigabytes in
size.  If the documents are large, that might be measured in terabytes.
 Large indexes require a lot of memory for good performance.  This will
be discussed in more detail below.

 As a new user to solr i can't able to find the exact reason for going server
 down.
 
 I am using cache's in the following way :
 
 <filterCache class="solr.FastLRUCache"
              size="16384"
              initialSize="4096"
              autowarmCount="4096"/>
 <queryResultCache class="solr.FastLRUCache"
                   size="16384"
                   initialSize="4096"
                   autowarmCount="1024"/>
 
 and i am not using any documentCache, fieldValueCahe's

As Erick said, these cache sizes are HUGE.  In particular, your
autowarmCount values are extremely high.

 Whether this can lead any performance issue means going server down.

Another thing that Erick pointed out is that you haven't really told us
what's happening.  When you say that the server goes down, what EXACTLY
do you mean?

 And i am seeing logging in the server it is showing exception in the
 following way
 
 
 Servlet.service() for servlet [default] in context with path [/solr] threw
 exception [java.lang.IllegalStateException: Cannot call sendError() after
 the response has been committed] with root cause

This message comes from your servlet container, not Solr.  You're
probably using Tomcat, not the included Jetty.  There is some indirect
evidence that this can be fixed by increasing the servlet container's
setting for the maximum number of request parameters.

http://forums.adobe.com/message/4590864
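
(On Tomcat, if I recall correctly, that setting is the maxParameterCount
attribute on the HTTP Connector in conf/server.xml, along these lines; the
port and timeout values are just placeholders:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxParameterCount="10000"/>
)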

Here's what I can say without further information:

You're likely having performance issues.  One potential problem is your
insanely high autowarmCount values.  Your cache configuration tells Solr
that every time you have a soft commit or a hard commit with
openSearcher=true, you're going to execute up to 1024 queries and up to
4096 filters from the old caches, in order to warm the new caches.  Even
if you have an optimal setup, this takes a lot of time.  I suspect that
you don't have an optimal setup.
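
(For reference, that warming only happens when a commit opens a new searcher,
so a hard autoCommit configured with openSearcher=false does not trigger it. A
sketch of that piece of a Solr 4.x style solrconfig.xml, with an arbitrary
interval:

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
)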

Another potential problem is that you don't have enough memory for the
size of your index.  A number of potential performance problems are
discussed on this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

A lot more details are required.  Here's some things that will be
helpful, and more is always better:

* Exact symptoms.
* Excerpts from the Solr logfile that include entire stacktraces.
* Operating system and version.
* Total server index size on disk.
* Total machine memory.
* Java heap size for your servlet container.
* Which servlet container you are using to run Solr.
* Solr version.
* Server hardware details.

Thanks,
Shawn



Re: Solr performance issue

2011-03-23 Thread Doğacan Güney
Hello,

The problem turned out to be some sort of sharding/searching weirdness. We
modified some code in sharding but I don't think it is related. In any case,
we just added a new server that just shards (but doesn't do any searching /
doesn't contain any index) and performance is very very good.
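
(For anyone curious, the aggregator is just an ordinary Solr instance that fans
the query out via the shards parameter; with invented host names, a request to
it looks roughly like:

http://aggregator:8983/solr/select?q=foo&shards=slave1:8983/solr,slave2:8983/solr,slave3:8983/solr,slave4:8983/solr
)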

Thanks for all the help.

On Tue, Mar 22, 2011 at 14:30, Alexey Serba ase...@gmail.com wrote:

  Btw, I am monitoring output via jconsole with 8gb of ram and it still
 goes
  to 8gb every 20 seconds or so,
  gc runs, falls down to 1gb.

 Hmm, jvm is eating 8Gb for 20 seconds - sounds a lot.

 Do you return all results (ids) for your queries? Any tricky
 faceting/sorting/function queries?




-- 
Doğacan Güney


Re: Solr performance issue

2011-03-22 Thread Alexey Serba
 Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
 to 8gb every 20 seconds or so,
 gc runs, falls down to 1gb.

Hmm, the JVM eating 8 GB every 20 seconds - that sounds like a lot.

Do you return all results (ids) for your queries? Any tricky
faceting/sorting/function queries?


Re: Solr performance issue

2011-03-15 Thread Shawn Heisey
My solr+jetty+java6 install seems to work well with these GC options.  
It's a dual processor environment:


-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

I've never had a real problem with memory, so I've not done any kind of 
auditing.  I probably should, but time is a limited resource.


Shawn


On 3/14/2011 2:29 PM, Markus Jelsma wrote:

That depends on your GC settings and generation sizes. And, instead of
UseParallelGC you'd better use UseParNewGC in combination with CMS.

See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html

It's actually, as I understand it, expected JVM behavior to see the heap
rise to close to it's limit before it gets GC'd, that's how Java GC
works.  Whether that should happen every 20 seconds or what, I don't nkow.

Another option is setting better JVM garbage collection arguments, so GC
doesn't stop the world so often. I have had good luck with my Solr
using this:  -XX:+UseParallelGC




Re: Solr performance issue

2011-03-15 Thread Markus Jelsma
CMS is very good for multi-core CPUs. Use incremental mode only when you have 
a single CPU with only one or two cores.

On Tuesday 15 March 2011 16:03:38 Shawn Heisey wrote:
 My solr+jetty+java6 install seems to work well with these GC options.
 It's a dual processor environment:
 
 -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
 
 I've never had a real problem with memory, so I've not done any kind of
 auditing.  I probably should, but time is a limited resource.
 
 Shawn
 
 On 3/14/2011 2:29 PM, Markus Jelsma wrote:
  That depends on your GC settings and generation sizes. And, instead of
  UseParallelGC you'd better use UseParNewGC in combination with CMS.
  
  See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
  
  It's actually, as I understand it, expected JVM behavior to see the heap
  rise to close to it's limit before it gets GC'd, that's how Java GC
  works.  Whether that should happen every 20 seconds or what, I don't
  nkow.
  
  Another option is setting better JVM garbage collection arguments, so GC
  doesn't stop the world so often. I have had good luck with my Solr
  using this:  -XX:+UseParallelGC

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr performance issue

2011-03-15 Thread Shawn Heisey
The host is dual quad-core, each Xen VM has been given two CPUs.  Not 
counting dom0, two of the hosts have 10/8 CPUs allocated, two of them 
have 8/8.  The dom0 VM is also allocated two CPUs.


I'm not really sure how that works out when it comes to Java running on 
the VM, but if at all possible, it is likely that Xen would try and keep 
both VM cpus on the same physical CPU and the VM's memory allocation on 
the same NUMA node.  If that's the case, it would meet what you've 
stated as the recommendation for incremental mode.


Shawn


On 3/15/2011 9:10 AM, Markus Jelsma wrote:

CMS is very good for multicore CPU's. Use incremental mode only when you have
a single CPU with only one or two cores.




Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello everyone,

First of all here is our Solr setup:

- Solr nightly build 986158
- Running Solr inside the default Jetty that comes with the Solr build
- 1 write-only master, 4 read-only slaves (quad core 5640 with 24gb of RAM)
- Index replicated (on optimize) to slaves via Solr Replication
- Size of index is around 2.5gb
- No incremental writes; the index is created from scratch (delete old documents - commit new documents - optimize) every 6 hours
- Avg # of requests per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each slave is around 2

We have been using this setup for months without any problem. However, last week
we started to experience very weird performance problems:

- Avg time per request increased from 25ms to 200-300ms (even higher if we
don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (Solr uses 400%-600% CPU)

When we profile solr we see two very strange things :

1 - This is the jconsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you can see, GC runs every 10-15 seconds and collects more than 1 GB of
memory. (Actually, if you wait more than 10 minutes you see spikes up to 4 GB
consistently.)

2 - This is the newrelic output :

https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm

As you can see, Solr spends a ridiculously long time in the
SolrDispatchFilter.doFilter() method.


Apart from these, when we clean the index directory, re-replicate, and
restart each slave one by one, we see some relief in the system, but after some
time the servers start to melt down again. Although deleting the index and
replicating doesn't solve the problem, we think these problems are somehow
related to replication, because the symptoms started after a replication and
the system temporarily heals itself after each replication. I also see
lucene-write.lock files on the slaves (we don't have write.lock files on the
master), which I think we shouldn't see.


If anyone can give any sort of ideas, we will appreciate it.

Regards,
Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
Hi Doğacan,

Are you, at some point, running out of heap space? In my experience, that's 
the common cause of increased load and excessively high response times (or 
timeouts).

Cheers,

 Hello everyone,
 
 First of all here is our Solr setup:
 
 - Solr nightly build 986158
 - Running solr inside the default jetty comes with solr build
 - 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb of
 RAM) - Index replicated (on optimize) to slaves via Solr Replication
 - Size of index is around 2.5gb
 - No incremental writes, index is created from scratch(delete old documents
 - commit new documents - optimize)  every 6 hours
 - Avg # of request per second is around 60 (for a single slave)
 - Avg time per request is around 25ms (before having problems)
 - Load on each is slave is around 2
 
 We are using this set-up for months without any problem. However last week
 we started to experience very weird performance problems like :
 
 - Avg time per request increased from 25ms to 200-300ms (even higher if we
 don't restart the slaves)
 - Load on each slave increased from 2 to 15-20 (solr uses %400-%600 cpu)
 
 When we profile solr we see two very strange things :
 
 1 - This is the jconsole output:
 
 https://skitch.com/meralan/rwwcf/mail-886x691
 
 As you see gc runs for every 10-15 seconds and collects more than 1 gb of
 memory. (Actually if you wait more than 10 minutes you see spikes up to 4gb
 consistently)
 
 2 - This is the newrelic output :
 
 https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
 
 As you see solr spent ridiculously long time in
 SolrDispatchFilter.doFilter() method.
 
 
 Apart form these, when we clean the index directory, re-replicate and
 restart  each slave one by one we see a relief in the system but after some
 time servers start to melt down again. Although deleting index and
 replicating doesn't solve the problem, we think that these problems are
 somehow related to replication. Because symptoms started after replication
 and once it heals itself after replication. I also see lucene-write.lock
 files in slaves (we don't have write.lock files in the master) which I
 think we shouldn't see.
 
 
 If anyone can give any sort of ideas, we will appreciate it.
 
 Regards,
 Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello,

2011/3/14 Markus Jelsma markus.jel...@openindex.io

 Hi Doğacan,

 Are you, at some point, running out of heap space? In my experience, that's
 the common cause of increased load and excessivly high response times (or
 time
 outs).


How much heap would be enough? Our index size is growing slowly, but we did
not have this problem a couple of weeks ago, when the index was maybe 100 MB
smaller.

We left most of the caches in solrconfig at their defaults and only increased
the filterCache to 1024. We only ask for ids (which are unique) and no other
fields during queries (though we do faceting). Btw, 1.6 GB of our index is
stored fields (we store everything for now, even though we do not fetch them
during queries), and about 1 GB is the index itself.
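
(So a typical request looks roughly like the following, with the facet field
name invented for illustration:

http://slave1:8983/solr/select?q=some+query&fl=id&facet=true&facet.field=category&rows=20
)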

Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any
improvement in load. I can try monitoring with JConsole with 8 GB of heap to
see if it helps.


 Cheers,

  Hello everyone,
 
  First of all here is our Solr setup:
 
  - Solr nightly build 986158
  - Running solr inside the default jetty comes with solr build
  - 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb of
  RAM) - Index replicated (on optimize) to slaves via Solr Replication
  - Size of index is around 2.5gb
  - No incremental writes, index is created from scratch(delete old
 documents
  - commit new documents - optimize)  every 6 hours
  - Avg # of request per second is around 60 (for a single slave)
  - Avg time per request is around 25ms (before having problems)
  - Load on each is slave is around 2
 
  We are using this set-up for months without any problem. However last
 week
  we started to experience very weird performance problems like :
 
  - Avg time per request increased from 25ms to 200-300ms (even higher if
 we
  don't restart the slaves)
  - Load on each slave increased from 2 to 15-20 (solr uses %400-%600 cpu)
 
  When we profile solr we see two very strange things :
 
  1 - This is the jconsole output:
 
  https://skitch.com/meralan/rwwcf/mail-886x691
 
  As you see gc runs for every 10-15 seconds and collects more than 1 gb of
  memory. (Actually if you wait more than 10 minutes you see spikes up to
 4gb
  consistently)
 
  2 - This is the newrelic output :
 
  https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
 
  As you see solr spent ridiculously long time in
  SolrDispatchFilter.doFilter() method.
 
 
  Apart form these, when we clean the index directory, re-replicate and
  restart  each slave one by one we see a relief in the system but after
 some
  time servers start to melt down again. Although deleting index and
  replicating doesn't solve the problem, we think that these problems are
  somehow related to replication. Because symptoms started after
 replication
  and once it heals itself after replication. I also see lucene-write.lock
  files in slaves (we don't have write.lock files in the master) which I
  think we shouldn't see.
 
 
  If anyone can give any sort of ideas, we will appreciate it.
 
  Regards,
  Dogacan Guney




-- 
Doğacan Güney


Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
 Hello,
 
 2011/3/14 Markus Jelsma markus.jel...@openindex.io
 
  Hi Doğacan,
  
  Are you, at some point, running out of heap space? In my experience,
  that's the common cause of increased load and excessivly high response
  times (or time
  outs).
 
 How much of a heap size would be enough? Our index size is growing slowly
 but we did not have this problem
 a couple weeks ago where index size was maybe 100mb smaller.

Telling how much heap space is needed isn't easy. It usually needs to be
increased when you run out of memory and get those nasty OOM errors; are you
getting them?
Replication events will increase heap usage due to cache warming queries and
autowarming.
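
(Those warming queries are the ones registered under the newSearcher and
firstSearcher listeners in solrconfig.xml; a minimal sketch, with an invented
query:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some warming query</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

plus whatever autowarming the individual caches do on their own.)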

 
 We left most of the caches in solrconfig as default and only increased
 filterCache to 1024. We only ask for ids (which
 are unique) and no other fields during queries (though we do faceting).
 Btw, 1.6gb of our index is stored fields (we store
 everything for now, even though we do not get them during queries), and
 about 1gb of index.

Hmm, it seems 4000 would be enough indeed. What about the fieldCache, are there 
a lot of entries? Is there an insanity count? Do you use boost functions?

It might not have anything to do with memory at all, but I'm just asking. There 
may be a bug in your revision causing this.

 
 Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get any
 improvement in load. I can try monitoring with Jconsole
 with 8gigs of heap to see if it helps.
 
  Cheers,
  
   Hello everyone,
   
   First of all here is our Solr setup:
   
   - Solr nightly build 986158
   - Running solr inside the default jetty comes with solr build
   - 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb of
   RAM) - Index replicated (on optimize) to slaves via Solr Replication
   - Size of index is around 2.5gb
   - No incremental writes, index is created from scratch(delete old
  
  documents
  
   - commit new documents - optimize)  every 6 hours
   - Avg # of request per second is around 60 (for a single slave)
   - Avg time per request is around 25ms (before having problems)
   - Load on each is slave is around 2
   
   We are using this set-up for months without any problem. However last
  
  week
  
   we started to experience very weird performance problems like :
   
   - Avg time per request increased from 25ms to 200-300ms (even higher if
  
  we
  
   don't restart the slaves)
   - Load on each slave increased from 2 to 15-20 (solr uses %400-%600
   cpu)
   
   When we profile solr we see two very strange things :
   
   1 - This is the jconsole output:
   
   https://skitch.com/meralan/rwwcf/mail-886x691
   
   As you see gc runs for every 10-15 seconds and collects more than 1 gb
   of memory. (Actually if you wait more than 10 minutes you see spikes
   up to
  
  4gb
  
   consistently)
   
   2 - This is the newrelic output :
   
   https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
   
   As you see solr spent ridiculously long time in
   SolrDispatchFilter.doFilter() method.
   
   
   Apart form these, when we clean the index directory, re-replicate and
   restart  each slave one by one we see a relief in the system but after
  
  some
  
   time servers start to melt down again. Although deleting index and
   replicating doesn't solve the problem, we think that these problems are
   somehow related to replication. Because symptoms started after
  
  replication
  
   and once it heals itself after replication. I also see
   lucene-write.lock files in slaves (we don't have write.lock files in
   the master) which I think we shouldn't see.
   
   
   If anyone can give any sort of ideas, we will appreciate it.
   
   Regards,
   Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Jonathan Rochkind
I've definitely had cases in 1.4.1 where, even though I didn't have an 
OOM error, Solr was being weirdly slow, and increasing the JVM heap size 
fixed it.  I can't explain why it happened, or exactly how you'd know 
this was going on; I didn't see anything odd in the logs to indicate it. I 
just tried increasing the JVM heap to see what happened, and it worked 
great.


The one case I remember specifically is when I was using the 
StatsComponent, with a stats.facet.  Pathologically slow, increasing 
heap magically made it go down to negligible again.


On 3/14/2011 3:38 PM, Markus Jelsma wrote:

Hello,

2011/3/14 Markus Jelsmamarkus.jel...@openindex.io


Hi Doğacan,

Are you, at some point, running out of heap space? In my experience,
that's the common cause of increased load and excessivly high response
times (or time
outs).

How much of a heap size would be enough? Our index size is growing slowly
but we did not have this problem
a couple weeks ago where index size was maybe 100mb smaller.

Telling how much heap space is needed isn't easy to say. It usually needs to
be increased when you run out of memory and get those nasty OOM errors, are
you getting them?
Replication eventes will increase heap usage due to cache warming queries and
autowarming.


We left most of the caches in solrconfig as default and only increased
filterCache to 1024. We only ask for ids (which
are unique) and no other fields during queries (though we do faceting).
Btw, 1.6gb of our index is stored fields (we store
everything for now, even though we do not get them during queries), and
about 1gb of index.

Hmm, it seems 4000 would be enough indeed. What about the fieldCache, are there
a lot of entries? Is there an insanity count? Do you use boost functions?

It might not have anything to do with memory at all but i'm just asking. There
may be a bug in your revision causing this.


Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get any
improvement in load. I can try monitoring with Jconsole
with 8gigs of heap to see if it helps.


Cheers,


Hello everyone,

First of all here is our Solr setup:

- Solr nightly build 986158
- Running solr inside the default jetty comes with solr build
- 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb of
RAM) - Index replicated (on optimize) to slaves via Solr Replication
- Size of index is around 2.5gb
- No incremental writes, index is created from scratch(delete old

documents


-  commit new documents -  optimize)  every 6 hours
- Avg # of request per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each is slave is around 2

We are using this set-up for months without any problem. However last

week


we started to experience very weird performance problems like :

- Avg time per request increased from 25ms to 200-300ms (even higher if

we


don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (solr uses %400-%600
cpu)

When we profile solr we see two very strange things :

1 - This is the jconsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you see gc runs for every 10-15 seconds and collects more than 1 gb
of memory. (Actually if you wait more than 10 minutes you see spikes
up to

4gb


consistently)

2 - This is the newrelic output :

https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm

As you see solr spent ridiculously long time in
SolrDispatchFilter.doFilter() method.


Apart form these, when we clean the index directory, re-replicate and
restart  each slave one by one we see a relief in the system but after

some


time servers start to melt down again. Although deleting index and
replicating doesn't solve the problem, we think that these problems are
somehow related to replication. Because symptoms started after

replication


and once it heals itself after replication. I also see
lucene-write.lock files in slaves (we don't have write.lock files in
the master) which I think we shouldn't see.


If anyone can give any sort of ideas, we will appreciate it.

Regards,
Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello again,

2011/3/14 Markus Jelsma markus.jel...@openindex.io

  Hello,
 
  2011/3/14 Markus Jelsma markus.jel...@openindex.io
 
   Hi Doğacan,
  
   Are you, at some point, running out of heap space? In my experience,
   that's the common cause of increased load and excessivly high response
   times (or time
   outs).
 
  How much of a heap size would be enough? Our index size is growing slowly
  but we did not have this problem
  a couple weeks ago where index size was maybe 100mb smaller.

 Telling how much heap space is needed isn't easy to say. It usually needs
 to
 be increased when you run out of memory and get those nasty OOM errors, are
 you getting them?
 Replication eventes will increase heap usage due to cache warming queries
 and
 autowarming.


Nope, no OOM errors.


 
  We left most of the caches in solrconfig as default and only increased
  filterCache to 1024. We only ask for ids (which
  are unique) and no other fields during queries (though we do faceting).
  Btw, 1.6gb of our index is stored fields (we store
  everything for now, even though we do not get them during queries), and
  about 1gb of index.

 Hmm, it seems 4000 would be enough indeed. What about the fieldCache, are
 there
 a lot of entries? Is there an insanity count? Do you use boost functions?


Insanity count is 0 and fieldCache has 12 entries. We do use some boosting
functions.

Btw, I am monitoring the output via JConsole with 8 GB of heap, and it still
goes up to 8 GB every 20 seconds or so; GC runs, and it falls back down to 1 GB.

Btw, our current revision was just a random choice, but up until two weeks
ago it had been rock-solid, so we have been reluctant to update to another
version. Would you recommend upgrading to the latest trunk?


 It might not have anything to do with memory at all but i'm just asking.
 There
 may be a bug in your revision causing this.

 
  Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get
 any
  improvement in load. I can try monitoring with Jconsole
  with 8gigs of heap to see if it helps.
 
   Cheers,
  
Hello everyone,
   
First of all here is our Solr setup:
   
- Solr nightly build 986158
- Running solr inside the default jetty comes with solr build
- 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb
 of
RAM) - Index replicated (on optimize) to slaves via Solr Replication
- Size of index is around 2.5gb
- No incremental writes, index is created from scratch(delete old
  
   documents
  
- commit new documents - optimize)  every 6 hours
- Avg # of request per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each is slave is around 2
   
We are using this set-up for months without any problem. However last
  
   week
  
we started to experience very weird performance problems like :
   
- Avg time per request increased from 25ms to 200-300ms (even higher
 if
  
   we
  
don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (solr uses %400-%600
cpu)
   
When we profile solr we see two very strange things :
   
1 - This is the jconsole output:
   
https://skitch.com/meralan/rwwcf/mail-886x691
   
As you see gc runs for every 10-15 seconds and collects more than 1
 gb
of memory. (Actually if you wait more than 10 minutes you see spikes
up to
  
   4gb
  
consistently)
   
2 - This is the newrelic output :
   
https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
   
As you see solr spent ridiculously long time in
SolrDispatchFilter.doFilter() method.
   
   
Apart form these, when we clean the index directory, re-replicate and
restart  each slave one by one we see a relief in the system but
 after
  
   some
  
time servers start to melt down again. Although deleting index and
replicating doesn't solve the problem, we think that these problems
 are
somehow related to replication. Because symptoms started after
  
   replication
  
and once it heals itself after replication. I also see
lucene-write.lock files in slaves (we don't have write.lock files in
the master) which I think we shouldn't see.
   
   
If anyone can give any sort of ideas, we will appreciate it.
   
Regards,
Dogacan Guney




-- 
Doğacan Güney


Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
 Nope, no OOM errors.

That's a good start!

 Insanity count is 0 and fieldCAche has 12 entries. We do use some boosting
 functions.
 
 Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
 to 8gb every 20 seconds or so,
 gc runs, falls down to 1gb.

Hmm, maybe the garbage collector takes up a lot of CPU time. Could you check 
your garbage collector log? It must be enabled via some JVM options:

JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/var/log/tomcat6/gc.log"

Also, what JVM version are you using and what are your other JVM settings? Are 
Xms and Xmx set to the same value? I see you're using the throughput collector. 
You might want to use CMS (the low-pause collector) because it partially runs 
concurrently and has fewer stop-the-world interruptions.

http://download.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html
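
A minimal CMS setup along those lines, with the heap size only as an example
and Xms pinned to Xmx, would be something like:

JAVA_OPTS="$JAVA_OPTS -Xms4000m -Xmx4000m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"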

Again, this may not be the issue ;)

 
 Btw, our current revision was just a random choice but up until two weeks
 ago it has been rock-solid so we have been
 reluctant to update to another version. Would you recommend upgrading to
 latest trunk?

I don't know what changes have been made since your revision. Please consult 
the CHANGES.txt for that.

 
  It might not have anything to do with memory at all but i'm just asking.
  There
  may be a bug in your revision causing this.
  
   Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get
  
  any
  
   improvement in load. I can try monitoring with Jconsole
   with 8gigs of heap to see if it helps.
   
Cheers,

 Hello everyone,
 
 First of all here is our Solr setup:
 
 - Solr nightly build 986158
 - Running solr inside the default jetty comes with solr build
 - 1 write only Master , 4 read only Slaves (quad core 5640 with
 24gb
  
  of
  
 RAM) - Index replicated (on optimize) to slaves via Solr
 Replication - Size of index is around 2.5gb
 - No incremental writes, index is created from scratch(delete old

documents

 - commit new documents - optimize)  every 6 hours
 - Avg # of request per second is around 60 (for a single slave)
 - Avg time per request is around 25ms (before having problems)
 - Load on each is slave is around 2
 
 We are using this set-up for months without any problem. However
 last

week

 we started to experience very weird performance problems like :
 
 - Avg time per request increased from 25ms to 200-300ms (even
 higher
  
  if
  
we

 don't restart the slaves)
 - Load on each slave increased from 2 to 15-20 (solr uses %400-%600
 cpu)
 
 When we profile solr we see two very strange things :
 
 1 - This is the jconsole output:
 
 https://skitch.com/meralan/rwwcf/mail-886x691
 
 As you see gc runs for every 10-15 seconds and collects more than 1
  
  gb
  
 of memory. (Actually if you wait more than 10 minutes you see
 spikes up to

4gb

 consistently)
 
 2 - This is the newrelic output :
 
 https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
 
 As you see solr spent ridiculously long time in
 SolrDispatchFilter.doFilter() method.
 
 
 Apart form these, when we clean the index directory, re-replicate
 and restart  each slave one by one we see a relief in the system
 but
  
  after
  
some

 time servers start to melt down again. Although deleting index and
 replicating doesn't solve the problem, we think that these problems
  
  are
  
 somehow related to replication. Because symptoms started after

replication

 and once it heals itself after replication. I also see
 lucene-write.lock files in slaves (we don't have write.lock files
 in the master) which I think we shouldn't see.
 
 
 If anyone can give any sort of ideas, we will appreciate it.
 
 Regards,
 Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Jonathan Rochkind
It's actually, as I understand it, expected JVM behavior to see the heap 
rise to close to its limit before it gets GC'd; that's how Java GC 
works.  Whether that should happen every 20 seconds or what, I don't know.


Another option is setting better JVM garbage collection arguments, so GC 
doesn't stop the world so often. I have had good luck with my Solr 
using this:  -XX:+UseParallelGC






On 3/14/2011 4:15 PM, Doğacan Güney wrote:

Hello again,

2011/3/14 Markus Jelsmamarkus.jel...@openindex.io


Hello,

2011/3/14 Markus Jelsmamarkus.jel...@openindex.io


Hi Doğacan,

Are you, at some point, running out of heap space? In my experience,
that's the common cause of increased load and excessivly high response
times (or time
outs).

How much of a heap size would be enough? Our index size is growing slowly
but we did not have this problem
a couple weeks ago where index size was maybe 100mb smaller.

Telling how much heap space is needed isn't easy to say. It usually needs
to
be increased when you run out of memory and get those nasty OOM errors, are
you getting them?
Replication eventes will increase heap usage due to cache warming queries
and
autowarming.



Nope, no OOM errors.



We left most of the caches in solrconfig as default and only increased
filterCache to 1024. We only ask for ids (which
are unique) and no other fields during queries (though we do faceting).
Btw, 1.6gb of our index is stored fields (we store
everything for now, even though we do not get them during queries), and
about 1gb of index.

Hmm, it seems 4000 would be enough indeed. What about the fieldCache, are
there
a lot of entries? Is there an insanity count? Do you use boost functions?



Insanity count is 0 and fieldCAche has 12 entries. We do use some boosting
functions.

Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
to 8gb every 20 seconds or so,
gc runs, falls down to 1gb.

Btw, our current revision was just a random choice but up until two weeks
ago it has been rock-solid so we have been
reluctant to update to another version. Would you recommend upgrading to
latest trunk?



It might not have anything to do with memory at all but i'm just asking.
There
may be a bug in your revision causing this.


Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get

any

improvement in load. I can try monitoring with Jconsole
with 8gigs of heap to see if it helps.


Cheers,


Hello everyone,

First of all here is our Solr setup:

- Solr nightly build 986158
- Running solr inside the default jetty comes with solr build
- 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb

of

RAM) - Index replicated (on optimize) to slaves via Solr Replication
- Size of index is around 2.5gb
- No incremental writes, index is created from scratch(delete old

documents


-  commit new documents -  optimize)  every 6 hours
- Avg # of request per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each is slave is around 2

We are using this set-up for months without any problem. However last

week


we started to experience very weird performance problems like :

- Avg time per request increased from 25ms to 200-300ms (even higher

if

we


don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (solr uses %400-%600
cpu)

When we profile solr we see two very strange things :

1 - This is the jconsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you see gc runs for every 10-15 seconds and collects more than 1

gb

of memory. (Actually if you wait more than 10 minutes you see spikes
up to

4gb


consistently)

2 - This is the newrelic output :

https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm

As you see solr spent ridiculously long time in
SolrDispatchFilter.doFilter() method.


Apart form these, when we clean the index directory, re-replicate and
restart  each slave one by one we see a relief in the system but

after

some


time servers start to melt down again. Although deleting index and
replicating doesn't solve the problem, we think that these problems

are

somehow related to replication. Because symptoms started after

replication


and once it heals itself after replication. I also see
lucene-write.lock files in slaves (we don't have write.lock files in
the master) which I think we shouldn't see.


If anyone can give any sort of ideas, we will appreciate it.

Regards,
Dogacan Guney





Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
You might also want to add the following switches for your GC log.

 JAVA_OPTS=$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps
 -XX:+PrintGCDetails -Xloggc:/var/log/tomcat6/gc.log

-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
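
Put together, the logging portion comes out to roughly:

JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails \
  -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/log/tomcat6/gc.log"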

 
 Also, what JVM version are you using and what are your other JVM settings?
 Are Xms and Xmx at the same value? I see you're using the throughput
 collector. You might want to use CMS because it partially runs
 concurrently (the low- pause collector) and has less stop-the-world
 interruptions.
 
 http://download.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html
 
 Again, this may not be the issue ;)
 
  Btw, our current revision was just a random choice but up until two weeks
  ago it has been rock-solid so we have been
  reluctant to update to another version. Would you recommend upgrading to
  latest trunk?
 
 I don't know what changes have been made since your revision. Please
 consult the CHANGES.txt for that.
 
   It might not have anything to do with memory at all but i'm just
   asking. There
   may be a bug in your revision causing this.
   
Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not
get
   
   any
   
improvement in load. I can try monitoring with Jconsole
with 8gigs of heap to see if it helps.

 Cheers,
 
  Hello everyone,
  
  First of all here is our Solr setup:
  
  - Solr nightly build 986158
  - Running solr inside the default jetty comes with solr build
  - 1 write only Master , 4 read only Slaves (quad core 5640 with
  24gb
   
   of
   
  RAM) - Index replicated (on optimize) to slaves via Solr
  Replication - Size of index is around 2.5gb
  - No incremental writes, index is created from scratch(delete old
 
 documents
 
  - commit new documents - optimize)  every 6 hours
  - Avg # of request per second is around 60 (for a single slave)
  - Avg time per request is around 25ms (before having problems)
  - Load on each is slave is around 2
  
  We are using this set-up for months without any problem. However
  last
 
 week
 
  we started to experience very weird performance problems like :
  
  - Avg time per request increased from 25ms to 200-300ms (even
  higher
   
   if
   
 we
 
  don't restart the slaves)
  - Load on each slave increased from 2 to 15-20 (solr uses
  %400-%600 cpu)
  
  When we profile solr we see two very strange things :
  
  1 - This is the jconsole output:
  
  https://skitch.com/meralan/rwwcf/mail-886x691
  
  As you see gc runs for every 10-15 seconds and collects more than
  1
   
   gb
   
  of memory. (Actually if you wait more than 10 minutes you see
  spikes up to
 
 4gb
 
  consistently)
  
  2 - This is the newrelic output :
  
  https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
  
  As you see solr spent ridiculously long time in
  SolrDispatchFilter.doFilter() method.
  
  
  Apart form these, when we clean the index directory, re-replicate
  and restart  each slave one by one we see a relief in the system
  but
   
   after
   
 some
 
  time servers start to melt down again. Although deleting index
  and replicating doesn't solve the problem, we think that these
  problems
   
   are
   
  somehow related to replication. Because symptoms started after
 
 replication
 
  and once it heals itself after replication. I also see
  lucene-write.lock files in slaves (we don't have write.lock files
  in the master) which I think we shouldn't see.
  
  
  If anyone can give any sort of ideas, we will appreciate it.
  
  Regards,
  Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
That depends on your GC settings and generation sizes. And, instead of 
UseParallelGC you'd better use UseParNewGC in combination with CMS.

See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html

 It's actually, as I understand it, expected JVM behavior to see the heap
 rise to close to it's limit before it gets GC'd, that's how Java GC
 works.  Whether that should happen every 20 seconds or what, I don't nkow.
 
 Another option is setting better JVM garbage collection arguments, so GC
 doesn't stop the world so often. I have had good luck with my Solr
 using this:  -XX:+UseParallelGC
 
 On 3/14/2011 4:15 PM, Doğacan Güney wrote:
  Hello again,
  
  2011/3/14 Markus Jelsmamarkus.jel...@openindex.io
  
  Hello,
  
  2011/3/14 Markus Jelsmamarkus.jel...@openindex.io
  
  Hi Doğacan,
  
  Are you, at some point, running out of heap space? In my experience,
  that's the common cause of increased load and excessivly high response
  times (or time
  outs).
  
  How much of a heap size would be enough? Our index size is growing
  slowly but we did not have this problem
  a couple weeks ago where index size was maybe 100mb smaller.
  
  Telling how much heap space is needed isn't easy to say. It usually
  needs to
  be increased when you run out of memory and get those nasty OOM errors,
  are you getting them?
  Replication eventes will increase heap usage due to cache warming
  queries and
  autowarming.
  
  Nope, no OOM errors.
  
  We left most of the caches in solrconfig as default and only increased
  filterCache to 1024. We only ask for ids (which
  are unique) and no other fields during queries (though we do faceting).
  Btw, 1.6gb of our index is stored fields (we store
  everything for now, even though we do not get them during queries), and
  about 1gb of index.
  
  Hmm, it seems 4000 would be enough indeed. What about the fieldCache,
  are there
  a lot of entries? Is there an insanity count? Do you use boost
  functions?
  
  Insanity count is 0 and fieldCAche has 12 entries. We do use some
  boosting functions.
  
  Btw, I am monitoring output via jconsole with 8gb of ram and it still
  goes to 8gb every 20 seconds or so,
  gc runs, falls down to 1gb.
  
  Btw, our current revision was just a random choice but up until two weeks
  ago it has been rock-solid so we have been
  reluctant to update to another version. Would you recommend upgrading to
  latest trunk?
  
  It might not have anything to do with memory at all but i'm just asking.
  There
  may be a bug in your revision causing this.
  
  Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get
  
  any
  
  improvement in load. I can try monitoring with Jconsole
  with 8gigs of heap to see if it helps.
  
  Cheers,
  
  Hello everyone,
  
  First of all here is our Solr setup:
  
  - Solr nightly build 986158
  - Running solr inside the default jetty comes with solr build
  - 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb
  
  of
  
  RAM) - Index replicated (on optimize) to slaves via Solr Replication
  - Size of index is around 2.5gb
  - No incremental writes, index is created from scratch(delete old
  
  documents
  
  -  commit new documents -  optimize)  every 6 hours
  - Avg # of request per second is around 60 (for a single slave)
  - Avg time per request is around 25ms (before having problems)
  - Load on each is slave is around 2
  
  We are using this set-up for months without any problem. However last
  
  week
  
  we started to experience very weird performance problems like :
  
  - Avg time per request increased from 25ms to 200-300ms (even higher
  
  if
  
  we
  
  don't restart the slaves)
  - Load on each slave increased from 2 to 15-20 (solr uses %400-%600
  cpu)
  
  When we profile solr we see two very strange things :
  
  1 - This is the jconsole output:
  
  https://skitch.com/meralan/rwwcf/mail-886x691
  
  As you see gc runs for every 10-15 seconds and collects more than 1
  
  gb
  
  of memory. (Actually if you wait more than 10 minutes you see spikes
  up to
  
  4gb
  
  consistently)
  
  2 - This is the newrelic output :
  
  https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
  
  As you see solr spent ridiculously long time in
  SolrDispatchFilter.doFilter() method.
  
  
  Apart form these, when we clean the index directory, re-replicate and
  restart  each slave one by one we see a relief in the system but
  
  after
  
  some
  
  time servers start to melt down again. Although deleting index and
  replicating doesn't solve the problem, we think that these problems
  
  are
  
  somehow related to replication. Because symptoms started after
  
  replication
  
  and once it heals itself after replication. I also see
  lucene-write.lock files in slaves (we don't have write.lock files in
  the master) which I think we shouldn't see.
  
  
  If anyone can give any sort of ideas, we will appreciate it.
  
  Regards,
  Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello,

2011/3/14 Markus Jelsma markus.jel...@openindex.io

 That depends on your GC settings and generation sizes. And, instead of
 UseParallelGC you'd better use UseParNewGC in combination with CMS.


JConsole now shows a different profile output but load is still high and
performance is still bad.

Btw, here is the thread profile from newrelic:

https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm

Note that we do use a form of sharding, so maybe all the time spent waiting
in handleRequestBody results from sharding?


 See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html

  It's actually, as I understand it, expected JVM behavior to see the heap
  rise to close to it's limit before it gets GC'd, that's how Java GC
  works.  Whether that should happen every 20 seconds or what, I don't
 nkow.
 
  Another option is setting better JVM garbage collection arguments, so GC
  doesn't stop the world so often. I have had good luck with my Solr
  using this:  -XX:+UseParallelGC
 
  On 3/14/2011 4:15 PM, Doğacan Güney wrote:
   Hello again,
  
   2011/3/14 Markus Jelsmamarkus.jel...@openindex.io
  
   Hello,
  
   2011/3/14 Markus Jelsmamarkus.jel...@openindex.io
  
   Hi Doğacan,
  
   Are you, at some point, running out of heap space? In my experience,
   that's the common cause of increased load and excessivly high
 response
   times (or time
   outs).
  
   How much of a heap size would be enough? Our index size is growing
   slowly but we did not have this problem
   a couple weeks ago where index size was maybe 100mb smaller.
  
   Telling how much heap space is needed isn't easy to say. It usually
   needs to
   be increased when you run out of memory and get those nasty OOM
 errors,
   are you getting them?
   Replication eventes will increase heap usage due to cache warming
   queries and
   autowarming.
  
   Nope, no OOM errors.
  
   We left most of the caches in solrconfig as default and only
 increased
   filterCache to 1024. We only ask for ids (which
   are unique) and no other fields during queries (though we do
 faceting).
   Btw, 1.6gb of our index is stored fields (we store
   everything for now, even though we do not get them during queries),
 and
   about 1gb of index.
  
   Hmm, it seems 4000 would be enough indeed. What about the fieldCache,
   are there
   a lot of entries? Is there an insanity count? Do you use boost
   functions?
  
   Insanity count is 0 and fieldCAche has 12 entries. We do use some
   boosting functions.
  
   Btw, I am monitoring output via jconsole with 8gb of ram and it still
   goes to 8gb every 20 seconds or so,
   gc runs, falls down to 1gb.
  
   Btw, our current revision was just a random choice but up until two
 weeks
   ago it has been rock-solid so we have been
   reluctant to update to another version. Would you recommend upgrading
 to
   latest trunk?
  
   It might not have anything to do with memory at all but i'm just
 asking.
   There
   may be a bug in your revision causing this.
  
   Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not
 get
  
   any
  
   improvement in load. I can try monitoring with Jconsole
   with 8gigs of heap to see if it helps.
  
   Cheers,
  
   Hello everyone,
  
   First of all here is our Solr setup:
  
   - Solr nightly build 986158
   - Running solr inside the default jetty comes with solr build
   - 1 write only Master , 4 read only Slaves (quad core 5640 with
 24gb
  
   of
  
   RAM) - Index replicated (on optimize) to slaves via Solr
 Replication
   - Size of index is around 2.5gb
   - No incremental writes, index is created from scratch(delete old
  
   documents
  
   -  commit new documents -  optimize)  every 6 hours
   - Avg # of request per second is around 60 (for a single slave)
   - Avg time per request is around 25ms (before having problems)
   - Load on each is slave is around 2
  
    We have been using this set-up for months without any problem. However,
    last week we started to experience very weird performance problems like:
   
    - Avg time per request increased from 25ms to 200-300ms (even higher if
      we don't restart the slaves)
    - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600% CPU)
  
   When we profile solr we see two very strange things :
  
   1 - This is the jconsole output:
  
   https://skitch.com/meralan/rwwcf/mail-886x691
  
    As you see, GC runs every 10-15 seconds and collects more than 1 gb of
    memory. (Actually, if you wait more than 10 minutes you see spikes up to
    4gb consistently.)
  
   2 - This is the newrelic output :
  
   https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
  
    As you see, Solr spends a ridiculously long time in the
    SolrDispatchFilter.doFilter() method.
  
  
    Apart from these, when we clean the index directory, re-replicate and
    restart each slave one by one, we see some relief in the system, but after
    some time 

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your system
suffer from http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ?

I'm not sure, I haven't seen a similar issue in a sharded environment,
probably because it was a controlled environment.
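
The usual mitigation for that deadlock is to make sure the servlet container
has enough worker threads for both the top-level requests and the shard
sub-requests they fan out. With the Jetty bundled with Solr at the time that
is the threadPool block in etc/jetty.xml; a sketch, with pool sizes that are
assumptions rather than values from this thread:

    <Configure id="Server" class="org.mortbay.jetty.Server">
      <!-- a large worker pool so shard sub-requests cannot starve each other -->
      <Set name="threadPool">
        <New class="org.mortbay.thread.QueuedThreadPool">
          <Set name="minThreads">10</Set>
          <Set name="maxThreads">10000</Set>
        </New>
      </Set>
    </Configure>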


 Hello,
 
 2011/3/14 Markus Jelsma markus.jel...@openindex.io
 
  That depends on your GC settings and generation sizes. And, instead of
  UseParallelGC you'd better use UseParNewGC in combination with CMS.
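  
   A sketch of startup flags implementing that suggestion (ParNew for the young
   generation plus CMS for the old generation); the heap size and the extra CMS
   flag are assumptions, not values taken from this thread:
  
       java -Xms4g -Xmx4g \
            -XX:+UseConcMarkSweepGC \
            -XX:+UseParNewGC \
            -XX:+CMSParallelRemarkEnabled \
            -jar start.jar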
 
 JConsole now shows a different profile output but load is still high and
 performance is still bad.
 
 Btw, here is the thread profile from newrelic:
 
 https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm
 
  Note that we do use a form of sharding, so maybe all the time spent
  waiting in handleRequestBody results from sharding?
 
  See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
  
    It's actually, as I understand it, expected JVM behavior to see the
    heap rise to close to its limit before it gets GC'd; that's how Java
    GC works.  Whether that should happen every 20 seconds or what, I
    don't know.
  
   Another option is setting better JVM garbage collection arguments, so
   GC doesn't stop the world so often. I have had good luck with my
   Solr using this:  -XX:+UseParallelGC
   
   On 3/14/2011 4:15 PM, Doğacan Güney wrote:
Hello again,

2011/3/14 Markus Jelsma markus.jel...@openindex.io

Hello,

2011/3/14 Markus Jelsma markus.jel...@openindex.io

Hi Doğacan,

Are you, at some point, running out of heap space? In my
experience, that's the common cause of increased load and
excessively high response times (or timeouts).

How much heap would be enough? Our index size is growing slowly, but we did
not have this problem a couple of weeks ago, when the index was maybe 100mb
smaller.

How much heap space is needed isn't easy to say. It usually needs to be
increased when you run out of memory and get those nasty OOM errors; are you
getting them?
Replication events will increase heap usage due to cache warming queries and
autowarming.

Nope, no OOM errors.

We left most of the caches in solrconfig as default and only increased
filterCache to 1024. We only ask for ids (which are unique) and no other
fields during queries (though we do faceting).
Btw, 1.6gb of our index is stored fields (we store everything for now, even
though we do not get them during queries), and about 1gb of index.

Hmm, it seems 4000 would be enough indeed. What about the
fieldCache, are there
a lot of entries? Is there an insanity count? Do you use boost
functions?

Insanity count is 0 and fieldCache has 12 entries. We do use some
boosting functions.

Btw, I am monitoring output via JConsole with 8gb of ram, and the heap still
goes to 8gb every 20 seconds or so; GC runs and it falls back down to 1gb.

Btw, our current revision was just a random choice, but up until two weeks
ago it had been rock-solid, so we have been reluctant to update to another
version. Would you recommend upgrading to the latest trunk?

It might not have anything to do with memory at all, but I'm just asking.
There may be a bug in your revision causing this.

Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any
improvement in load. I can try monitoring with JConsole with 8 gigs of heap
to see if it helps.

Cheers,

Hello everyone,

First of all here is our Solr setup:

- Solr nightly build 986158
- Running Solr inside the default Jetty that comes with the Solr build
- 1 write-only master, 4 read-only slaves (quad core 5640 with 24gb of RAM)
- Index replicated (on optimize) to slaves via Solr Replication
- Size of index is around 2.5gb
- No incremental writes; the index is created from scratch (delete old
  documents -> commit new documents -> optimize) every 6 hours
- Avg # of requests per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each slave is around 2

We have been using this set-up for months without any problem. However, last
week we started to experience very weird performance problems like:

- Avg time per request increased from 25ms to 200-300ms (even higher if we
  don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (Solr uses 400%-600% CPU)

When we profile solr we see two very strange things :

1 - This is the jconsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you see, GC runs every 10-15 seconds 

Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
2011/3/14 Markus Jelsma markus.jel...@openindex.io

 Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your
 system suffer from
 http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ?


We increased thread limit (which was 1 before) but it did not help.

Anyway, we will try to disable sharding tomorrow. Maybe this can give us a
better picture.
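
For what it's worth, disabling sharding for such a test is usually just a
matter of dropping the shards parameter from the request (or from the handler
defaults in solrconfig.xml); a sketch with made-up host names:

    # distributed query: fans sub-requests out to both shards
    curl 'http://slave1:8983/solr/select?q=foo&fl=id&shards=shard1:8983/solr,shard2:8983/solr'

    # same query against the local core only, no sharding involved
    curl 'http://slave1:8983/solr/select?q=foo&fl=id'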

Thanks for the help, everyone.


 I'm not sure, I haven't seen a similar issue in a sharded environment,
 probably because it was a controlled environment.


  Hello,
 
  2011/3/14 Markus Jelsma markus.jel...@openindex.io
 
   That depends on your GC settings and generation sizes. And, instead of
   UseParallelGC you'd better use UseParNewGC in combination with CMS.
 
  JConsole now shows a different profile output but load is still high and
  performance is still bad.
 
  Btw, here is the thread profile from newrelic:
 
  https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm
 
  Note that we do use a form of sharding, so maybe all the time spent
  waiting in handleRequestBody results from sharding?
 
   See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
  
It's actually, as I understand it, expected JVM behavior to see the
heap rise to close to its limit before it gets GC'd; that's how Java
GC works.  Whether that should happen every 20 seconds or what, I
don't know.
  
Another option is setting better JVM garbage collection arguments, so
GC doesn't stop the world so often. I have had good luck with my
Solr using this:  -XX:+UseParallelGC
   
On 3/14/2011 4:15 PM, Doğacan Güney wrote:
 Hello again,

 2011/3/14 Markus Jelsma markus.jel...@openindex.io

 Hello,

 2011/3/14 Markus Jelsma markus.jel...@openindex.io

 Hi Doğacan,

 Are you, at some point, running out of heap space? In my
 experience, that's the common cause of increased load and
 excessively high response times (or timeouts).

 How much heap would be enough? Our index size is growing slowly, but we
 did not have this problem a couple of weeks ago, when the index was maybe
 100mb smaller.

 How much heap space is needed isn't easy to say. It usually needs to be
 increased when you run out of memory and get those nasty OOM errors; are
 you getting them?
 Replication events will increase heap usage due to cache warming queries
 and autowarming.

 Nope, no OOM errors.

 We left most of the caches in solrconfig as default and only increased
 filterCache to 1024. We only ask for ids (which are unique) and no other
 fields during queries (though we do faceting).
 Btw, 1.6gb of our index is stored fields (we store everything for now,
 even though we do not get them during queries), and about 1gb of index.

 Hmm, it seems 4000 would be enough indeed. What about the
 fieldCache, are there
 a lot of entries? Is there an insanity count? Do you use boost
 functions?

 Insanity count is 0 and fieldCache has 12 entries. We do use some
 boosting functions.

 Btw, I am monitoring output via JConsole with 8gb of ram, and the heap
 still goes to 8gb every 20 seconds or so; GC runs and it falls back down
 to 1gb.

 Btw, our current revision was just a random choice, but up until two weeks
 ago it had been rock-solid, so we have been reluctant to update to another
 version. Would you recommend upgrading to the latest trunk?

 It might not have anything to do with memory at all, but I'm just asking.
 There may be a bug in your revision causing this.

 Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any
 improvement in load. I can try monitoring with JConsole with 8 gigs of heap
 to see if it helps.

 Cheers,

 Hello everyone,

 First of all here is our Solr setup:

 - Solr nightly build 986158
 - Running Solr inside the default Jetty that comes with the Solr build
 - 1 write-only master, 4 read-only slaves (quad core 5640 with 24gb of RAM)
 - Index replicated (on optimize) to slaves via Solr Replication
 - Size of index is around 2.5gb
 - No incremental writes; the index is created from scratch (delete old
   documents -> commit new documents -> optimize) every 6 hours
 - Avg # of requests per second is around 60 (for a single slave)
 - Avg time per request is around 25ms (before having problems)
 - Load on each slave is around 2

 We have been using this set-up for months without any problem. However,
 last week we started to experience very weird performance problems like:

 - Avg time per request increased from 25ms to 200-300ms (even