Re: Solr Search Inconsistent result

2014-12-23 Thread Ahmet Arslan
Hi Ankit,

So you are using solr.UUIDUpdateProcessorFactory to populate unique keys?

Ahmet

On Tuesday, December 23, 2014 7:13 AM, Ankit Jain ankitjainc...@gmail.com 
wrote:
Hi Ahmet,

Thanks for the response.
Document ID is unique because we are using *UUID* to generate the document
ID.

Thanks,
Ankit

On Tue, Dec 23, 2014 at 12:16 AM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi,

 Do you happen to have documents with the same unique id in different shards?
 When unique ids are not unique across shards, people see inconsistent
 results.
 Please see : http://find.searchhub.org/document/2814183511b5a52

 Ahmet



 On Monday, December 22, 2014 8:06 PM, Ankit Jain ankitjainc...@gmail.com
 wrote:
 Hi Ahmet,

 Thanks for the response.
 I am running this query from Solr Search UI. The number of shards for a
 collection is two.

 Thanks,
 Ankit

 On Mon, Dec 22, 2014 at 8:34 PM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

  Hi,
 
   Is this a sharded query?
 
  Ahmet
 
 
  On Monday, December 22, 2014 4:47 PM, Ankit Jain 
 ankitjainc...@gmail.com
  wrote:
  Hi All,
 
  We are getting inconsistent search result on searching on *multivalued*
  field:
 
  *Input Query:*
  ( t : [ 0 TO 1419245069253 ] )AND(_all:impetus-i0111.impetus.co.in)
 
   The _all field is a multivalued field.
 
  The above query is returning sometimes 11 records and sometimes 12471
  records.
 
  Please help.
 
  Thanks,

  Ankit
 



 --
 Thanks,

 Ankit Jain




-- 
Thanks,
Ankit Jain


Loading data to FieldValueCache

2014-12-23 Thread Manohar Sripada
Hello,

The wiki states that
http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for
faceting.

Can someone please shed some light on how data gets loaded into this cache?
That is, which Solr query options cause data to be loaded into this
cache?

My requirement is that I have 10 facet fields (with facet.limit set to 5) to
be shown in my UI. I want to speed this up by using this cache. Is there a way
to specify only the list of fields to be loaded into the fieldValueCache?

Thanks,
Manohar


RE: Loading data to FieldValueCache

2014-12-23 Thread Toke Eskildsen
Manohar Sripada [manohar...@gmail.com] wrote:
 The wiki states that
 http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for
 faceting.

 Can someone please shed some light on how data gets loaded into this cache?
 That is, which Solr query options cause data to be loaded into this
 cache?

The values are loaded on first facet call with facet.method=fc.
http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

 My requirement is that I have 10 facet fields (with facet.limit set to 5) to
 be shown in my UI. I want to speed this up by using this cache. Is there a
 way to specify only the list of fields to be loaded into the fieldValueCache?

Add a facet call as explicit warmup in your solrconfig.xml.

You might want to consider DocValues for your facet fields.
https://cwiki.apache.org/confluence/display/solr/DocValues
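
For illustration, a minimal SolrJ sketch of such a facet call (core URL and
field names are made up); the same request, registered as a firstSearcher or
newSearcher warmup query in solrconfig.xml, is what pre-loads the cache:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class WarmFacets {
    public static void main(String[] args) throws Exception {
        // hypothetical core URL and facet fields
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);                          // only the faceting side effect matters
        q.setFacet(true);
        q.addFacetField("brand", "category");  // the fields whose values should be cached
        q.setFacetLimit(5);
        q.set("facet.method", "fc");           // fc is the method that populates the fieldValueCache
        solr.query(q);
    }
}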

- Toke Eskildsen

Re: Endless 100% CPU usage on searcherExecutor thread

2014-12-23 Thread heaven
We do not use dates here, at least not too often. Usually it's something like
type:Profile (we use it from the Rails application, so type describes
model names), opted_in:true, etc. Solr hasn't been running for very long
though, so this may not show the real state.

Currently the filter cache shows a hit ratio of 1 and the query result cache
0.84. I also increased the cache settings to
autowarm: 512, initial: 1024 and size: 4096, which is actually never reached
because of commits.

Best,
Alex



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Endless-100-CPU-usage-on-searcherExecutor-thread-tp4175088p4175725.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Paging on large indexes

2014-12-23 Thread Bram Van Dam

On 12/22/2014 04:27 PM, Erick Erickson wrote:

Have you read Hossman's blog here?
https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#referrer=solr.pl


Oh thanks, that's a pretty interesting read. The scale we're 
investigating is several orders of magnitude larger than what was tested 
there, so I'm still a bit worried.



Because if you're trying this and _still_ getting bad performance we
need to know.


I'll definitely keep you posted when our test results on larger indexes 
(~50 billion documents) come in, but this sadly won't be any time soon 
(infrastructure sucks). The largest index I currently have access to is 
about a billion documents in size. Paging there is a nightmare, but the 
Solr version is too old to support cursors so I'm afraid I can't offer 
any useful data.


Does anyone have any performance data on multi-billion-document indexes? 
With or without SolrCloud?



Bram:
One minor pedantic clarification: the first round-trip only returns
the id and sort criteria (score by default), not the whole document,
although the effect is the same. As you page into the corpus, the
default implementation returns rows * (pageNum + 1) entries. Even worse,
each node itself has to _sort_ that many entries. Then a second
call is made to get the page's worth of docs...


I was trying to keep it short and sweet, but yes, that's the way I think 
it works ;-)



That said, though, it's pretty easy to argue that the 500th page is
pretty useless; nobody will ever hit the next-page button 499 times.


Nobody will hit next 499 times, but a lot of our users skip to the last 
page quite often. Maybe I should make *that* as hard as possible. Hmm.


Thanks for the tips!

 - Bram


Re: How to define Json list in schema in xml

2014-12-23 Thread Erik Hatcher
A multiValued=true string field is what you're after here. 
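
For instance, a minimal SolrJ sketch of indexing into such a field (core URL
and field names are made up), assuming a schema declaration along the lines
of <field name="schools" type="string" indexed="true" stored="true"
multiValued="true"/>:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexSchools {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "person-1");
        doc.addField("schools", "Seirra High School");       // each addField call appends
        doc.addField("schools", "Walnut elementary School"); // another value to the field
        solr.add(doc);
        solr.commit();
        // a filter such as fq=schools:"Seirra High School" then matches this document
    }
}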

   Erik


 On Dec 22, 2014, at 23:19, Xin Cai xincai2...@gmail.com wrote:
 
 Hi guys,
 I am looking to parse a JSON file that contains a field with a list of
 schools.
 
 So for example I would have
 
 
 {"Schools": [
   {"name": "Seirra High School"},
   {"name": "Walnut elementary School"}
 ]}
 
 So if I want to be able to index all the different schools so I can quickly
 look up people that went to a certain school, what is the best way for
 me to define the schema file? I have looked around and I don't think Solr
 has native support for lists, but I could be wrong because lists are used so
 often. Any advice would be appreciated. Thanks
 
 Xin Cai


'Illegal character in query' on Solr cloud 4.10.1

2014-12-23 Thread S.L
Hi All,

I am using SolrCloud 4.10.1 and I have 3 shards with a replication factor of
2, i.e. 6 nodes altogether.

When I query server1 of the 6 nodes in the cluster with the below query,
it works fine, but querying any other node in the cluster with the
same query results in an *HTTP Status 500 - {msg=Illegal character in query
at index 181:*
error.

The character at index 181 is the boost character ^. I have seen JIRA issue
SOLR-5971 https://issues.apache.org/jira/browse/SOLR-5971 for a similar
issue. How can I overcome this?

The query I use is below. Thanks in advance!

http://xx2..com:8081/solr/dyCollection1_shard2_replica1/?q=x+x+xx&sort=score+desc&wt=json&indent=true&debugQuery=true&defType=edismax&qf=productName^1.5+productDescription&mm=1&pf=productName+productDescription&ps=1&pf2=productName+productDescription&pf3=productName+productDescription&stopwords=true&lowercaseOperators=true


Solr server becomes non-responsive.

2014-12-23 Thread Modassar Ather
Hi,

I have a setup of a 4-shard Solr cluster with embedded ZooKeeper on one of
them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is
24g.
When executing a huge query with many wildcards inside it, the server
crashes and becomes non-responsive. Even the dashboard does not respond
and shows a connection lost error. This requires me to restart the servers.

I have set the query *timeAllowed* to 5 minutes, but it does not seem to be
honored and the query hangs around.
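
A side note on timeAllowed: as far as I can tell, in this version of Solr it
is only checked while documents are being collected, and the expansion of
wildcard terms happens before collection starts, which would explain why it
appears not to be honored here. A minimal SolrJ sketch of setting it (query
and core URL are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TimeAllowedExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("title:net*work* AND body:distrib*"); // made-up wildcard query
        q.setTimeAllowed(5 * 60 * 1000);  // 5 minutes, in milliseconds
        // when the limit is hit during collection, the response header
        // carries partialResults=true and only partial results come back
        System.out.println(solr.query(q).getResults().getNumFound());
    }
}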

Kindly help me debug and fix the issue, or suggest a way that timeAllowed
can be honored, or that a query which has been hanging for some time can be
stopped.

*Following are a few exceptions.*

*org.apache.zookeeper.server.NIOServerCnxn doIO*
WARNING: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
session id, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)

*org.apache.zookeeper.server.NIOServerCnxn sendBuffer*
SEVERE: Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at
org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at
org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
at
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)

*org.apache.zookeeper.server.persistence.FileTxnLog commit*
WARNING: fsync-ing the write ahead log in SyncThread:0 took 28346ms which
will adversely effect operation latency. See the ZooKeeper troubleshooting
guide
org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

*Caused by: java.lang.OutOfMemoryError: Java heap space*
at
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.init(Lucene41PostingsReader.java:640)
at
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docsAndPositions(Lucene41PostingsReader.java:278)
at
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docsAndPositions(SegmentTermsEnum.java:1011)
at
org.apache.lucene.search.spans.SpanTermQuery.getSpans(SpanTermQuery.java:123)
at
org.apache.lucene.search.spans.SpanOrQuery$1.initSpanQueue(SpanOrQuery.java:180)
at
org.apache.lucene.search.spans.SpanOrQuery$1.next(SpanOrQuery.java:193)
at
org.apache.lucene.search.spans.SpanOrQuery$1.initSpanQueue(SpanOrQuery.java:182)
at
org.apache.lucene.search.spans.SpanOrQuery$1.next(SpanOrQuery.java:193)
at
org.apache.lucene.search.spans.NearSpansUnordered$SpansCell.next(NearSpansUnordered.java:88)
at
org.apache.lucene.search.spans.NearSpansUnordered.initList(NearSpansUnordered.java:295)
at
org.apache.lucene.search.spans.NearSpansUnordered.next(NearSpansUnordered.java:164)
at
org.apache.lucene.search.spans.SpanScorer.init(SpanScorer.java:46)
at
org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:88)
at
org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:160)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
at

Re: Solr server becomes non-responsive.

2014-12-23 Thread Modassar Ather
To add to the details of the above issue: as soon as the query is executed,
even before the OutOfMemory error, the Solr servers become
non-responsive.

On Tue, Dec 23, 2014 at 5:04 PM, Modassar Ather modather1...@gmail.com
wrote:

 Hi,

 I have a setup of a 4-shard Solr cluster with embedded ZooKeeper on one of
 them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is
 24g.
 When executing a huge query with many wildcards inside it, the server
 crashes and becomes non-responsive. Even the dashboard does not respond
 and shows a connection lost error. This requires me to restart the servers.

 I have set the query *timeAllowed* to 5 minutes, but it does not seem to be
 honored and the query hangs around.

 Kindly help me debug and fix the issue, or suggest a way that timeAllowed
 can be honored, or that a query which has been hanging for some time can be
 stopped.

 *Following are a few exceptions.*

 *org.apache.zookeeper.server.NIOServerCnxn doIO*
 WARNING: caught end of stream exception
 EndOfStreamException: Unable to read additional data from client sessionid
 session id, likely client has closed socket
 at
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
 at
 org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
 at java.lang.Thread.run(Thread.java:745)

 *org.apache.zookeeper.server.NIOServerCnxn sendBuffer*
 SEVERE: Unexpected Exception:
 java.nio.channels.CancelledKeyException
 at
 sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
 at
 sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
 at
 org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
 at
 org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
 at
 org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
 at
 org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
 at
 org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)

 *org.apache.zookeeper.server.persistence.FileTxnLog commit*
 WARNING: fsync-ing the write ahead log in SyncThread:0 took 28346ms which
 will adversely effect operation latency. See the ZooKeeper troubleshooting
 guide
 org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

 *Caused by: java.lang.OutOfMemoryError: Java heap space*
 at
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.init(Lucene41PostingsReader.java:640)
 at
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docsAndPositions(Lucene41PostingsReader.java:278)
 at
 org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docsAndPositions(SegmentTermsEnum.java:1011)
 at
 org.apache.lucene.search.spans.SpanTermQuery.getSpans(SpanTermQuery.java:123)
 at
 org.apache.lucene.search.spans.SpanOrQuery$1.initSpanQueue(SpanOrQuery.java:180)
 at
 org.apache.lucene.search.spans.SpanOrQuery$1.next(SpanOrQuery.java:193)
 at
 org.apache.lucene.search.spans.SpanOrQuery$1.initSpanQueue(SpanOrQuery.java:182)
 at
 org.apache.lucene.search.spans.SpanOrQuery$1.next(SpanOrQuery.java:193)
 at
 org.apache.lucene.search.spans.NearSpansUnordered$SpansCell.next(NearSpansUnordered.java:88)
 at
 org.apache.lucene.search.spans.NearSpansUnordered.initList(NearSpansUnordered.java:295)
 at
 org.apache.lucene.search.spans.NearSpansUnordered.next(NearSpansUnordered.java:164)
 at
 org.apache.lucene.search.spans.SpanScorer.init(SpanScorer.java:46)
 at
 org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:88)
 at
 org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:160)
 at
 org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
 at
 org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
 at
 org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
 at
 org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
 at
 

How best to fork Solr for enhancement

2014-12-23 Thread Upayavira
Hi,

I've (hopefully) made some time to do some work on the Solr Admin UI
(convert it to AngularJS). I plan to do it on a clone of the lucene-solr
project at GitHub.

Before I dive too thoroughly into this, I wanted to see if there were
any best practices that would make it easier to back-port these changes
into SVN should I actually succeed at producing something useful. Is it
enough just to make a branch called SOLR-5507 and start committing my
changes there?

Periodically, I'll zip up the relevant bits and attach them to the JIRA
ticket.

TIA

Upayavira


Re: How best to fork Solr for enhancement

2014-12-23 Thread Shalin Shekhar Mangar
You can make github play well with Apache Infra. See
https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow

On Tue, Dec 23, 2014 at 11:52 AM, Upayavira u...@odoko.co.uk wrote:

 Hi,

 I've (hopefully) made some time to do some work on the Solr Admin UI
 (convert it to AngularJS). I plan to do it on a clone of the lucene-solr
 project at GitHub.

 Before I dive too thoroughly into this, I wanted to see if there were
 any best practices that would make it easier to back-port these changes
 into SVN should I actually succeed at producing something useful. Is it
 enough just to make a branch called SOLR-5507 and start committing my
 changes there?

 Periodically, I'll zip up the relevant bits and attach them to the JIRA
 ticket.

 TIA

 Upayavira




-- 
Regards,
Shalin Shekhar Mangar.


Re: How best to fork Solr for enhancement

2014-12-23 Thread Upayavira
Perfect, thanks!

On Tue, Dec 23, 2014, at 07:10 AM, Shalin Shekhar Mangar wrote:
 You can make github play well with Apache Infra. See
 https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow
 
 On Tue, Dec 23, 2014 at 11:52 AM, Upayavira u...@odoko.co.uk wrote:
 
  Hi,
 
  I've (hopefully) made some time to do some work on the Solr Admin UI
  (convert it to AngularJS). I plan to do it on a clone of the lucene-solr
  project at GitHub.
 
  Before I dive too thoroughly into this, I wanted to see if there were
  any best practices that would make it easier to back-port these changes
  into SVN should I actually succeed at producing something useful. Is it
  enough just to make a branch called SOLR-5507 and start committing my
  changes there?
 
  Periodically, I'll zip up the relevant bits and attach them to the JIRA
  ticket.
 
  TIA
 
  Upayavira
 
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.


Re: what does this write.lock does not exist mean??

2014-12-23 Thread brian4
Haven't seen this particular problem before, but it sounds like it could be a
problem with permissions or data size limits - it may be worth looking into.

The write.lock file is used when an index is being modified - it is how
lucene handles concurrent attempts to modify the index - a writer obtains
the lock on the index
(http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_purpose_of_write.lock_file.2C_when_is_it_used.2C_and_by_which_classes.3F).
  

Did you check if the file is actually there (in the data/index folder of
your Solr core)?  If it is, then maybe the app has permission to create the
file, but the created file lacks the permissions for the app to read and/or
modify it, so the app thinks it exists but cannot access it?

If it's not there, maybe it is being deleted unexpectedly by some other
process/person, or alternatively, maybe it can't even create the file -
either it doesn't have permissions for that directory or there is no more
free space.
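
A quick way to check existence, permissions, and free space from the JVM, as
a minimal sketch with a made-up index path:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckWriteLock {
    public static void main(String[] args) {
        // hypothetical location: <core dir>/data/index/write.lock
        Path lock = Paths.get("/var/solr/mycore/data/index/write.lock");
        System.out.println("exists:   " + Files.exists(lock));
        System.out.println("readable: " + Files.isReadable(lock));
        System.out.println("writable: " + Files.isWritable(lock));
        // a full disk can also prevent lock/index files from being created
        System.out.println("usable bytes: " + lock.getParent().toFile().getUsableSpace());
    }
}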

I've seen this issue several times before, where running out of allowed disk
space prevented index files from being created. It's kind of like the index
is locked in its current state - or at least can't be updated all the way.
Are you able to add a large number of documents to the index and then
confirm that they have actually been added (search for them by ID, for
instance)?







--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-does-this-write-lock-does-not-exist-mean-tp4175291p4175773.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrCloud Paging on large indexes

2014-12-23 Thread Toke Eskildsen
Bram Van Dam [bram.van...@intix.eu] wrote:

[Solr cursors]

 Oh thanks, that's a pretty interesting read. The scale we're
 investigating is several orders of magnitude larger than what was tested
 there, so I'm still a bit worried.

The beauty of the cursor is that it has little to no overhead, relative to a
standard top-X sorted search. A standard search uses a sliding window over the
full result set, as does a cursor search. Same amount of work. It is just a
question of limits for the window.
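
To make that concrete, a minimal SolrJ sketch of cursor-based iteration
(available from Solr 4.7), assuming a collection whose uniqueKey field is id;
the sort must include the uniqueKey as a tie-breaker:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWalk {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(500);                       // window size per round trip
        q.setSort(SortClause.desc("score"));  // primary sort
        q.addSort(SortClause.asc("id"));      // uniqueKey tie-breaker, required for cursors
        String cursor = CursorMarkParams.CURSOR_MARK_START;  // "*"
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = solr.query(q);
            // ... process rsp.getResults() here ...
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) {
                break;                        // an unchanged mark means the result set is exhausted
            }
            cursor = next;
        }
    }
}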

 The largest index I currently have access to is
 about a billion documents in size. Paging there is a nightmare, but the
 Solr version is too old to support cursors so I'm afraid I can't offer
 any useful data.

Non-cursor paging in Solr uses a sliding window sort with a heap that contains 
all documents up to the paging number. A heap is a very fine thing for sliding 
window sort, as long as it is small. But performance drops to horrible levels 
when it gets large as it is extremely RAM-cache unfriendly.
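
As a toy illustration of that cost (not Lucene's actual code): top-K
selection with a bounded min-heap, where serving page P with rows hits per
page means K = rows * (P + 1), so the heap grows, and gets more cache
unfriendly, with the page number:

import java.util.PriorityQueue;
import java.util.Random;

public class TopKSketch {
    public static void main(String[] args) {
        int rows = 10, page = 499;
        int k = rows * (page + 1);                           // window size: 5000 entries for page 500
        PriorityQueue<Float> heap = new PriorityQueue<>(k);  // min-heap of scores
        Random rnd = new Random(42);
        for (int doc = 0; doc < 1_000_000; doc++) {          // one pass over all matching docs
            float score = rnd.nextFloat();
            if (heap.size() < k) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();                                 // evict the smallest score
                heap.add(score);
            }
        }
        System.out.println("kept " + heap.size() + " of 1,000,000 candidate hits");
    }
}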

 Does anyone have any performance data on multi-billion-document indexes?

Sorry, no. I could do a test on our 7 billion documents index, but it would 
have to wait until the end of January.

Nobody will hit next 499 times, but a lot of our users skip to the last
 page quite often. Maybe I should make *that* as hard as possible. Hmm.

Issue a search with sort in reverse order, then reverse the returned list of 
documents?
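
In SolrJ terms, a minimal sketch of that trick (field and core names are made
up): flip the sort direction, fetch page one, and reverse it client-side:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class LastPage {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.setSort(SortClause.asc("timestamp")); // reversed relative to the UI's usual desc sort
        q.setRows(10);                          // one page worth, fetched cheaply as "page one"
        List<SolrDocument> docs = new ArrayList<>(solr.query(q).getResults());
        Collections.reverse(docs);              // restore the order the user expects
        // note: a real last page may hold fewer than rows documents (numFound % rows)
    }
}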

- Toke Eskildsen


Re: How best to fork Solr for enhancement

2014-12-23 Thread Alexandre Rafalovitch
Semi off-topic, but is AngularJS the best next choice, given that
version 2 is so different from version 1?

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 December 2014 at 06:52, Upayavira u...@odoko.co.uk wrote:
 Hi,

 I've (hopefully) made some time to do some work on the Solr Admin UI
 (convert it to AngularJS). I plan to do it on a clone of the lucene-solr
 project at GitHub.

 Before I dive too thoroughly into this, I wanted to see if there were
 any best practices that would make it easier to back-port these changes
 into SVN should I actually succeed at producing something useful. Is it
 enough just to make a branch called SOLR-5507 and start committing my
 changes there?

 Periodically, I'll zip up the relevant bits and attach them to the JIRA
 ticket.

 TIA

 Upayavira


Re: How best to fork Solr for enhancement

2014-12-23 Thread Yago Riveiro
There are other options, like Ember or Backbone; either way, AngularJS is
widely adopted.

Alexandre, is your question about the radical change between versions?
In some way this shows progress in, and support for, the framework.

Another good reason is that AngularJS has a ton of components ready to use.

—
/Yago Riveiro




On Tuesday, Dec 23, 2014 at 3:10 pm, Alexandre Rafalovitch 
arafa...@gmail.com, wrote:
Semi off-topic, but is AngularJS the best next choice, given that
version 2 is so different from version 1?

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 23 December 2014 at 06:52, Upayavira u...@odoko.co.uk wrote:
 Hi,

 I've (hopefully) made some time to do some work on the Solr Admin UI
 (convert it to AngularJS). I plan to do it on a clone of the lucene-solr
 project at GitHub.

 Before I dive too thoroughly into this, I wanted to see if there were
 any best practices that would make it easier to back-port these changes
 into SVN should I actually succeed at producing something useful. Is it
 enough just to make a branch called SOLR-5507 and start committing my
 changes there?

 Periodically, I'll zip up the relevant bits and attach them to the JIRA
 ticket.

 TIA

 Upayavira

UI for Solr

2014-12-23 Thread Olivier Austina
Hi,

I would like to build a user interface on top of Solr for PC and mobile. I
am wondering if there is a framework or best practice commonly used. I want
Solr features such as suggestions, autocomplete, and facets to be available
in the UI. Any suggestion is welcome. Thank you.

Regards
Olivier


Re: UI for Solr

2014-12-23 Thread Alexandre Rafalovitch
You don't expose Solr directly to the user; it is not set up for
foolproof security out of the box. So you would need a client to talk
to Solr.

Something like Spring.io's Spring Data Solr could be one of the things
to check. You can see an auto-complete example for it at:
https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer/src/main
and embedded in action at
http://www.solr-start.com/javadoc/solr-lucene/index.html (search box
on the top)

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 December 2014 at 10:45, Olivier Austina olivier.aust...@gmail.com wrote:
 Hi,

 I would like to build a user interface on top of Solr for PC and mobile. I
 am wondering if there is a framework or best practice commonly used. I want
 Solr features such as suggestions, autocomplete, and facets to be available
 in the UI. Any suggestion is welcome. Thank you.

 Regards
 Olivier


Re: Solr Search Inconsistent result

2014-12-23 Thread Ankit Jain
Hi Ahmet,

We are using the *java.util.UUID* to generate the unique id for each
document.
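
For reference, a minimal sketch of that client-side approach (the core URL
and all field names other than _all are made up); the alternative Ahmet
mentions is letting solr.UUIDUpdateProcessorFactory assign the id on the
server side:

import java.util.UUID;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class UuidIndexing {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", UUID.randomUUID().toString());  // random (type 4) UUID as the uniqueKey
        doc.addField("_all", "impetus-i0111.impetus.co.in");
        solr.add(doc);
        solr.commit();
        // a random UUID makes duplicate ids across shards effectively impossible,
        // so inconsistent counts would point at routing/setup rather than id clashes
    }
}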

Thanks,
Ankit Jain

On Tue, Dec 23, 2014 at 1:32 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Ankit,

 So you are using solr.UUIDUpdateProcessorFactory to populate unique keys?

 Ahmet

 On Tuesday, December 23, 2014 7:13 AM, Ankit Jain ankitjainc...@gmail.com
 wrote:
 Hi Ahmet,

 Thanks for the response.
 Document ID is unique because we are using *UUID* to generate the document
 ID.

 Thanks,
 Ankit

 On Tue, Dec 23, 2014 at 12:16 AM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

  Hi,
 
  Do you happen to have documents with the same unique id in different shards?
  When unique ids are not unique across shards, people see inconsistent
  results.
  Please see : http://find.searchhub.org/document/2814183511b5a52
 
  Ahmet
 
 
 
  On Monday, December 22, 2014 8:06 PM, Ankit Jain 
 ankitjainc...@gmail.com
  wrote:
  Hi Ahmet,
 
  Thanks for the response.
  I am running this query from Solr Search UI. The number of shards for a
  collection is two.
 
  Thanks,
  Ankit
 
  On Mon, Dec 22, 2014 at 8:34 PM, Ahmet Arslan iori...@yahoo.com.invalid
 
  wrote:
 
   Hi,
  
    Is this a sharded query?
  
   Ahmet
  
  
   On Monday, December 22, 2014 4:47 PM, Ankit Jain 
  ankitjainc...@gmail.com
   wrote:
   Hi All,
  
   We are getting inconsistent search result on searching on *multivalued*
   field:
  
   *Input Query:*
   ( t : [ 0 TO 1419245069253 ] )AND(_all:impetus-i0111.impetus.co.in)
  
    The _all field is a multivalued field.
  
   The above query is returning sometimes 11 records and sometimes 12471
   records.
  
   Please help.
  
   Thanks,

   Ankit
  
 
 
 
  --
  Thanks,
 
  Ankit Jain
 



 --
 Thanks,
 Ankit Jain




-- 
Thanks,
Ankit Jain


Re: Endless 100% CPU usage on searcherExecutor thread

2014-12-23 Thread Shawn Heisey
On 12/23/2014 2:31 AM, heaven wrote:
 We do not use dates here, at least not too often. Usually it's something like
 type:Profile (we use it from the Rails application, so type describes
 model names), opted_in:true, etc. Solr hasn't been running for very long
 though, so this may not show the real state.

 Currently the filter cache shows a hit ratio of 1 and the query result cache
 0.84. I also increased the cache settings to
 autowarm: 512, initial: 1024 and size: 4096, which is actually never reached
 because of commits.

Warming the filter cache *can* be very slow.  It all depends on exactly
what your filters are.  I had to reduce the autowarmCount on my
filterCache to *four* because if it was any higher, a commit would take
up to a minute.  We have some really complex filters.

Thanks,
Shawn



Re: Solr server becomes non-responsive.

2014-12-23 Thread Shawn Heisey
On 12/23/2014 4:34 AM, Modassar Ather wrote:
 Hi,

 I have a setup of a 4-shard Solr cluster with embedded ZooKeeper on one of
 them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is
 24g.
 When executing a huge query with many wildcards inside it, the server
 crashes and becomes non-responsive. Even the dashboard does not respond
 and shows a connection lost error. This requires me to restart the servers.

Here's the important part of your message:

*Caused by: java.lang.OutOfMemoryError: Java heap space*


Your heap is not big enough for what Solr has been asked to do.  You
need to either increase your heap size or change your configuration so
that it uses less memory.

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Most programs have pretty much undefined behavior when an OOME occurs. 
Lucene's IndexWriter has been hardened so that it tries extremely hard
to avoid index corruption when OOME strikes, and I believe that works
well enough that we can call it nearly bulletproof ... but the rest of
Lucene and Solr will make no guarantees.

It's very difficult to have definable program behavior when an OOME
happens, because you simply cannot know the precise point during program
execution where it will happen, or what isn't going to work because Java
did not have memory space to create an object.  Going unresponsive is
not surprising.

If you can solve your heap problem, note that you may run into other
performance issues discussed on the wiki page that I linked.

Thanks,
Shawn



Re: Loading data to FieldValueCache

2014-12-23 Thread Erick Erickson
Or just not worry about it. The cache will be filled up automatically
as you query for facets etc.; the benefit of trying to fill it up as
Toke outlines is just that the first few user queries that call for
faceting will be somewhat faster. But after the first few user
queries have gone through, it won't matter whether you've
pre-loaded the cache or not.

My point is that you'll get the benefit of the cache no matter what;
it's just a matter of whether it's important that the first few users
don't have to wait while the caches are loaded. And with DocValues,
as Toke recommends, even that may be unimportant.

Best,
Erick

On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 Manohar Sripada [manohar...@gmail.com] wrote:
  The wiki states that
  http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for
  faceting.

  Can someone please shed some light on how data gets loaded into this cache?
  That is, which Solr query options cause data to be loaded into this
  cache?

 The values are loaded on first facet call with facet.method=fc.
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

  My requirement is that I have 10 facet fields (with facet.limit set to 5) to
  be shown in my UI. I want to speed this up by using this cache. Is there a
  way to specify only the list of fields to be loaded into the fieldValueCache?

 Add a facet call as explicit warmup in your solrconfig.xml.

 You might want to consider DocValues for your facet fields.
 https://cwiki.apache.org/confluence/display/solr/DocValues

 - Toke Eskildsen


Re: 'Illegal character in query' on Solr cloud 4.10.1

2014-12-23 Thread Erick Erickson
Hmmm, so you are pinging the servers directly, right?
Here's a couple of things to try:
1) Add distrib=false to the query and try each of the 6 servers (see the
sketch at the end of this message). What I'm wondering is whether this is
happening on the sub-query sent out or on the primary server. Adding
distrib=false will just execute on the node you're sending it to, and will
NOT send sub-queries out to any other node, so you'll get partial results
back.

If one server continues to work but the other 5 fail, then your servlet
container is probably not set up with the right character sets. Although
why that would manifest itself on the ^ character mystifies me.

2) Let's assume that all 6 servers handle the raw query. The next thing that
would be really helpful is to see the sub-queries. Take distrib=false
off and tail the logs on all the servers. What we're looking for here is
whether the sub-queries even make it to Solr or whether the problem
is in your container.

3) If the sub-queries do NOT make it to the Solr logs, what is the query
that the container sees? Is it recognizable or has Solr somehow munged
the sub-query?

What is your environment like? Tomcat? Jetty? Other? What JVM
etc?
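
For step 1), a minimal SolrJ sketch (the core URL is made up, the query
follows the one above); letting the client set qf as a parameter also
sidesteps the encoding question, since a raw ^ is not a legal URL character
and has to be sent percent-encoded as %5E:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DistribFalseCheck {
    public static void main(String[] args) throws Exception {
        // hypothetical core URL; repeat for each of the 6 cores
        HttpSolrServer node = new HttpSolrServer(
                "http://host1:8081/solr/dyCollection1_shard2_replica1");
        SolrQuery q = new SolrQuery("x x xx");
        q.set("defType", "edismax");
        q.set("qf", "productName^1.5 productDescription"); // SolrJ encodes the ^ for us
        q.set("distrib", "false");  // answer from this core only, no sub-queries fanned out
        System.out.println(node.query(q).getResults().getNumFound());
    }
}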

Best,
Erick

On Tue, Dec 23, 2014 at 3:23 AM, S.L simpleliving...@gmail.com wrote:
 Hi All,

 I am using SolrCloud 4.10.1 and I have 3 shards with a replication factor of
 2, i.e. 6 nodes altogether.

 When I query server1 of the 6 nodes in the cluster with the below query,
 it works fine, but querying any other node in the cluster with the
 same query results in an *HTTP Status 500 - {msg=Illegal character in query
 at index 181:*
 error.

 The character at index 181 is the boost character ^. I have seen JIRA issue
 SOLR-5971 https://issues.apache.org/jira/browse/SOLR-5971 for a similar
 issue. How can I overcome this?

 The query I use is below. Thanks in advance!

 http://xx2..com:8081/solr/dyCollection1_shard2_replica1/?q=x+x+xx&sort=score+desc&wt=json&indent=true&debugQuery=true&defType=edismax&qf=productName^1.5+productDescription&mm=1&pf=productName+productDescription&ps=1&pf2=productName+productDescription&pf3=productName+productDescription&stopwords=true&lowercaseOperators=true


Re: Solr server becomes non-responsive.

2014-12-23 Thread Erick Erickson
Second most important part of your message:
When executing a huge query with many wildcards inside it the server

This is usually an anti-pattern. The very first thing
I'd be doing is trying to not do this. See ngrams for infix
queries, or shingles, or ReverseWildcardFilterFactory, etc.

And if your corpus is very large with many unique terms it's even
worse, but you haven't really told us about that yet.

Best,
Erick

On Tue, Dec 23, 2014 at 8:30 AM, Shawn Heisey apa...@elyograg.org wrote:
 On 12/23/2014 4:34 AM, Modassar Ather wrote:
 Hi,

  I have a setup of a 4-shard Solr cluster with embedded ZooKeeper on one of
  them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is
  24g.
  When executing a huge query with many wildcards inside it, the server
  crashes and becomes non-responsive. Even the dashboard does not respond
  and shows a connection lost error. This requires me to restart the servers.

 Here's the important part of your message:

 *Caused by: java.lang.OutOfMemoryError: Java heap space*


 Your heap is not big enough for what Solr has been asked to do.  You
 need to either increase your heap size or change your configuration so
 that it uses less memory.

 http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

 Most programs have pretty much undefined behavior when an OOME occurs.
 Lucene's IndexWriter has been hardened so that it tries extremely hard
 to avoid index corruption when OOME strikes, and I believe that works
 well enough that we can call it nearly bulletproof ... but the rest of
 Lucene and Solr will make no guarantees.

 It's very difficult to have definable program behavior when an OOME
 happens, because you simply cannot know the precise point during program
 execution where it will happen, or what isn't going to work because Java
 did not have memory space to create an object.  Going unresponsive is
 not surprising.

 If you can solve your heap problem, note that you may run into other
 performance issues discussed on the wiki page that I linked.

 Thanks,
 Shawn



Re: SolrCloud Paging on large indexes

2014-12-23 Thread Erick Erickson
 Nobody will hit next 499 times, but a lot of our users skip to the last page 
 quite often. Maybe I should make *that* as hard as possible. Hmm

Right. I'd actually argue that providing a last page link in this situation is

1) useless to the user, I mean what's the point? Curiosity? If it really _must_
be supported, Toke's approach is sneaky and elegant. Sort in reverse order and
give them the first page ;).

2) dangerous as you well know...

 several orders of magnitude larger than what was tested
 there, so I'm still a bit worried.

I sympathize, but somebody has to be first ;). Besides, the
current situation is untenable from what you're saying...

Good luck!
Erick

On Tue, Dec 23, 2014 at 7:07 AM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 Bram Van Dam [bram.van...@intix.eu] wrote:

 [Solr cursors]

 Oh thanks, that's a pretty interesting read. The scale we're
 investigating is several orders of magnitude larger than what was tested
 there, so I'm still a bit worried.

 The beauty of the cursor is that it has little to no overhead, relative to
 a standard top-X sorted search. A standard search uses a sliding window over
 the full result set, as does a cursor search. Same amount of work. It is just
 a question of limits for the window.

 The largest index I currently have access to is
 about a billion documents in size. Paging there is a nightmare, but the
 Solr version is too old to support cursors so I'm afraid I can't offer
 any useful data.

 Non-cursor paging in Solr uses a sliding window sort with a heap that 
 contains all documents up to the paging number. A heap is a very fine thing 
 for sliding window sort, as long as it is small. But performance drops to 
 horrible levels when it gets large as it is extremely RAM-cache unfriendly.

 Does anyone have any performance data on multi-billion-document indexes?

 Sorry, no. I could do a test on our 7 billion documents index, but it would 
 have to wait until the end of January.

Nobody will hit next 499 times, but a lot of our users skip to the last
 page quite often. Maybe I should make *that* as hard as possible. Hmm.

 Issue a search with sort in reverse order, then reverse the returned list of 
 documents?

 - Toke Eskildsen


Re: Solr Search Inconsistent result

2014-12-23 Thread Erick Erickson
This really sounds like you _think_ you have two shards
in a single collection, but really you don't. I admit I'm not
quite sure how it got that way, but...

So try this: Add distrib=false to the query and ping each
of your servers separately. My bet is that you'll find they
have wildly varying numbers of docs, specifically
11 and 12,471.
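
A minimal SolrJ sketch of that check, with made-up core URLs; with
distrib=false each core answers only from its own index, so the per-core
numFound values should add up to the collection's total:

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PerShardCounts {
    public static void main(String[] args) throws Exception {
        List<String> cores = Arrays.asList(   // hypothetical core URLs, one per shard
                "http://host1:8983/solr/collection1_shard1_replica1",
                "http://host2:8983/solr/collection1_shard2_replica1");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);                // we only want the counts
        q.set("distrib", "false");   // do not fan sub-queries out to other nodes
        for (String url : cores) {
            HttpSolrServer core = new HttpSolrServer(url);
            System.out.println(url + " -> " + core.query(q).getResults().getNumFound());
        }
    }
}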

Next, 'tail -f' the logs and send the query again. You should
see each shard get a sub-query; if you do NOT see the
sub-query at one node for each shard, you don't really have
a sharded collection (or there's some other problem).

But this is not supposed to happen in SolrCloud. You haven't
told us anything about your setup: what version of Solr? Old-style
master/slave or SolrCloud? How did you create your collection?
How are you indexing to that collection? In short, please review:

http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Tue, Dec 23, 2014 at 7:57 AM, Ankit Jain ankitjainc...@gmail.com wrote:
 Hi Ahmet,

 We are using the *java.util.UUID* to generate the unique id for each
 document.

 Thanks,
 Ankit Jain

 On Tue, Dec 23, 2014 at 1:32 PM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

 Hi Ankit,

 So you are using solr.UUIDUpdateProcessorFactory to populate unique keys?

 Ahmet

 On Tuesday, December 23, 2014 7:13 AM, Ankit Jain ankitjainc...@gmail.com
 wrote:
 Hi Ahmet,

 Thanks for the response.
 Document ID is unique because we are using *UUID* to generate the document
 ID.

 Thanks,
 Ankit

 On Tue, Dec 23, 2014 at 12:16 AM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

  Hi,
 
  Do you happen to have documents with the same unique id in different shards?
  When unique ids are not unique across shards, people see inconsistent
  results.
  Please see : http://find.searchhub.org/document/2814183511b5a52
 
  Ahmet
 
 
 
  On Monday, December 22, 2014 8:06 PM, Ankit Jain 
 ankitjainc...@gmail.com
  wrote:
  Hi Ahmet,
 
  Thanks for the response.
  I am running this query from Solr Search UI. The number of shards for a
  collection is two.
 
  Thanks,
  Ankit
 
  On Mon, Dec 22, 2014 at 8:34 PM, Ahmet Arslan iori...@yahoo.com.invalid
 
  wrote:
 
   Hi,
  
    Is this a sharded query?
  
   Ahmet
  
  
   On Monday, December 22, 2014 4:47 PM, Ankit Jain 
  ankitjainc...@gmail.com
   wrote:
   Hi All,
  
   We are getting inconsistent search result on searching on *multivalued*
   field:
  
   *Input Query:*
   ( t : [ 0 TO 1419245069253 ] )AND(_all:impetus-i0111.impetus.co.in)
  
    The _all field is a multivalued field.
  
   The above query is returning sometimes 11 records and sometimes 12471
   records.
  
   Please help.
  
   Thanks,

   Ankit
  
 
 
 
  --
  Thanks,
 
  Ankit Jain
 



 --
 Thanks,
 Ankit Jain




 --
 Thanks,
 Ankit Jain


Re: How best to fork Solr for enhancement

2014-12-23 Thread Upayavira
I'm somewhat open to other suggestions, as I'm right at the beginning of
the project. I know Angular, and like it. I've looked at a couple of
others, but have found them to be more of a collection of disparate
components and not as integrated as Angular.

However, if folks want to have a discussion on competing frameworks, I'm
at least prepared to listen!!

Note - the design goal is to make it as easy as possible for *Java*
developers to work with. These folks are typically back-end developers, so
the framework must isolate the developer from UI quirks as much as possible
and have some form of design abstraction.

Upayavira

On Tue, Dec 23, 2014, at 10:09 AM, Alexandre Rafalovitch wrote:
 Semi off-topic, but is AngularJS the best next choice, given that
 version 2 is so different from version 1?
 
 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/
 
 
 On 23 December 2014 at 06:52, Upayavira u...@odoko.co.uk wrote:
  Hi,
 
  I've (hopefully) made some time to do some work on the Solr Admin UI
  (convert it to AngularJS). I plan to do it on a clone of the lucene-solr
  project at GitHub.
 
  Before I dive too thoroughly into this, I wanted to see if there were
  any best practices that would make it easier to back-port these changes
  into SVN should I actually succeed at producing something useful. Is it
  enough just to make a branch called SOLR-5507 and start committing my
  changes there?
 
  Periodically, I'll zip up the relevant bits and attach them to the JIRA
  ticket.
 
  TIA
 
  Upayavira


Re: UI for Solr

2014-12-23 Thread Olivier Austina
Hi Alex,

Thank you for the prompt reply. I was not aware of Spring.io's Spring Data Solr.

Regards
Olivier


2014-12-23 16:50 GMT+01:00 Alexandre Rafalovitch arafa...@gmail.com:

 You don't expose Solr directly to the user; it is not set up for
 foolproof security out of the box. So you would need a client to talk
 to Solr.

 Something like Spring.io's Spring Data Solr could be one of the things
 to check. You can see an auto-complete example for it at:
 https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer/src/main
 and embedded in action at
 http://www.solr-start.com/javadoc/solr-lucene/index.html (search box
 on the top)

 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 23 December 2014 at 10:45, Olivier Austina olivier.aust...@gmail.com
 wrote:
  Hi,
 
  I would like to build a user interface on top of Solr for PC and mobile. I
  am wondering if there is a framework or best practice commonly used. I want
  Solr features such as suggestions, autocomplete, and facets to be available
  in the UI. Any suggestion is welcome. Thank you.
 
  Regards
  Olivier



Solr Cloud and relative paths in solrconfig.xml lib directives

2014-12-23 Thread Jens Ivar Jørdre
Hi all,

I seek some advice on the use of lib directives in solrconfig.xml in Solr
Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single
node with the included Jetty. The setup adds a DataImportHandler request
handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to
pick up the dataimporthandler jars from «../../../dist».

Now, in migrating this setup to Solr Cloud, I upconfig the configuration to
ZooKeeper and create a collection with the collections API's CREATE action. The
problem with this approach is that the relative path to dist in the lib
directive does not resolve correctly.

failure: {
: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error 
CREATEing SolrCore 'cloudcollection1_shard2_replica2': Unable to create core 
[cloudcollection1_shard2_replica2] Caused by: 
org.apache.solr.handler.dataimport.DataImportHandler
}

and the logs reveal that the class
org.apache.solr.handler.dataimport.DataImportHandler cannot be found. Then,
after revamping my lib directive with an absolute path to the dist directory
that includes the dataimporthandler jars, another upconfig and a fresh
collection creation successfully creates the collection.

Is this intentional behavior forcing the use of absolute paths, or is it
possible to use relative paths to the dist and contrib directories in
solrconfig.xml in Cloud mode?

--
Sincerely,
Jens Ivar Jørdre
about.me/jijordre http://about.me/jijordre


Re: Solr server becomes non-responsive.

2014-12-23 Thread Dominique Bejean
Hi,

I agree with Erick; it would be a good thing to have more details about your
configuration and collection.

Your maximum heap size is 24GB. How much RAM is there on each server?

By « 4 shard Solr cluster », do you mean 4 Solr server nodes or a
collection with 4 shards?

So, how many nodes are in the cluster?
How many shards and replicas for the collection?
How many items are in the collection?
What is the size of the index?
How is the collection updated (frequency, how many items per day, what is
your hard commit strategy)?
How are the caches configured in solrconfig.xml?
Can you provide all the other JVM parameters?

Regards

Dominique

2014-12-23 17:50 GMT+01:00 Erick Erickson erickerick...@gmail.com:

 Second most important part of your message:
 When executing a huge query with many wildcards inside it the server

  This is usually an anti-pattern. The very first thing
  I'd be doing is trying to not do this. See ngrams for infix
  queries, or shingles, or ReverseWildcardFilterFactory, etc.

 And if your corpus is very large with many unique terms it's even
 worse, but you haven't really told us about that yet.

 Best,
 Erick

 On Tue, Dec 23, 2014 at 8:30 AM, Shawn Heisey apa...@elyograg.org wrote:
  On 12/23/2014 4:34 AM, Modassar Ather wrote:
  Hi,
 
   I have a setup of a 4-shard Solr cluster with embedded ZooKeeper on one of
   them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is
   24g.
   When executing a huge query with many wildcards inside it, the server
   crashes and becomes non-responsive. Even the dashboard does not respond
   and shows a connection lost error. This requires me to restart the
   servers.
 
  Here's the important part of your message:
 
  *Caused by: java.lang.OutOfMemoryError: Java heap space*
 
 
  Your heap is not big enough for what Solr has been asked to do.  You
  need to either increase your heap size or change your configuration so
  that it uses less memory.
 
  http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
 
  Most programs have pretty much undefined behavior when an OOME occurs.
  Lucene's IndexWriter has been hardened so that it tries extremely hard
  to avoid index corruption when OOME strikes, and I believe that works
  well enough that we can call it nearly bulletproof ... but the rest of
  Lucene and Solr will make no guarantees.
 
  It's very difficult to have definable program behavior when an OOME
  happens, because you simply cannot know the precise point during program
  execution where it will happen, or what isn't going to work because Java
  did not have memory space to create an object.  Going unresponsive is
  not surprising.
 
  If you can solve your heap problem, note that you may run into other
  performance issues discussed on the wiki page that I linked.
 
  Thanks,
  Shawn
 



Re: Solr Cloud and relative paths in solrconfig.xml lib directives

2014-12-23 Thread Dominique Bejean
Hi,

I usually put all dependency jar files (DIH, JDBC driver, …) in a lib
directory in the Solr home directory where your shards are created.

something like this

solr/
    solr.xml
    cloudcollection1_shard2_replica2/
    lib/


In solrconfig.xml, I remove all the <lib …> directives except this one:
<lib dir="../lib" />
You may need to restart your nodes after creating the lib directory.

Regards

Dominique

2014-12-23 21:25 GMT+01:00 Jens Ivar Jørdre jijor...@gmail.com:

 Hi all,

 I seek some advice on the use of lib directives in solrconfig.xml in Solr
 Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single
 node with the included Jetty. The setup adds a DataImportHandler request
 handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to
 pick up the dataimporthandler jars from «../../../dist».

 Now, in migrating this setup to Solr Cloud I upconfig the configuration to
 ZooKeeper and create collection with the collections API’s CREATE action.
 The problem with this approach is that the relative path to dist in the lib
 directive does not resolve correctly.

 failure: {
 :
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
 CREATEing SolrCore 'cloudcollection1_shard2_replica2': Unable to create
 core [cloudcollection1_shard2_replica2] Caused by:
 org.apache.solr.handler.dataimport.DataImportHandler
 }

 and the logs reveal that class
 org.apache.solr.handler.dataimport.DataImportHandler is yet to be found.
 Then, revamping my lib directive with absolute path to dist directory that
 includes the dataimporthandler jars, another upconfig and collection
 creation anew successfully creates the collection.

 Is this intentional behavior forcing the use of absolute paths, or is it
 possible to use relative path to dist and contrib directories on
 solrconfig.xml in Cloud mode?

 --
 Sincerely,
 Jens Ivar Jørdre
 about.me/jijordre http://about.me/jijordre



Re: Solr Cloud and relative paths in solrconfig.xml lib directives

2014-12-23 Thread Shalin Shekhar Mangar
I think you may be running into a bug which was reported an hour back. See
https://issues.apache.org/jira/browse/SOLR-6887

On Tue, Dec 23, 2014 at 8:25 PM, Jens Ivar Jørdre jijor...@gmail.com
wrote:

 Hi all,

 I seek some advice on the use of lib directives in solrconfig.xml in Solr
 Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single
 node with the included Jetty. The setup adds a DataImportHandler request
 handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to
 pick up the dataimporthandler jars from «../../../dist».

 Now, in migrating this setup to Solr Cloud I upconfig the configuration to
 ZooKeeper and create collection with the collections API’s CREATE action.
 The problem with this approach is that the relative path to dist in the lib
 directive does not resolve correctly.

 failure: {
 :
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
 CREATEing SolrCore 'cloudcollection1_shard2_replica2': Unable to create
 core [cloudcollection1_shard2_replica2] Caused by:
 org.apache.solr.handler.dataimport.DataImportHandler
 }

 and the logs reveal that class
 org.apache.solr.handler.dataimport.DataImportHandler is yet to be found.
 Then, revamping my lib directive with absolute path to dist directory that
 includes the dataimporthandler jars, another upconfig and collection
 creation anew successfully creates the collection.

 Is this intentional behavior forcing the use of absolute paths, or is it
 possible to use relative path to dist and contrib directories on
 solrconfig.xml in Cloud mode?

 --
 Sincerely,
 Jens Ivar Jørdre
 about.me/jijordre http://about.me/jijordre




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr Cloud and relative paths in solrconfig.xml lib directives

2014-12-23 Thread Shawn Heisey
On 12/23/2014 1:25 PM, Jens Ivar Jørdre wrote:
 I seek some advice on the use of lib directives in solrconfig.xml in Solr
 Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single
 node with the included Jetty. The setup adds a DataImportHandler request
 handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to
 pick up the dataimporthandler jars from «../../../dist».

Dominique's answer is the best approach ... but you should remove *all*
lib directives from solrconfig.xml.  You don't even need the directive
that he mentioned with ../lib.

Just create a lib directory in the same place as your solr.xml and put
all the extra jars needed by all your collections in that directory. 
Make sure that all other copies of those jars are not on your classpath.

As of Solr 4.3 (from what I remember, that's the right version),
${solr.solr.home}/lib is automatically included by the resource loader. 
Prior to that version, you had to include sharedLib=lib in solr.xml.  I
ran into a problem related to this, a problem that was declared to NOT
be a bug:

https://issues.apache.org/jira/browse/SOLR-4852?focusedCommentId=13820197&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820197

Thanks,
Shawn



Re: Solr Cloud and relative paths in solrconfig.xml lib directives

2014-12-23 Thread Erick Erickson
You ought to be able to specify an environment var
as well. So you have something like this in your
solrconfig.xml file:

<lib dir="${solr.install.dir:../../..}/contrib/clustering/lib/"
     regex=".*\.jar" />

and define solr.install.dir as a system var when you
invoke Solr.

Best,
Erick

On Tue, Dec 23, 2014 at 2:05 PM, Shawn Heisey apa...@elyograg.org wrote:
 On 12/23/2014 1:25 PM, Jens Ivar Jørdre wrote:
 I seek some advice on the use of lib directives in solrconfig.xml in Solr
 Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single
 node with the included Jetty. The setup adds a DataImportHandler request
 handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to
 pick up the dataimporthandler jars from «../../../dist».

 Dominique's answer is the best approach ... but you should remove *all*
 lib directives from solrconfig.xml.  You don't even need the directive
 that he mentioned with ../lib.

 Just create a lib directory in the same place as your solr.xml and put
 all the extra jars needed by all your collections in that directory.
 Make sure that all other copies of those jars are not on your classpath.

 As of Solr 4.3 (from what I remember, that's the right version),
 ${solr.solr.home}/lib is automatically included by the resource loader.
 Prior to that version, you had to include sharedLib=lib in solr.xml.  I
 ran into a problem related to this, a problem that was declared to NOT
 be a bug:

 https://issues.apache.org/jira/browse/SOLR-4852?focusedCommentId=13820197&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820197

 Thanks,
 Shawn



Re: Solr server becomes non-responsive.

2014-12-23 Thread Modassar Ather
Thanks for your suggestions.

I will look into the link provided.
http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

This is usually an anti-pattern. The very first thing
I'd be doing is trying to not do this. See ngrams for infix
queries, or shingles, or ReverseWildcardFilterFactory, etc.

We cannot avoid multiple wildcards since that is our users' requirement.
We try to discourage it but the users insist on firing such queries. Also,
ngrams etc. could be tried, but our index is already huge and ngrams may
add a lot more to it. We are OK with such queries failing as long as other
queries are not affected.


Please find the details below.

So, how many nodes are in the cluster?
There are 4 nodes in total in the cluster.

How many shards and replicas for the collection?
There are 4 shards and no replicas for any of them.

How many items are in the collection?
If I understand the question correctly, there are two collections on each
node and their sizes on each node are approximately 190GB and 130GB.

What is the size of the index?
There are two collections on each node and their sizes on each node are
approximately 190GB and 130GB.

 How is the collection updated (frequency, how many items per day, what is
 your hard commit strategy) ?
 It is an optimized index and read-only. There are no intermediate updates.

 How are the caches configured in solrconfig.xml ?
Filter cache, query result cache and document cache are enabled.
Auto-warming is also done.

Can you provide all other JVM parameters ?
-Xms20g -Xmx24g -XX:+UseConcMarkSweepGC
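
(Not part of the current setup, but two standard HotSpot additions to that
line can make an OOME easier to diagnose; the dump path is illustrative:

  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp
  -XX:OnOutOfMemoryError="kill -9 %p"

The first pair writes a heap dump when the OOME strikes; the last kills the
JVM outright so the node fails fast instead of hanging unresponsive.)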

Thanks again,
Modassar

On Wed, Dec 24, 2014 at 2:29 AM, Dominique Bejean dominique.bej...@eolya.fr
 wrote:

 Hi,

 I agree with Erick, it could be a good thing to have more details about your
 configuration and collection.

 Your heap size is 32Gb. How much RAM on each server ?

 By « 4 shard Solr cluster », do you mean 4 Solr server nodes or a
 collection with 4 shards ?

 So, how many nodes in the cluster ?
 How many shards and replicas for the collection ?
 How many items in the collection ?
 What is the size of the index ?
 How is the collection updated (frequency, how many items per day, what is
 your hard commit strategy) ?
 How are the caches configured in solrconfig.xml ?
 Can you provide all other JVM parameters ?

 Regards

 Dominique

 2014-12-23 17:50 GMT+01:00 Erick Erickson erickerick...@gmail.com:

  Second most important part of your message:
  When executing a huge query with many wildcards inside it the server
 
  This is usually an anti-pattern. The very first thing
  I'd be doing is trying to not do this. See ngrams for infix
  queries, or shingles or ReversedWildcardFilterFactory or.
 
  And if your corpus is very large with many unique terms it's even
  worse, but you haven't really told us about that yet.
 
  Best,
  Erick
 
  On Tue, Dec 23, 2014 at 8:30 AM, Shawn Heisey apa...@elyograg.org
 wrote:
   On 12/23/2014 4:34 AM, Modassar Ather wrote:
   Hi,
  
    I have a setup of 4 shard Solr cluster with embedded zookeeper on one of
    them. The zkClient time out is set to 30 seconds, -Xms is 20g and -Xmx is
    24g.
    When executing a huge query with many wildcards inside it the server
    crashes and becomes non-responsive. Even the dashboard does not respond
    and shows a connection lost error. This requires me to restart the
    servers.
  
   Here's the important part of your message:
  
   *Caused by: java.lang.OutOfMemoryError: Java heap space*
  
  
   Your heap is not big enough for what Solr has been asked to do.  You
   need to either increase your heap size or change your configuration so
   that it uses less memory.
  
   http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
  
   Most programs have pretty much undefined behavior when an OOME occurs.
   Lucene's IndexWriter has been hardened so that it tries extremely hard
   to avoid index corruption when OOME strikes, and I believe that works
   well enough that we can call it nearly bulletproof ... but the rest of
   Lucene and Solr will make no guarantees.
  
   It's very difficult to have definable program behavior when an OOME
    happens, because you simply cannot know the precise point during program
    execution where it will happen, or what isn't going to work because Java
    did not have memory space to create an object.  Going unresponsive is
   not surprising.
  
   If you can solve your heap problem, note that you may run into other
   performance issues discussed on the wiki page that I linked.
  
   Thanks,
   Shawn
  
 




Re: Loading data to FieldValueCache

2014-12-23 Thread Manohar Sripada
Thanks Erick and Toke,

Also, I read here https://wiki.apache.org/solr/SolrCaching#filterCache that
filterCache can also be used for faceting with facet.method=enum. So, I am a
bit confused here on which one to use for faceting.

One more thing: I have different types of facets (for example, Product
List and States). The Product List facet has a lot of unique values
(around 10 million), whereas the States list will be in the hundreds. So, I
want to come up with numbers for the size of fieldValueCache/filterCache
and pre-populate them.

Thanks,
Manohar

On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Or just not worry about it. The cache will be filled up automatically
 as you query for facets etc., the benefit to trying to fill it up as
 Toke outlines is just that the first few user queries that call for
 faceting will be somewhat faster. But after the first few user
 queries have gone through, it won't matter whether you've
 pre-loaded the cache or not.

 My point is that you'll get the benefit of the cache no matter what,
 it's just a matter of whether it's important that the first few users
 don't have to wait while they're loaded. And with DocValues,
 as Toke recommends, even that may be unimportant.

 Best,
 Erick

 On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:
  Manohar Sripada [manohar...@gmail.com] wrote:
  From the wiki, it states that
  http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used
 for
  faceting.
 
  Can someone please throw some light on how to load data to this cache.
 Like
  on what solrquery option does this consider the data to be loaded to
 this
  cache.
 
  The values are loaded on first facet call with facet.method=fc.
  http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
 
  My requirement is I have 10 facet fields (with facetlimit - 5) to be
 shown
  in my UI. I want to speed up this by using this cache. Is there a way
 where
  I can specify only the list of fields to be loaded to FieldValue Cache?
 
  Add a facet call as explicit warmup in your solrconfig.xml.
 
  You might want to consider DocValues for your facet fields.
  https://cwiki.apache.org/confluence/display/solr/DocValues
 
  - Toke Eskildsen
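
As a concrete example of the DocValues suggestion, it is a single attribute
in schema.xml plus a reindex (a sketch; the field name is illustrative):

  <field name="products" type="string" indexed="true" stored="true"
         docValues="true"/>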



Solr Date Range not returning results for last 1 month

2014-12-23 Thread Yavar Husain
So my Solr date range query is as follows:

facet.range=date&facet.range.start=NOW/DAY-36MONTH&facet.range.end=NOW/DAY&facet.range.gap=%2B1MONTH

I need facets for the past 36 months (3 years), and everything is fine
except that data is not being returned for the last month.

However, the facets I am getting run only up to last month: say today
is 24th December and I am getting them only up to 24th November. How should I
modify my query to obtain results up to today? Tried a few options by hit
and trial :) but could not arrive at a solution.

Appreciate the help in this regard.


Re: Loading data to FieldValueCache

2014-12-23 Thread Erick Erickson
By and large, don't use the enum method unless there are _very_
few unique values. It forms a filter (size roughly maxDoc/8 bytes)
for _every_ unique value in the field, i.e. if you have 10,000 unique
values it'll try to form 10,000 filterCache entries. Let the system
do this for you automatically if appropriate.
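
For completeness, facet.method can also be set per field, so a
low-cardinality field can use enum while a high-cardinality one uses fc
(a sketch; the field names are illustrative):

  /select?q=*:*&rows=0&facet=true
      &facet.field=state&f.state.facet.method=enum
      &facet.field=products&f.products.facet.method=fc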

Best,
Erick

On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada manohar...@gmail.com wrote:
 Thanks Erick and Toke,

 Also, I read here https://wiki.apache.org/solr/SolrCaching#filterCache that
 filterCache can also be used for faceting with facet.method=enum. So, I am a
 bit confused here on which one to use for faceting.

 One more thing: I have different types of facets (for example, Product
 List and States). The Product List facet has a lot of unique values
 (around 10 million), whereas the States list will be in the hundreds. So, I
 want to come up with numbers for the size of fieldValueCache/filterCache
 and pre-populate them.

 Thanks,
 Manohar

 On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Or just not worry about it. The cache will be filled up automatically
 as you query for facets etc., the benefit to trying to fill it up as
 Toke outlines is just that the first few user queries that call for
 faceting will be somewhat faster. But after the first few user
 queries have gone through, it won't matter whether you've
 pre-loaded the cache or not.

 My point is that you'll get the benefit of the cache no matter what,
 it's just a matter of whether it's important that the first few users
 don't have to wait while they're loaded. And with DocValues,
 as Toke recommends, even that may be unimportant.

 Best,
 Erick

 On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:
  Manohar Sripada [manohar...@gmail.com] wrote:
  From the wiki, it states that
  http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used
 for
  faceting.
 
  Can someone please throw some light on how to load data to this cache.
 Like
  on what solrquery option does this consider the data to be loaded to
 this
  cache.
 
  The values are loaded on first facet call with facet.method=fc.
  http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
 
  My requirement is I have 10 facet fields (with facetlimit - 5) to be
 shown
  in my UI. I want to speed up this by using this cache. Is there a way
 where
  I can specify only the list of fields to be loaded to FieldValue Cache?
 
  Add a facet call as explicit warmup in your solrconfig.xml.
 
  You might want to consider DocValues for your facet fields.
  https://cwiki.apache.org/confluence/display/solr/DocValues
 
  - Toke Eskildsen



Re: Solr Date Range not returning results for last 1 month

2014-12-23 Thread Erick Erickson
Hmmm, not quite sure what's going on here, but try an end
time of NOW/MONTH+1MONTH with the usual escaping of the
plus sign...
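
Applied to the original query, the full parameter set would look roughly
like this (keeping the original start and gap):

  facet.range=date&facet.range.start=NOW/DAY-36MONTH
      &facet.range.end=NOW/MONTH%2B1MONTH&facet.range.gap=%2B1MONTH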

Best,
Erick

On Tue, Dec 23, 2014 at 9:55 PM, Yavar Husain yavarhus...@gmail.com wrote:
 So my Solr date range query is as follows:

 facet.range=date&facet.range.start=NOW/DAY-36MONTH&facet.range.end=NOW/DAY&facet.range.gap=%2B1MONTH

 I need facets for the past 36 months (3 years), and everything is fine
 except that data is not being returned for the last month.

 However, the facets I am getting run only up to last month: say today
 is 24th December and I am getting them only up to 24th November. How should I
 modify my query to obtain results up to today? Tried a few options by hit
 and trial :) but could not arrive at a solution.

 Appreciate the help in this regard.


Re: Solr Date Range not returning results for last 1 month

2014-12-23 Thread Yavar Husain
Thanks Erick. That works!

Will check some other time as to why NOW/DAY does not work.

Regards,
Yavar

On Wed, Dec 24, 2014 at 11:39 AM, Erick Erickson erickerick...@gmail.com
wrote:

 Hmmm, not quite sure what's going on here, but try an end
 time of NOW/MONTH+1MONTH with the usual escaping of the
 plus sign...

 Best,
 Erick

 On Tue, Dec 23, 2014 at 9:55 PM, Yavar Husain yavarhus...@gmail.com
 wrote:
  So my Solr date range query is as follows:
 
 
  facet.range=date&facet.range.start=NOW/DAY-36MONTH&facet.range.end=NOW/DAY&facet.range.gap=%2B1MONTH
 
  I need facets for the past 36 months (3 years), and everything is fine
  except that data is not being returned for the last month.
 
  However, the facets I am getting run only up to last month: say today
  is 24th December and I am getting them only up to 24th November. How should I
  modify my query to obtain results up to today? Tried a few options by hit
  and trial :) but could not arrive at a solution.
 
  Appreciate the help in this regard.



Re: Loading data to FieldValueCache

2014-12-23 Thread Manohar Sripada
Okay. Let me try like this, as mine is a read-only index. I will have some
queries in the firstSearcher event listener (a sketch of the config follows
below):
1) q=*:*&facet=true&facet.method=enum&facet.field=state   -- To load all
the state-related unique values into the filterCache.
    Will it use the filterCache when I send a query with a filter, e.g.
fq=state:CA ?
    Once it is loaded, do I need to send a query with facet.method=enum
every time along with facet.field=state to get state-related facet data
from the filterCache?

2) q=*:*&facet=true&facet.method=fc&facet.field=products  -- To load the
values related to products into the fieldCache.
    Again, while querying for this facet do I need to send
facet.method=fc every time?
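
A sketch of wiring those two warmup queries into solrconfig.xml with the
standard QuerySenderListener (field names as above):

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.method">enum</str>
        <str name="facet.field">state</str>
      </lst>
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.method">fc</str>
        <str name="facet.field">products</str>
      </lst>
    </arr>
  </listener>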

Thanks,
Manohar

On Wed, Dec 24, 2014 at 11:36 AM, Erick Erickson erickerick...@gmail.com
wrote:

 By and large, don't use the enum method unless there are _very_
  few unique values. It forms a filter (size roughly maxDoc/8 bytes)
 for _every_ unique value in the field, i.e. if you have 10,000 unique
 values it'll try to form 10,000 filterCache entries. Let the system
 do this for you automatically if appropriate.

 Best,
 Erick

 On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada manohar...@gmail.com
 wrote:
  Thanks Erick and Toke,
 
  Also, I read here https://wiki.apache.org/solr/SolrCaching#filterCache that
  filterCache can also be used for faceting with facet.method=enum. So, I am a
  bit confused here on which one to use for faceting.
 
  One more thing: I have different types of facets (for example, Product
  List and States). The Product List facet has a lot of unique values
  (around 10 million), whereas the States list will be in the hundreds. So, I
  want to come up with numbers for the size of fieldValueCache/filterCache
  and pre-populate them.
 
  Thanks,
  Manohar
 
  On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  Or just not worry about it. The cache will be filled up automatically
  as you query for facets etc., the benefit to trying to fill it up as
  Toke outlines is just that the first few user queries that call for
  faceting will be somewhat faster. But after the first few user
  queries have gone through, it won't matter whether you've
  pre-loaded the cache or not.
 
  My point is that you'll get the benefit of the cache no matter what,
  it's just a matter of whether it's important that the first few users
  don't have to wait while they're loaded. And with DocValues,
  as Toke recommends, even that may be unimportant.
 
  Best,
  Erick
 
  On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk
 
  wrote:
   Manohar Sripada [manohar...@gmail.com] wrote:
   From the wiki, it states that
   http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly
 used
  for
   faceting.
  
   Can someone please throw some light on how to load data to this
 cache.
  Like
   on what solrquery option does this consider the data to be loaded to
  this
   cache.
  
   The values are loaded on first facet call with facet.method=fc.
   http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
  
   My requirement is I have 10 facet fields (with facetlimit - 5) to be
  shown
   in my UI. I want to speed up this by using this cache. Is there a way
  where
   I can specify only the list of fields to be loaded to FieldValue
 Cache?
  
   Add a facet call as explicit warmup in your solrconfig.xml.
  
   You might want to consider DocValues for your facet fields.
   https://cwiki.apache.org/confluence/display/solr/DocValues
  
   - Toke Eskildsen
 



Re: Solr server becomes non-responsive.

2014-12-23 Thread Dominique Bejean
Modassar,

How many items in the collection ?
I mean how many documents per collection ? 1 million, 10 million, …?

How are the caches configured in solrconfig.xml ?
What is the size attribute value for each cache ?

Can you provide a sample of the query ?
Does it fail immediately after SolrCloud startup or after several hours ?
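
For reference, those are entries like the following in solrconfig.xml (the
sizes here are only placeholders):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
               autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512"
                    autowarmCount="32"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>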

Dominique

2014-12-24 6:20 GMT+01:00 Modassar Ather modather1...@gmail.com:

 Thanks for your suggestions.

 I will look into the link provided.
 http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

 This is usually an anti-pattern. The very first thing
 I'd be doing is trying to not do this. See ngrams for infix
  queries, or shingles or ReversedWildcardFilterFactory or.

  We cannot avoid multiple wildcards since that is our users' requirement.
  We try to discourage it but the users insist on firing such queries. Also,
  ngrams etc. can be tried but our index is already huge and ngrams may
  further add a lot to it. We are OK with such queries failing as long as other
 queries are not affected.


 Please find the details below.

 So, how many nodes in the cluster ?
 There are total 4 nodes on the cluster.

 How many shards and replicas for the collection ?
 There are 4 shards and no replica for any of them.

 How many items in the collection ?
  If I understand the question correctly there are two collections on each
  node and their sizes on each node are approximately 190GB and 130GB.
 
  What is the size of the index ?
  There are two collections on each node and their sizes on each node are
  approximately 190GB and 130GB.

  How is the collection updated (frequency, how many items per day, what is
  your hard commit strategy) ?
  It is an optimized index and read-only. There are no intermediate updates.
 
  How are the caches configured in solrconfig.xml ?
 Filter cache, query result cache and document cache are enabled.
 Auto-warming is also done.

 Can you provide all other JVM parameters ?
 -Xms20g -Xmx24g -XX:+UseConcMarkSweepGC

 Thanks again,
 Modassar

 On Wed, Dec 24, 2014 at 2:29 AM, Dominique Bejean 
 dominique.bej...@eolya.fr
  wrote:

  Hi,
 
  I agree with Erick, it could be a good thing to have more details about your
  configuration and collection.
 
  Your heap size is 32Gb. How much RAM on each server ?
 
  By « 4 shard Solr cluster », do you mean 4 Solr server nodes or a
  collection with 4 shards ?
 
  So, how many nodes in the cluster ?
  How many shards and replicas for the collection ?
  How many items in the collection ?
  What is the size of the index ?
  How is the collection updated (frequency, how many items per day, what
  is your hard commit strategy) ?
  How are the caches configured in solrconfig.xml ?
  Can you provide all other JVM parameters ?
 
  Regards
 
  Dominique
 
  2014-12-23 17:50 GMT+01:00 Erick Erickson erickerick...@gmail.com:
 
   Second most important part of your message:
   When executing a huge query with many wildcards inside it the server
  
   This is usually an anti-pattern. The very first thing
   I'd be doing is trying to not do this. See ngrams for infix
    queries, or shingles or ReversedWildcardFilterFactory or.
  
   And if your corpus is very large with many unique terms it's even
   worse, but you haven't really told us about that yet.
  
   Best,
   Erick
  
   On Tue, Dec 23, 2014 at 8:30 AM, Shawn Heisey apa...@elyograg.org
  wrote:
On 12/23/2014 4:34 AM, Modassar Ather wrote:
Hi,
   
 I have a setup of 4 shard Solr cluster with embedded zookeeper on one of
 them. The zkClient time out is set to 30 seconds, -Xms is 20g and -Xmx is
 24g.
 When executing a huge query with many wildcards inside it the server
 crashes and becomes non-responsive. Even the dashboard does not respond
 and shows a connection lost error. This requires me to restart the
 servers.
   
Here's the important part of your message:
   
*Caused by: java.lang.OutOfMemoryError: Java heap space*
   
   
 Your heap is not big enough for what Solr has been asked to do.  You
 need to either increase your heap size or change your configuration so
 that it uses less memory.
   
http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
   
 Most programs have pretty much undefined behavior when an OOME occurs.
 Lucene's IndexWriter has been hardened so that it tries extremely hard
 to avoid index corruption when OOME strikes, and I believe that works
 well enough that we can call it nearly bulletproof ... but the rest of
 Lucene and Solr will make no guarantees.
   
It's very difficult to have definable program behavior when an OOME
 happens, because you simply cannot know the precise point during program
 execution where it will happen, or what isn't going to work because Java
 did not have memory space to create an object.  Going unresponsive is
not surprising.
   
If you can solve your heap problem, note that you may run into other
performance issues discussed