org.eclipse.jetty.io.EofException: early EOF

2014-02-20 Thread lalitjangra
Hi, 

I am using Solr 4.6 along with Apache ManifoldCF 1.4.1 to index an Alfresco
CMS repository. While indexing Alfresco I am getting the below error in the
Solr logs while indexing media content such as images or video.

ERROR - 2014-02-20 12:50:45.108; org.apache.solr.common.SolrException;
null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
Processing of multipart/form-data request failed. early EOF

	at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
	at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
	at org.apache.solr.servlet.SolrRequestParsers$MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:547)
	at org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:681)
	at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:150)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:393)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Unknown Source)
Caused by: org.eclipse.jetty.io.EofException: early EOF
	at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
	at java.io.FilterInputStream.read(Unknown Source)
	at org.apache.commons.fileupload.util.LimitedInputStream.read(LimitedInputStream.java:125)
	at org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)
	at org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:887)
	at java.io.InputStream.read(Unknown Source)
	at org.apache.commons.fileupload.util.Streams.copy(Streams.java:94)
	at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
	at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
	... 31 more


ERROR - 2014-02-20 12:50:45.117; org.apache.solr.common.SolrException;
null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
Processing of multipart/form-data request failed. early EOF

	at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
	at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
	at org.apache.solr.servlet.SolrRequestParsers$MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:547)
	at org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:681)

Group on multiple fields in a sharded environment

2014-02-20 Thread lboutros
Dear all,

I would like to group my query results on two different fields (not at the
same time). I also would like to get the exact group count. And I'm working
with a sharded index.

I know that to get the exact group count, all documents of a group must be
indexed in a single shard.

Now, is there a good magic way other than creating two collections with two
different sharding keys? I could as well imagine a sort of document balancer
which could move (reindex) documents by changing the sharding key, but that
could be quite complex, and I think indexing performance could suffer.

Any advice ?
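One built-in mechanism worth knowing here is composite-id routing (a sketch; available in SolrCloud 4.1+ with the compositeId router). Prefixing the document id with a shard key routes all documents sharing that prefix to the same shard, which is what exact group counts require:

```
id = categoryA!doc42    (everything with the "categoryA!" prefix lands on one shard)
```

It only supports one routing key per collection, though, which is why grouping on two different fields tends to lead back to two collections.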

Ludovic.




-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Group-on-multiple-fields-in-a-sharded-environment-tp4118527.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Additive boost function

2014-02-20 Thread Zwer
Jack, 

Could you please suggest how to use Solr query functions so that all field
boosts are added together for the query I specified in this topic?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Additive-boost-function-tp4118066p4118537.html
Sent from the Solr - User mailing list archive at Nabble.com.


Grouping performance improvement

2014-02-20 Thread soodyogesh
I'm facing slow performance on queries where I'm grouping on a field.

The index holds 57 million records, and we will be targeting 100 million+.

I'm using grouping to build category-based autosuggest.

So when a user presses "a", I search for "a" and group by a field, say
products. I have noticed that query performance gets really bad with the
group-by clause.

I'm at an experimental stage, so I can change the schema or try other
alternatives.

Please let me know if there is a way to cleverly design the schema to
improve performance, or if I'm missing some option to fine-tune.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Grouping performance improvement

2014-02-20 Thread Alexey Kozhemiakin
You can think of using facets on the category field instead of grouping. It will be
faster, and categorization can be done against multiple category fields. Try
different facet methods.

If you don't need the number of documents in each category and the number of unique
categories is relatively low, you might be interested in the following performance
improvement: https://issues.apache.org/jira/browse/SOLR-5725
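A faceting variant of a category autosuggest query might look like this (a sketch; the products field name comes from the original message):

```
q=a*&rows=0&facet=true&facet.field=products&facet.limit=10&facet.method=enum
```

facet.method accepts enum or fc; which is faster depends mostly on the number of unique values in the faceted field.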


Alexey

-Original Message-
From: soodyogesh [mailto:soodyog...@gmail.com] 
Sent: Thursday, February 20, 2014 16:20
To: solr-user@lucene.apache.org
Subject: Grouping performance improvement

I'm facing slow performance on queries where I'm grouping on a field.

The index holds 57 million records, and we will be targeting 100 million+.

I'm using grouping to build category-based autosuggest.

So when a user presses "a", I search for "a" and group by a field, say
products. I have noticed that query performance gets really bad with the
group-by clause.

I'm at an experimental stage, so I can change the schema or try other
alternatives.

Please let me know if there is a way to cleverly design the schema to improve
performance, or if I'm missing some option to fine-tune.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549.html
Sent from the Solr - User mailing list archive at Nabble.com.


Contract JOB** Austin TX-- SOLR/Zookeeper expert

2014-02-20 Thread FiservBrian
We are looking for an expert in Solr with Zookeeper architecture. This gig
will be to review the current architecture and implementation, look for and fix
issues with ingestion or search, and provide recommendations for future evolution
of the stack. If you are interested, please contact Brian, VP of Solution
Delivery @ Fiserv.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Contract-JOB-Austin-TX-SOLR-Zookeeper-expert-tp4118568.html
Sent from the Solr - User mailing list archive at Nabble.com.


CloudServer 4.2.1 and SolrCloud 4.3.1

2014-02-20 Thread KNitin
Hi

 I have a question on the CloudServer client for SolrCloud. How does
CloudServer route requests to Solr? Does it use round robin internally, or
does it take into account any other parameter of the node (for example, how
many replicas it has, etc.)?


Thanks
Nitin


Re: org.eclipse.jetty.io.EofException: early EOF

2014-02-20 Thread Shawn Heisey

On 2/20/2014 1:41 AM, lalitjangra wrote:

I am using solr 4.6 along with Apache Manifold CF 1.4.1 to index alfresco
cms repository. While indexing alfresco i am getting below error in solr
logs while indexing media content such as image or video.

ERROR - 2014-02-20 12:50:45.108; org.apache.solr.common.SolrException;
null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
Processing of multipart/form-data request failed. early EOF


snip


Caused by: org.eclipse.jetty.io.EofException: early EOF



Every time I've seen EofException, it's caused by the client terminating 
the TCP connection early, before Solr has seen the whole request or before 
Solr has processed it and sent a response.


ManifoldCF probably has a socket timeout configured on its HTTP 
client, likely at either 30 or 60 seconds ... but it is taking longer 
than that for the request to complete.  Based on where the exception 
occurs, it sounds like it happens during file upload, before Solr even 
finishes receiving the file.


Thanks,
Shawn



No suggestions when I set spellcheck.q

2014-02-20 Thread Hakim Benoudjit
Hi guys,

Suppose a user is browsing a webpage where the articles are already
filtered. I want to get suggestions only from the filtered content (i.e.
the current category).

To achieve this I have set `spellcheck.q` to the current query or category,
but by doing this the query no longer returns `suggestions`.

So, what's the workaround?

P.S.: I have set these params on the suggester:
Reload(true);
Build(true);
Collate(true);
ExtendedResults(true);
CollateExtendedResults(true);
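For reference, those client-side settings correspond roughly to these raw request parameters (a sketch, assuming a standard SpellCheckComponent setup):

```
spellcheck=true&spellcheck.build=true&spellcheck.collate=true
  &spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true
  &spellcheck.q=<input to check>
```

Note that spellcheck.q replaces the spellchecker's input entirely, so setting it to a category value means the component checks the category string itself rather than the user's query terms.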


Re: CloudServer 4.2.1 and SolrCloud 4.3.1

2014-02-20 Thread Shawn Heisey

On 2/20/2014 12:09 PM, KNitin wrote:

  I have a question on CloudServer client for solrcloud. How does
CloudServer route requests to solr?  Does it use round robin internally or
does it take into account any other parameter for the node (example how
many replicas it has, etc) ?


Mixing SolrJ and Solr versions when you are using CloudSolrServer and 
SolrCloud is a bad idea right now.  SolrCloud is evolving *VERY* 
quickly.  At some point in the future there will be more stability 
between versions, but right now, success is unlikely unless they are the 
same version.


By default, CloudSolrServer in version 4.2.1 will send updates to shard 
leaders (in a round-robin fashion). For queries, it uses a full 
round-robin.  I *think* it will limit the round-robin to only nodes that 
are hosting that collection, but I'm not sure about that part.


http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html#CloudSolrServer%28java.lang.String,%20org.apache.solr.client.solrj.impl.LBHttpSolrServer,%20boolean%29

Newer versions of CloudSolrServer (4.5.x if I remember correctly, and 
4.6.x for sure) can route update requests directly to the leader of the 
correct shard.  It does require the same version of Solr.


Thanks,
Shawn



Can set a boost function as a default within requesthandler?

2014-02-20 Thread Peter Dodd
For my search I’ve established a boost function which enhances result
ranking. In a query it looks something like: 

   q={!boost b=product(answeredStatus, articleType)}connectivity

I’d like to make this boost function a default for all others who use the
search. My default search handler is configured like this in solrConfig.xml:

<requestHandler class="solr.SearchHandler" name="standard" default="true">
 <lst name="defaults">
  <str name="echoParams">explicit</str>
  <str name="fl">*,score</str>
 </lst>
 <arr name="components">
  <str>collapse</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>stats</str>
  <str>debug</str>
 </arr>
</requestHandler>

Can I add the boost function as a default in here?

Many  thanks,
Peter




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can set a boost function as a default within requesthandler?

2014-02-20 Thread Ahmet Arslan
Hi Peter,

According to 

http://wiki.apache.org/solr/LocalParams#Parameter_dereferencing

if
    q={!dismax qf=myfield}solr rocks 

is equivalent to

    q={!type=dismax qf=myfield v=$qq}qq=solr rocks
    
then
  q={!boost b=product(answeredStatus, articleType)}connectivity
  
is equivalent to

  q={!type=boost b=product(answeredStatus, articleType) 
v=$qq}qq=connectivity
    
Then you can define q in the defaults section of the request handler in solrconfig.xml 
and pass the qq parameter in the request.
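Spelled out, the defaults entry might look like this (a sketch using the field names from this thread; exact escaping may need adjusting):

```
<lst name="defaults">
  <str name="q">{!boost b=product(answeredStatus, articleType) v=$qq}</str>
</lst>
```

Requests would then pass the raw search terms as qq=connectivity instead of q.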

Ahmet





On Thursday, February 20, 2014 9:32 PM, Peter Dodd pd...@microsoft.com wrote:
For my search I’ve established a boost function which enhances result
ranking. In a query it looks something like: 

   q={!boost b=product(answeredStatus, articleType)}connectivity

I’d like to make this boost function a default for all others who use the
search. My default search handler is configured like this in solrConfig.xml:

<requestHandler class="solr.SearchHandler" name="standard" default="true">
 <lst name="defaults">
  <str name="echoParams">explicit</str>
  <str name="fl">*,score</str>
 </lst>
 <arr name="components">
  <str>collapse</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>stats</str>
  <str>debug</str>
 </arr>
</requestHandler>

Can I add the boost function as a default in here?

Many  thanks,
Peter




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CloudServer 4.2.1 and SolrCloud 4.3.1

2014-02-20 Thread KNitin
Thanks, Shawn.


On Thu, Feb 20, 2014 at 11:29 AM, Shawn Heisey s...@elyograg.org wrote:

 On 2/20/2014 12:09 PM, KNitin wrote:

   I have a question on CloudServer client for solrcloud. How does
 CloudServer route requests to solr?  Does it use round robin internally or
 does it take into account any other parameter for the node (example how
 many replicas it has, etc) ?


 Mixing SolrJ and Solr versions when you are using CloudSolrServer and
 SolrCloud is a bad idea right now.  SolrCloud is evolving *VERY* quickly.
  At some point in the future there will be more stability between versions,
 but right now, success is unlikely unless they are the same version.

 By default, CloudSolrServer in version 4.2.1 will send updates to shard
 leaders (in a round-robin fashion). For queries, it uses a full
 round-robin.  I *think* it will limit the round-robin to only nodes that
 are hosting that collection, but I'm not sure about that part.

 http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/
 solr/client/solrj/impl/CloudSolrServer.html#CloudSolrServer%28java.lang.
 String,%20org.apache.solr.client.solrj.impl.LBHttpSolrServer,%20boolean%29

 Newer versions of CloudSolrServer (4.5.x if I remember correctly, and
 4.6.x for sure) can route update requests directly to the leader of the
 correct shard.  It does require the same version of Solr.

 Thanks,
 Shawn




Re: Can set a boost function as a default within requesthandler?

2014-02-20 Thread rainfall83
Hi Peter,
how about:

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <str name="fl">*,score</str>
  <str name="bf">product(answeredStatus, articleType)</str>
 </lst>
 <arr name="components">
  <str>collapse</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>stats</str>
  <str>debug</str>
 </arr>
</requestHandler>

?

W dniu 2014-02-20 20:20:08 użytkownik Peter Dodd pd...@microsoft.com napisał:
 For my search I’ve established a boost function which enhances result
 ranking. In a query it looks something like: 
 
q={!boost b=product(answeredStatus, articleType)}connectivity
 
 I’d like to make this boost function a default for all others who use the
 search. My default search handler is configured like this in solrConfig.xml:
 
 <requestHandler class="solr.SearchHandler" name="standard" default="true">
  <lst name="defaults">
   <str name="echoParams">explicit</str>
   <str name="fl">*,score</str>
  </lst>
  <arr name="components">
   <str>collapse</str>
   <str>facet</str>
   <str>mlt</str>
   <str>highlight</str>
   <str>stats</str>
   <str>debug</str>
  </arr>
 </requestHandler>
 
 Can I add the boost function as a default in here?
 
 Many  thanks,
 Peter
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 





Re: Can set a boost function as a default within requesthandler?

2014-02-20 Thread Ahmet Arslan
Hi rainfall,

bf is an (e)dismax-specific query parser parameter. It seems that Peter is using 
the default lucene query parser.

Ahmet


On Thursday, February 20, 2014 10:15 PM, rainfall83 rainfal...@op.pl wrote:
Hi Peter,
how about:

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <str name="fl">*,score</str>
  <str name="bf">product(answeredStatus, articleType)</str>
</lst>
<arr name="components">
  <str>collapse</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>stats</str>
  <str>debug</str>
</arr>
</requestHandler>

?

W dniu 2014-02-20 20:20:08 użytkownik Peter Dodd pd...@microsoft.com napisał:
 For my search I’ve established a boost function which enhances result
 ranking. In a query it looks something like: 
 
    q={!boost b=product(answeredStatus, articleType)}connectivity
 
 I’d like to make this boost function a default for all others who use the
 search. My default search handler is configured like this in solrConfig.xml:
 
 <requestHandler class="solr.SearchHandler" name="standard" default="true">
  <lst name="defaults">
   <str name="echoParams">explicit</str>
   <str name="fl">*,score</str>
  </lst>
  <arr name="components">
   <str>collapse</str>
   <str>facet</str>
   <str>mlt</str>
   <str>highlight</str>
   <str>stats</str>
   <str>debug</str>
  </arr>
 </requestHandler>
 
 Can I add the boost function as a default in here?
 
 Many  thanks,
 Peter
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Re: Can set a boost function as a default within requesthandler?

2014-02-20 Thread rainfall83
Hi Ahmet, you are right, I had forgotten to mention the necessity of a query 
parser change in my case.
So, just as you wrote previously, something like:

<str name="q">_query_:{!boost b=product(answeredStatus, articleType) v=$qq}</str> 
added to the defaults section of the request handler definition, and referencing 
qq instead of q in requests, should do the deal.

Thanks Ahmet.

W dniu 2014-02-20 21:20:24 użytkownik Ahmet Arslan iori...@yahoo.com napisał:
 Hi rainfall,
 
 bf is a (e)dismax query parser specific parameter. It seems that Peter is 
 using default lucene query parser.
 
 Ahmet
 
 
 On Thursday, February 20, 2014 10:15 PM, rainfall83 rainfal...@op.pl wrote:
 Hi Peter,
 how about:
 
 <lst name="defaults">
   <str name="echoParams">explicit</str>
   <str name="fl">*,score</str>
   <str name="bf">product(answeredStatus, articleType)</str>
 </lst>
 <arr name="components">
   <str>collapse</str>
   <str>facet</str>
   <str>mlt</str>
   <str>highlight</str>
   <str>stats</str>
   <str>debug</str>
 </arr>
 </requestHandler>
 
 ?
 
 W dniu 2014-02-20 20:20:08 użytkownik Peter Dodd pd...@microsoft.com 
 napisał:
  For my search I’ve established a boost function which enhances result
  ranking. In a query it looks something like: 
  
     q={!boost b=product(answeredStatus, articleType)}connectivity
  
  I’d like to make this boost function a default for all others who use the
  search. My default search handler is configured like this in solrConfig.xml:
  
  <requestHandler class="solr.SearchHandler" name="standard" default="true">
   <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="fl">*,score</str>
   </lst>
   <arr name="components">
    <str>collapse</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>stats</str>
    <str>debug</str>
   </arr>
  </requestHandler>
  
  Can I add the boost function as a default in here?
  
  Many  thanks,
  Peter
  
  
  
  
  --
  View this message in context: 
  http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html
  Sent from the Solr - User mailing list archive at Nabble.com.
  
 





RE: Solr4 performance

2014-02-20 Thread Joshi, Shital
Hi!

I have a few other questions regarding the Solr4 performance issue we're facing. 

We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use 
commit=false in the update URL. We have only a hard commit setting in the Solr4 config. 

<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
   <maxDocs>10</maxDocs>
   <openSearcher>true</openSearcher>
 </autoCommit>


Since we're not using soft commit at all (commit=false), the caches will not 
get reloaded on every commit and recently added documents will not be visible, 
correct? 

What we see is that queries which usually take a few milliseconds take ~40 seconds 
once in a while. Can high IO during a hard commit cause queries to slow down? 

For some shards we see 98% full physical memory. We have a 60GB machine (30GB 
JVM, 28GB free RAM, ~35GB of index). We're ruling out that high physical 
memory usage would cause queries to slow down. We're in the process of reducing 
the JVM size anyway. 

We have never run optimization until now. In QA, optimization didn't yield a 
performance gain. 

Thanks much for all help.

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Tuesday, February 18, 2014 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

On 2/18/2014 2:14 PM, Joshi, Shital wrote:
 Thanks much for all suggestions. We're looking into reducing allocated heap 
 size of Solr4 JVM.

 We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? 
 Can someone please confirm?

In Solr, NRTCachingDirectory does indeed use MMapDirectory as its 
default delegate.  That's probably also the case with Lucene -- these 
are Lucene classes, after all.

MMapDirectory is almost always the most efficient way to handle on-disk 
indexes.
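For reference, the directory factory is typically selected in solrconfig.xml like this (a sketch of the stock 4.x setting):

```
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
```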

Thanks,
Shawn



Re: Caching Solr boost functions?

2014-02-20 Thread Chris Hostetter

: I'd like to tell Solr to cache those boost queries for the life of the
: Searcher so they don't get recomputed every time. Is there any way to do
: that out of the box?

if the functions never change, you could just index the computed value up 
front and save cycles at query time -- but that's the only option i can 
think of off the top of my head.

: In a different custom QParser we have we wrote a CachingValueSource that
: wrapped a ValueSource with a custom ValueSource cache. Would it make sense
: to implement that as a standard Solr function so that one could do:
: 
: boost=cache(expensiveFunctionQuery())

Yeah... that could be handy.  Something like this perhaps?

  <cache name="some_cache_name"
         class="solr.LRUCache"
         size="4096"
         autowarmCount="1024"
         regenerator="solr.ValueSourceCacheRegenerator"
  />

  <valueSourceParser name="cache" class="solr.ValueSourceCachingParser">
    <str name="cacheName">some_cache_name</str>
  </valueSourceParser>



-Hoss
http://www.lucidworks.com/


Tweaking Solr Query Result Cache

2014-02-20 Thread KNitin
Hello

  I have a 4 node cluster running Solr cloud 4.3.1. I have a few large
collections sharded 8 ways across all the 4 nodes (with 2 shards per node).
The size of the shard for the large collections is around 600-700Mb
containing around 250K+ documents.

Currently the size of the query cache is around 512. We have a few jobs
that run tail queries on these collections. The hit ratio of the cache
drops to 0 when running these queries, and at the same time the CPU spikes.
The latencies are on the order of seconds in this case. I verified that GC
behavior is normal (not killing the CPU).

The following are my questions


   1. Is it a good practice to vary the query result cache size based on
   the size of the collection (large collections get a large cache)?
   2. If most of your queries are tail queries, what is a good way to make
   your cache usage effective (higher hit ratio)?
   3. If, let's say, all your queries miss the cache, is it OK behavior for
   your CPU to spike (to 90+%)?
   4. Is there a recommended shard size (# of docs, size) to use? A few of
   my collections are 100-200 MB and the large ones are on the order of 800 MB-1 GB.

Thanks a lot in advance
Nitin


RE: Solr4 performance

2014-02-20 Thread Michael Della Bitta
Hi,

As for your first question, setting openSearcher to true means you will see
the new docs after every hard commit. Soft and hard commits only become
isolated from one another with that set to false.

Your second problem might be explained by your large heap and garbage
collection. Walking a heap that large can take an appreciable amount of
time. You might consider turning on the JVM options for logging GC and
seeing if you can correlate your slow responses to times when your JVM is
garbage collecting.
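The GC logging options for the HotSpot JVMs of that era look like this (a sketch; the log path is just an example, and exact flags vary by JVM version):

```
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log
```

Correlating the timestamps in gc.log with the slow responses will show whether full GC pauses line up with the ~40 second queries.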

Hope that helps,
On Feb 20, 2014 4:52 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Hi!

 I have few other questions regarding Solr4 performance issue we're facing.

 We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
 commit=false in update URL. We have only hard commit setting in Solr4
 config.

 <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
    <maxDocs>10</maxDocs>
    <openSearcher>true</openSearcher>
  </autoCommit>


 Since we're not using Soft commit at all (commit=false), the caches will
 not get reloaded for every commit and recently added documents will not be
 visible, correct?

 What we see is queries which usually take few milli seconds, takes ~40
 seconds once in a while. Can high IO during hard commit cause queries to
 slow down?

 For some shards we see 98% full physical memory. We have 60GB machine (30
 GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high
 physical memory would cause queries to slow down. We're in process of
 reducing JVM size anyways.

 We have never run optimization till now. QA optimization didn't yield in
 performance gain.

 Thanks much for all help.

 -Original Message-
 From: Shawn Heisey [mailto:s...@elyograg.org]
 Sent: Tuesday, February 18, 2014 4:55 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr4 performance

 On 2/18/2014 2:14 PM, Joshi, Shital wrote:
  Thanks much for all suggestions. We're looking into reducing allocated
 heap size of Solr4 JVM.
 
  We're using NRTCachingDirectoryFactory. Does it use MMapDirectory
 internally? Can someone please confirm?

 In Solr, NRTCachingDirectory does indeed use MMapDirectory as its
 default delegate.  That's probably also the case with Lucene -- these
 are Lucene classes, after all.

 MMapDirectory is almost always the most efficient way to handle on-disk
 indexes.

 Thanks,
 Shawn




Clearing the suggestion dictionary

2014-02-20 Thread Hamish Campbell
Hi all,

Given a Suggester SpellCheckComponent, how do I clear it?

I have a curious issue where suggestion terms are appearing in the
dictionary from an unknown source. I've carefully checked that the source
fields don't contain the terms, but it's not obvious how the dictionary is
reset. Restarting Solr doesn't seem to have the desired effect, and running
spellcheck.build doesn't seem to remove old terms.

Is there an explicit method to clear the suggester dictionary (preferably
without restarting Solr)?

-- 
Hamish Campbell
Koordinates Ltd http://koordinates.com/?_bzhc=esig
PH   +64 9 966 0433
FAX +64 9 966 0045


Re: Tweaking Solr Query Result Cache

2014-02-20 Thread Erick Erickson
What you _do_ want to do is add replicas so you distribute the CPU
load across a bunch of machines.

The queryResultCache isn't very useful unless you have multiple queries
that
1) reference the _exact_ same query: q, fq, sorting and all
2) don't page very far.

This cache really only holds the (internal Lucene) document IDs for a
window of hits. So say your window (configured in solrconfig.xml) is set to 50.
For each of the query keys, 50 IDs are stored. The next time that exact query
comes in, and _assuming_ start+rows < 50, you'll get the IDs from the cache
and not much action occurs. The design intent here is to satisfy a few pages
of results.

If you mean by tail queries that there is very little repetition of
queries, then why bother with a cache at all? If the hit ratio is going
toward 0, it's not doing you enough good to matter.
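The window mentioned above is the queryResultWindowSize setting in solrconfig.xml, which sits alongside the cache definition (a sketch with typical stock values, not a tuning recommendation):

```
<queryResultWindowSize>50</queryResultWindowSize>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```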


FWIW,
Erick


On Thu, Feb 20, 2014 at 1:58 PM, KNitin nitin.t...@gmail.com wrote:

 Hello

   I have a 4 node cluster running Solr cloud 4.3.1. I have a few large
 collections sharded 8 ways across all the 4 nodes (with 2 shards per node).
 The size of the shard for the large collections is around 600-700Mb
 containing around 250K+ documents.

 Currently the size of the query cache is around 512. We have a few jobs
 that run tail queries on these collections. The hit ratio of the cache
 drops to 0 when running these queries and also at the same time CPU spikes.
 The latencies are in the order of seconds in the above case. I verified GC
 behavior is normal (not killing cpu)

 The following are my questions


1. Is it a good practice to vary the Query Result Cache size based on
the size of the collection (large collections have large cache)?
2. If most of your queries are tail queries, what is a good way to make
your cache usage effective (higher hits)
3. If lets say all your queries miss the cache, it is an OK behavior if
your CPU spikes (to 90+%)
4. Is there a recommended shard size (# of doc, size ) to use. A few of
my collections are 100-200 Mb and the large ones are in teh order of
 800-1Gb

 Thanks a lot in advance
 Nitin



Re: Re: Can set a boost function as a default within requesthandler?

2014-02-20 Thread Peter Dodd
So I'd need to change the query syntax to load and pass *qq*
(qq=connectivity) instead of *q* (q=connectivity)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647p4118703.html
Sent from the Solr - User mailing list archive at Nabble.com.


how many shards required to search data

2014-02-20 Thread search engn dev
The data size is 250 GB of small records; each record is around 0.3 KB, and
there are about 1 billion records in total. My index has 20 different fields.
Queries will mostly be very simple or spatial queries, mainly on 2-3 fields.
All 20 fields will be stored. Any suggestions on how many shards I will need
to search this data?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-many-shards-required-to-search-data-tp4118715.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how many shards required to search data

2014-02-20 Thread search engn dev
Shard1 config: 32GB RAM, 4 cores
Shard2 config: 32GB RAM, 4 cores




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-many-shards-required-to-search-data-tp4118715p4118717.html


SolrCloud can't correctly create collection after zookeeper ensemble recovery

2014-02-20 Thread Chia-Chun Shih
Hi all,

This is my test procedure:

1. Start a ZooKeeper ensemble and a SolrCloud node.
2. Stop the ZooKeeper ensemble.
3. Start the ZooKeeper ensemble.
4. Fail to create a collection (with 1 shard and 1 replica) because of a
timeout.
5. Restart the SolrCloud node.
6. Fail to create a collection with the same name as in step 4, because the
collection already exists. However, the collection is not assigned to any
SolrCloud node.

I am using Solr 4.6.1 and ZooKeeper 3.4.5.

Thanks,
Chia-Chun


Setting up solr on production server

2014-02-20 Thread Jay Potharaju
Hi,
I'm looking for tips or guidelines on installing Solr on a production
server. I am currently using Jetty in my dev environment.
Is it recommended to use Tomcat on the production server? Are there any
major advantages of using one over the other?

Thanks
J


RE: Setting up solr on production server

2014-02-20 Thread Suresh Soundararajan
You can go ahead with Tomcat by deploying the Solr WAR in it. It is highly
scalable.

Thanks,
SureshKumar.S


From: Jay Potharaju jspothar...@gmail.com
Sent: Friday, February 21, 2014 11:10 AM
To: solr-user@lucene.apache.org
Subject: Setting up solr on production server

Hi,
I'm looking for tips or guidelines on installing Solr on a production
server. I am currently using Jetty in my dev environment.
Is it recommended to use Tomcat on the production server? Are there any
major advantages of using one over the other?

Thanks
J


Setting up solr with IBM Websphere 7

2014-02-20 Thread Prasi S
Hi,
I have a requirement to set up Solr on IBM WebSphere Application Server 7.x.
Has anybody done this in their project? Is there a blog or link with
instructions for doing this?
Please advise.

Thanks,
Prasi


Re: Setting up solr on production server

2014-02-20 Thread Shawn Heisey
On 2/20/2014 10:40 PM, Jay Potharaju wrote:
 I'm looking for tips or guidelines on installing Solr on a production
 server. I am currently using Jetty in my dev environment.
 Is it recommended to use Tomcat on the production server? Are there any
 major advantages of using one over the other?

The recommendation is to use the jetty (version 8) that comes with Solr.
 It has had a number of unnecessary components removed, and its config
has had a few things tweaked, but otherwise it is unchanged from what
you get if you download the same version from www.eclipse.org.

Jetty is not a toy servlet container.  It is a battle-tested
enterprise-ready system.  It is also the only servlet container that
gets officially tested with Solr.  I have been using Solr under jetty
for nearly four years.

If you don't want to mess with creating your own startup scripts, you
could use a packaged jetty made for (or provided by) your operating
system, but the configuration will probably need tweaking.  At the very
least you may need to increase maxThreads.
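
As a concrete illustration of the maxThreads tweak, the Jetty 8 thread pool is set in jetty.xml along these lines (10000 mirrors the value in Solr's bundled example config; a distro-packaged Jetty typically defaults far lower):

```xml
<!-- jetty.xml (Jetty 8): thread pool sizing -->
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">10000</Set>
  </New>
</Set>
```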

There's only one good reason I can think of to use something like
Tomcat.  That is when you already know another servlet container
*really* well, and know how to tune it for varying application requirements.

Thanks,
Shawn



Converting solrdocument response into pojo

2014-02-20 Thread Navaa
Hi,
I'm using Solr for searching, to search for names based on their locations,
and the response I get back contains a docs list of SolrDocuments like:

response = {docs =[{SolrDocument[name=abcd
id=[6,6,],..},{SolrDocument[name=xyz id=435,..},]}

When converting this response with response.getBeans(Pojo.class), it throws
an exception for the id field (*id=[6,6]*). My POJO is:

class Pojo {
    @Field("name")
    private String name;

    @Field("id")
    private Integer id;
    ...
}

How can I resolve this exception? Please help me ASAP. Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-solrdocument-response-into-pojo-tp4118743.html


Re: how many shards required to search data

2014-02-20 Thread Shawn Heisey
On 2/20/2014 9:24 PM, search engn dev wrote:
 The data size is 250 GB of small records; each record is around 0.3 KB,
 and there are around 1 billion records in total. My index has 20 different
 fields. Queries will mostly be very simple or spatial queries, mainly on
 2-3 fields. All 20 fields will be stored. Any suggestions on how many
 shards I will need to search this data?

Your question is impossible to answer.  I will tell you that this is a
very big index, and it's going to take a lot of hardware.  It's not the
biggest I've heard of, but it is quite large.  Any situation that would
result in a performance issue on a small index is going to be far worse
on a large index.

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Two machines with 32GB of RAM each are not going to be anywhere near
enough.  If you can't get more than 32GB of RAM in each server, you're
probably going to need a lot of them.

Since all your fields will be stored, the *minimum* size of your index
will be approximately equal to the original data size after compression
-- assuming you're using 4.1.0 or later, where compression was
introduced.  That will not be the end, though -- it doesn't take into
account the size of the *indexed* data.

Although it is theoretically possible to look at a schema and the
original data to calculate the size of the indexed data, in reality the
only way to be SURE is to actually index a significant percentage of
your real data with the same schema you would use in production.

Once you know how big your index is actually going to be, you can begin
to figure out how much total RAM you'll need across all the servers for
a single copy of the index (no redundancy).  If you want redundancy, the
requirements will be at least twice what you calculate.

http://wiki.apache.org/solr/SolrPerformanceProblems

The number of shards and replicas that you're going to need is going to
depend on the query volume, the nature of the queries, and the nature of
the data.  Just like with index size, the only way to know is to try it
with all your real data.

If your query volume is large, you'll need multiple copies of the
complete index, which means more servers.

If you don't care how long each query takes and your query volume will
be low, then your server requirements will be a LOT smaller.

Thanks,
Shawn



Re: Converting solrdocument response into pojo

2014-02-20 Thread Alexandre Rafalovitch
It looks like at least one of your documents has multiple values in the id
field (all valued 6), but your POJO is expecting a single one. You may want
to check your schema definition to ensure it does not allow multiple
values, and also check your indexing process to identify how one got
through.
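
A single-valued id can be enforced in schema.xml; a minimal sketch, with the field type assumed:

```xml
<!-- schema.xml: multiValued="false" makes Solr reject documents carrying
     more than one value in the id field -->
<field name="id" type="string" indexed="true" stored="true"
       required="true" multiValued="false"/>
```

Alternatively, if multiple ids are actually legitimate, declaring the POJO field as `List<Integer>` lets getBeans() accept them.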

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Feb 21, 2014 at 6:02 PM, Navaa
navnath.thomb...@xtremumsolutions.com wrote:
 Hi,
 I'm using Solr for searching, to search for names based on their locations,
 and the response I get back contains a docs list of SolrDocuments like:

 response = {docs =[{SolrDocument[name=abcd
 id=[6,6,],..},{SolrDocument[name=xyz id=435,..},]}

 When converting this response with response.getBeans(Pojo.class), it throws
 an exception for the id field (*id=[6,6]*). My POJO is:

 class Pojo {
     @Field("name")
     private String name;

     @Field("id")
     private Integer id;
     ...
 }

 How can I resolve this exception? Please help me ASAP. Thanks in advance.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Converting-solrdocument-response-into-pojo-tp4118743.html


Re: Re: Can set a boost function as a default within requesthandler?

2014-02-20 Thread Ahmet Arslan
Hi Peter,

Yes, you are correct. Send qq=connectivity with the following defaults
section:

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <str name="fl">*,score</str>
  <str name="q">{!type=boost b=product(answeredStatus, articleType) v=$qq}</str>
</lst>

Alternatively, you can switch to the edismax query parser. It has built-in
boost and bf parameters: the former is a multiplicative boost, the latter
an additive boost.
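
A sketch of that edismax alternative, reusing the same function as a multiplicative boost (parameter names are edismax's own boost/bf; the function fields come from this thread):

```xml
<lst name="defaults">
  <str name="defType">edismax</str>
  <!-- multiplicative: each document's score is multiplied by this function -->
  <str name="boost">product(answeredStatus, articleType)</str>
</lst>
```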

Ahmet 



On Friday, February 21, 2014 3:36 AM, Peter Dodd pd...@microsoft.com wrote:
So I'd need to change the query syntax to load and pass *qq*
(qq=connectivity) instead of *q* (q=connectivity)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647p4118703.html
