org.eclipse.jetty.io.EofException: early EOF
Hi, I am using Solr 4.6 along with Apache ManifoldCF 1.4.1 to index an Alfresco CMS repository. While indexing Alfresco I get the error below in the Solr logs when it indexes media content such as images or video.

ERROR - 2014-02-20 12:50:45.108; org.apache.solr.common.SolrException; null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException: Processing of multipart/form-data request failed. early EOF
	at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
	at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
	at org.apache.solr.servlet.SolrRequestParsers$MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:547)
	at org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:681)
	at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:150)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:393)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Unknown Source)
Caused by: org.eclipse.jetty.io.EofException: early EOF
	at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
	at java.io.FilterInputStream.read(Unknown Source)
	at org.apache.commons.fileupload.util.LimitedInputStream.read(LimitedInputStream.java:125)
	at org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)
	at org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:887)
	at java.io.InputStream.read(Unknown Source)
	at org.apache.commons.fileupload.util.Streams.copy(Streams.java:94)
	at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
	at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
	... 31 more
ERROR - 2014-02-20 12:50:45.117; org.apache.solr.common.SolrException; null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException: Processing of multipart/form-data request failed. early EOF
	at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
	at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
	at org.apache.solr.servlet.SolrRequestParsers$MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:547)
	at org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:681)
Group on multiple fields in a sharded environment
Dear all, I would like to group my query results on two different fields (not at the same time), and I would also like to get the exact group count. I'm working with a sharded index, and I know that to get an exact group count, all documents of a group must be indexed in a single shard. Is there a good, magic way to do this other than creating two collections with two different sharding keys? I could also imagine a sort of document balancer that moves (reindexes) documents by changing the sharding key, but that could be quite complex, and I suspect indexing performance would suffer. Any advice? Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Group-on-multiple-fields-in-a-sharded-environment-tp4118527.html Sent from the Solr - User mailing list archive at Nabble.com.
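For reference, co-locating a group on one shard is what the compositeId router (Solr 4.1+) is designed for, though it supports only one routing prefix per document — which is exactly the limitation being discussed. A sketch with hypothetical IDs, assuming the collection was created with the default compositeId router:

```text
# All documents sharing the "electronics!" prefix hash to the same shard,
# so grouping on that key gives exact counts:
id = electronics!doc1
id = electronics!doc2

# A second grouping field would need a different prefix scheme, hence the
# two-collection workaround mentioned above.
```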
Re: Additive boost function
Jack, could you please suggest how to use Solr query functions to make all field boosts additive in a query like the one I specified in the topic? -- View this message in context: http://lucene.472066.n3.nabble.com/Additive-boost-function-tp4118066p4118537.html Sent from the Solr - User mailing list archive at Nabble.com.
Grouping performance improvement
I'm facing slow performance on a query where I'm grouping on a field. The index holds 57 million records, and we would be targeting 100 million+. I'm using grouping to build category-based autosuggest: when the user presses "a", I search for "a" and group by a field, say products. I've noticed that query performance gets really bad with the group-by clause. I'm at the experimental stage, so I can change the schema or try alternatives. Please let me know if there is a way to cleverly design the schema to improve performance, or if I'm missing an option to fine-tune. -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Grouping performance improvement
You can think of using facets by category field instead of grouping. It will be faster and categorization can be done against multiple category fields. Try different facet methods. If you don't need number of documents in each category and number of unique categories is relatively low, you might be interested in following performance improvement https://issues.apache.org/jira/browse/SOLR-5725 Alexey -Original Message- From: soodyogesh [mailto:soodyog...@gmail.com] Sent: Thursday, February 20, 2014 16:20 To: solr-user@lucene.apache.org Subject: Grouping performance improvement Im facing slow performance for query where im grouping on a field while querying. Size of index 57 million records, and we would be targeting 100 million + Im using grouping to create category based autosuggest. so when user press a I go and search for a and group by field say products. Now i have noticed performance of query is really get bad with group by clause. Im at experimental stage so I can change schema or try other alternative. Please let me know if there are way to cleverly design your schema to improve performance or im meeting some option to fine tune. -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549.html Sent from the Solr - User mailing list archive at Nabble.com.
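As an illustration of the faceting suggestion above, the two request styles might look roughly like this (field names are hypothetical; `facet.method` accepts `enum` and `fc` in Solr 4.x):

```text
# Grouping (slower at this scale):
q=a*&group=true&group.field=category&group.ngroups=true

# Faceting on the same field instead:
q=a*&facet=true&facet.field=category&facet.method=fc&facet.limit=10
```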
Contract JOB** Austin TX-- SOLR/Zookeeper expert
We are looking for an expert in SOLR with Zookeeper architecture. The gig is to review the current architecture and implementation, find and fix issues with ingestion or search, and provide recommendations for the future evolution of the stack. If you are interested, please contact Brian, VP of Solution Delivery @ Fiserv. -- View this message in context: http://lucene.472066.n3.nabble.com/Contract-JOB-Austin-TX-SOLR-Zookeeper-expert-tp4118568.html Sent from the Solr - User mailing list archive at Nabble.com.
CloudServer 4.2.1 and SolrCloud 4.3.1
Hi, I have a question on the CloudServer client for SolrCloud. How does CloudServer route requests to Solr? Does it use round robin internally, or does it take into account any other parameter of the node (for example, how many replicas it has)? Thanks Nitin
Re: org.eclipse.jetty.io.EofException: early EOF
On 2/20/2014 1:41 AM, lalitjangra wrote: I am using solr 4.6 along with Apache Manifold CF 1.4.1 to index alfresco cms repository. While indexing alfresco i am getting below error in solr logs while indexing media content such as image or video. ERROR - 2014-02-20 12:50:45.108; org.apache.solr.common.SolrException; null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException: Processing of multipart/form-data request failed. early EOF [snip] Caused by: org.eclipse.jetty.io.EofException: early EOF Every time I've seen EofException, it has been caused by the client terminating the TCP connection early -- before Solr has seen the whole request, or before Solr has processed it and sent a response. ManifoldCF probably has a socket timeout configured on its HTTP client, likely either 30 or 60 seconds ... but the request is taking longer than that to complete. Based on where the exception occurs, it sounds like it happens during file upload, before Solr even finishes receiving the file. Thanks, Shawn
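If the timeout does sit in the client, it is usually adjustable. ManifoldCF's Solr output connector has its own configuration, so this is only a hedged sketch of how a raw SolrJ 4.x client would raise the timeouts (the 10-minute value is illustrative):

```java
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Sketch for SolrJ 4.x: lengthen the socket (read) timeout so large
// multipart uploads aren't cut off mid-request -- the cut-off surfaces
// on the server side as Jetty's "early EOF".
public class TimeoutExample {
    public static void main(String[] args) {
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        server.setConnectionTimeout(15000); // ms to establish the TCP connection
        server.setSoTimeout(600000);        // ms to wait on reads; defaults are often 30-60s
    }
}
```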
No suggestions when I set spellcheck.q
Hi guys, suppose a user is browsing a webpage whose articles he has already filtered. I want to get suggestions only from the filtered content (i.e. the current category). To achieve this I have set `spellcheck.q` to the current query or category, but by doing this the query no longer returns `suggestions`. So, what's the workaround? P.S.: I have set these params in the suggester: Reload(true); Build(true); Collate(true); ExtendedResults(true); CollateExtendedResults(true);
Re: CloudServer 4.2.1 and SolrCloud 4.3.1
On 2/20/2014 12:09 PM, KNitin wrote: I have a question on CloudServer client for solrcloud. How does CloudServer route requests to solr? Does it use round robin internally or does it take into account any other parameter for the node (example how many replicas it has, etc) ? Mixing SolrJ and Solr versions when you are using CloudSolrServer and SolrCloud is a bad idea right now. SolrCloud is evolving *VERY* quickly. At some point in the future there will be more stability between versions, but right now, success is unlikely unless they are the same version. By default, CloudSolrServer in version 4.2.1 will send updates to shard leaders (in a round-robin fashion). For queries, it uses a full round-robin. I *think* it will limit the round-robin to only nodes that are hosting that collection, but I'm not sure about that part. http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html#CloudSolrServer%28java.lang.String,%20org.apache.solr.client.solrj.impl.LBHttpSolrServer,%20boolean%29 Newer versions of CloudSolrServer (4.5.x if I remember correctly, and 4.6.x for sure) can route update requests directly to the leader of the correct shard. It does require the same version of Solr. Thanks, Shawn
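For reference, the constructor Shawn links to is the one exposing the updatesToLeaders flag. A sketch against SolrJ 4.2.x (ZooKeeper host names and the collection name are placeholders):

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

// Sketch: CloudSolrServer reads cluster state from ZooKeeper and
// load-balances through an internal LBHttpSolrServer; with
// updatesToLeaders=true, updates are round-robined across shard leaders only.
public class CloudClientExample {
    public static void main(String[] args) throws Exception {
        LBHttpSolrServer lb = new LBHttpSolrServer();
        CloudSolrServer server =
            new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181", lb, true);
        server.setDefaultCollection("mycollection");
        server.connect(); // reads live cluster state before the first request
    }
}
```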
Can set a boost function as a default within requesthandler?
For my search I’ve established a boost function which enhances result ranking. In a query it looks something like:

q={!boost b=product(answeredStatus, articleType)}connectivity

I’d like to make this boost function a default for all others who use the search. My default search handler is configured like this in solrconfig.xml:

<requestHandler class="solr.SearchHandler" name="standard" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="fl">*,score</str>
  </lst>
  <arr name="components">
    <str>collapse</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>stats</str>
    <str>debug</str>
  </arr>
</requestHandler>

Can I add the boost function as a default in here? Many thanks, Peter -- View this message in context: http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can set a boost function as a default within requesthandler?
Hi Peter, According to http://wiki.apache.org/solr/LocalParams#Parameter_dereferencing if q={!dismax qf=myfield}solr rocks is equivalent to q={!type=dismax qf=myfield v=$qq}&qq=solr rocks, then q={!boost b=product(answeredStatus, articleType)}connectivity is equivalent to q={!type=boost b=product(answeredStatus, articleType) v=$qq}&qq=connectivity. You can then define q in the defaults section of the request handler in solrconfig.xml and pass the qq parameter in the request. Ahmet On Thursday, February 20, 2014 9:32 PM, Peter Dodd pd...@microsoft.com wrote: For my search I’ve established a boost function which enhances result ranking. In a query it looks something like: q={!boost b=product(answeredStatus, articleType)}connectivity I’d like to make this boost function a default for all others who use the search. My default search handler is configured like this in solrConfig.xml: requestHandler class=solr.SearchHandler name=standard default=true lst name=defaults str name=echoParamsexplicit/str str name=fl*,score/str /lst arr name=components strcollapse/str strfacet/str strmlt/str strhighlight/str strstats/str strdebug/str /arr /requestHandler Can I add the boost function as a default in here? Many thanks, Peter -- View this message in context: http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CloudServer 4.2.1 and SolrCloud 4.3.1
Thanks, Shawn. On Thu, Feb 20, 2014 at 11:29 AM, Shawn Heisey s...@elyograg.org wrote: On 2/20/2014 12:09 PM, KNitin wrote: I have a question on CloudServer client for solrcloud. How does CloudServer route requests to solr? Does it use round robin internally or does it take into account any other parameter for the node (example how many replicas it has, etc) ? Mixing SolrJ and Solr versions when you are using CloudSolrServer and SolrCloud is a bad idea right now. SolrCloud is evolving *VERY* quickly. At some point in the future there will be more stability between versions, but right now, success is unlikely unless they are the same version. By default, CloudSolrServer in version 4.2.1 will send updates to shard leaders (in a round-robin fashion). For queries, it uses a full round-robin. I *think* it will limit the round-robin to only nodes that are hosting that collection, but I'm not sure about that part. http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/ solr/client/solrj/impl/CloudSolrServer.html#CloudSolrServer%28java.lang. String,%20org.apache.solr.client.solrj.impl.LBHttpSolrServer,%20boolean%29 Newer versions of CloudSolrServer (4.5.x if I remember correctly, and 4.6.x for sure) can route update requests directly to the leader of the correct shard. It does require the same version of Solr. Thanks, Shawn
Re: Can set a boost function as a default within requesthandler?
Hi Peter, how about:

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <str name="fl">*,score</str>
  <str name="bf">product(answeredStatus, articleType)</str>
</lst>
<arr name="components">
  <str>collapse</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>stats</str>
  <str>debug</str>
</arr>
</requestHandler>

? On 2014-02-20 20:20:08, Peter Dodd pd...@microsoft.com wrote: For my search I’ve established a boost function which enhances result ranking. In a query it looks something like: q={!boost b=product(answeredStatus, articleType)}connectivity I’d like to make this boost function a default for all others who use the search. My default search handler is configured like this in solrConfig.xml: requestHandler class=solr.SearchHandler name=standard default=true lst name=defaults str name=echoParamsexplicit/str str name=fl*,score/str /lst arr name=components strcollapse/str strfacet/str strmlt/str strhighlight/str strstats/str strdebug/str /arr /requestHandler Can I add the boost function as a default in here? Many thanks, Peter -- View this message in context: http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can set a boost function as a default within requesthandler?
Hi rainfall, bf is a (e)dismax query parser specific parameter. It seems that Peter is using default lucene query parser. Ahmet On Thursday, February 20, 2014 10:15 PM, rainfall83 rainfal...@op.pl wrote: Hi Peter, how about: lst name=defaults str name=echoParamsexplicit/str str name=fl*,score/str str name=bfproduct(answeredStatus, articleType)/str /lst arr name=components strcollapse/str strfacet/str strmlt/str strhighlight/str strstats/str strdebug/str /arr /requestHandler ? W dniu 2014-02-20 20:20:08 użytkownik Peter Dodd pd...@microsoft.com napisał: For my search I’ve established a boost function which enhances result ranking. In a query it looks something like: q={!boost b=product(answeredStatus, articleType)}connectivity I’d like to make this boost function a default for all others who use the search. My default search handler is configured like this in solrConfig.xml: requestHandler class=solr.SearchHandler name=standard default=true lst name=defaults str name=echoParamsexplicit/str str name=fl*,score/str /lst arr name=components strcollapse/str strfacet/str strmlt/str strhighlight/str strstats/str strdebug/str /arr /requestHandler Can I add the boost function as a default in here? Many thanks, Peter -- View this message in context: http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re: Can set a boost function as a default within requesthandler?
Hi Ahmet, you are right -- I forgot to mention the necessity of the query parser change in my case. So, just as you wrote previously, something like:

<str name="q">_query_:{!boost b=product(answeredStatus, articleType) v=$qq}</str>

added to the defaults section of the request handler definition, and referencing qq instead of q in requests, should do the trick. Thanks Ahmet. On 2014-02-20 21:20:24, Ahmet Arslan iori...@yahoo.com wrote: Hi rainfall, bf is a (e)dismax query parser specific parameter. It seems that Peter is using default lucene query parser. Ahmet On Thursday, February 20, 2014 10:15 PM, rainfall83 rainfal...@op.pl wrote: Hi Peter, how about: lst name=defaults str name=echoParamsexplicit/str str name=fl*,score/str str name=bfproduct(answeredStatus, articleType)/str /lst arr name=components strcollapse/str strfacet/str strmlt/str strhighlight/str strstats/str strdebug/str /arr /requestHandler ? On 2014-02-20 20:20:08, Peter Dodd pd...@microsoft.com wrote: For my search I’ve established a boost function which enhances result ranking. In a query it looks something like: q={!boost b=product(answeredStatus, articleType)}connectivity I’d like to make this boost function a default for all others who use the search. My default search handler is configured like this in solrConfig.xml: requestHandler class=solr.SearchHandler name=standard default=true lst name=defaults str name=echoParamsexplicit/str str name=fl*,score/str /lst arr name=components strcollapse/str strfacet/str strmlt/str strhighlight/str strstats/str strdebug/str /arr /requestHandler Can I add the boost function as a default in here? Many thanks, Peter -- View this message in context: http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647.html Sent from the Solr - User mailing list archive at Nabble.com.
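Putting the thread together, the handler definition would end up looking something like this (reconstructed with the attribute quoting the list archive stripped, and field names as in Peter's example):

```xml
<requestHandler class="solr.SearchHandler" name="standard" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="fl">*,score</str>
    <!-- Boost applied by default; callers pass qq=... instead of q=... -->
    <str name="q">_query_:{!boost b=product(answeredStatus, articleType) v=$qq}</str>
  </lst>
</requestHandler>
```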
RE: Solr4 performance
Hi! I have a few other questions regarding the Solr4 performance issue we're facing. We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use commit=false in the update URL. We have only the hard commit setting in the Solr4 config:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
  <maxDocs>10</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>

Since we're not using soft commit at all (commit=false), the caches will not get reloaded on every commit and recently added documents will not be visible, correct? What we see is that queries which usually take a few milliseconds take ~40 seconds once in a while. Can high IO during a hard commit cause queries to slow down? For some shards we see 98% full physical memory. We have a 60GB machine (30 GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out high physical memory as the cause of slow queries. We're in the process of reducing the JVM size anyway. We have never run optimization until now; QA optimization didn't yield a performance gain. Thanks much for all the help. -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Tuesday, February 18, 2014 4:55 PM To: solr-user@lucene.apache.org Subject: Re: Solr4 performance On 2/18/2014 2:14 PM, Joshi, Shital wrote: Thanks much for all suggestions. We're looking into reducing allocated heap size of Solr4 JVM. We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? Can someone please confirm? In Solr, NRTCachingDirectory does indeed use MMapDirectory as its default delegate. That's probably also the case with Lucene -- these are Lucene classes, after all. MMapDirectory is almost always the most efficient way to handle on-disk indexes. Thanks, Shawn
Re: Caching Solr boost functions?
: I'd like to tell Solr to cache those boost queries for the life of the : Searcher so they don't get recomputed every time. Is there any way to do : that out of the box? if the functions never change, you could just index the computed value up front and save cycles at query time -- but that's the only option i can think of off the top of my head. : In a different custom QParser we have we wrote a CachingValueSource that : wrapped a ValueSource with a custom ValueSource cache. Would it make sense : to implement that as a standard Solr function so that one could do: : : boost=cache(expensiveFunctionQuery()) Yeah... that could be handy. Something like this perhaps?

<cache name="some_cache_name"
       class="solr.LRUCache"
       size="4096"
       autowarmCount="1024"
       regenerator="solr.ValueSourceCacheRegenerator" />

<valueSourceParser name="cache" class="solr.ValueSourceCachingParser">
  <str name="cacheName">some_cache_name</str>
</valueSourceParser>

-Hoss http://www.lucidworks.com/
Tweaking Solr Query Result Cache
Hello, I have a 4-node cluster running Solr Cloud 4.3.1. I have a few large collections sharded 8 ways across all 4 nodes (with 2 shards per node). The size of a shard for the large collections is around 600-700 MB, containing around 250K+ documents. Currently the size of the query cache is around 512. We have a few jobs that run tail queries on these collections. The hit ratio of the cache drops to 0 when running these queries, and at the same time CPU spikes. Latencies are on the order of seconds in this case. I verified that GC behavior is normal (not killing CPU). My questions:

1. Is it good practice to vary the query result cache size based on the size of the collection (large collections get a large cache)?
2. If most of your queries are tail queries, what is a good way to make cache usage effective (higher hit ratio)?
3. If, say, all your queries miss the cache, is it OK behavior for your CPU to spike (to 90+%)?
4. Is there a recommended shard size (# of docs, size on disk)? A few of my collections are 100-200 MB and the large ones are on the order of 800 MB-1 GB.

Thanks a lot in advance, Nitin
RE: Solr4 performance
Hi, As for your first question, setting openSearcher to true means you will see the new docs after every hard commit. Soft and hard commits only become isolated from one another with that set to false. Your second problem might be explained by your large heap and garbage collection. Walking a heap that large can take an appreciable amount of time. You might consider turning on the JVM options for logging GC and seeing if you can correlate your slow responses to times when your JVM is garbage collecting. Hope that helps, On Feb 20, 2014 4:52 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi! I have few other questions regarding Solr4 performance issue we're facing. We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use commit=false in update URL. We have only hard commit setting in Solr4 config. autoCommit maxTime${solr.autoCommit.maxTime:60}/maxTime maxDocs10/maxDocs openSearchertrue/openSearcher /autoCommit Since we're not using Soft commit at all (commit=false), the caches will not get reloaded for every commit and recently added documents will not be visible, correct? What we see is queries which usually take few milli seconds, takes ~40 seconds once in a while. Can high IO during hard commit cause queries to slow down? For some shards we see 98% full physical memory. We have 60GB machine (30 GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high physical memory would cause queries to slow down. We're in process of reducing JVM size anyways. We have never run optimization till now. QA optimization didn't yield in performance gain. Thanks much for all help. -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Tuesday, February 18, 2014 4:55 PM To: solr-user@lucene.apache.org Subject: Re: Solr4 performance On 2/18/2014 2:14 PM, Joshi, Shital wrote: Thanks much for all suggestions. We're looking into reducing allocated heap size of Solr4 JVM. We're using NRTCachingDirectoryFactory. 
Does it use MMapDirectory internally? Can someone please confirm? In Solr, NRTCachingDirectory does indeed use MMapDirectory as its default delegate. That's probably also the case with Lucene -- these are Lucene classes, after all. MMapDirectory is almost always the most efficient way to handle on-disk indexes. Thanks, Shawn
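To make hard commits invisible (and cheap) while still getting periodic visibility of new documents, the usual Solr 4.x pattern the openSearcher advice above implies looks roughly like this (soft-commit interval is illustrative):

```xml
<!-- Hard commit: flush the transaction log to disk, but don't open a searcher -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
  <maxDocs>10</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: controls when newly added documents become visible to queries -->
<autoSoftCommit>
  <maxTime>30000</maxTime>
</autoSoftCommit>
```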
Clearing the suggestion dictionary
Hi all, given a Suggester SpellCheckComponent, how do I clear it? I have a curious issue where suggestion terms are appearing in the dictionary from an unknown source. I've carefully checked that the source fields don't contain the terms, but it's not obvious how the dictionary is reset. Restarting Solr doesn't seem to have the desired effect, and running spellcheck.build doesn't seem to remove old terms. Is there an explicit method to clear the suggester dictionary (preferably without restarting Solr)? -- Hamish Campbell Koordinates Ltd http://koordinates.com/?_bzhc=esig PH +64 9 966 0433 FAX +64 9 966 0045
Re: Tweaking Solr Query Result Cache
What you _do_ want to do is add replicas so you distribute the CPU load across a bunch of machines. The QueryResultCache isn't very useful unless you have multiple queries that
1) reference the _exact_ same query -- q, fq, sorting and all
2) don't page very far.
This cache really only holds the document (internal Lucene) IDs for a window of hits. So say your window (configured in solrconfig.xml) is set to 50. For each of the query keys, 50 IDs are stored. Next time that exact query comes in, and _assuming_ start+rows < 50, you'll get the IDs from the cache and not much action occurs. The design intent here is to satisfy a few pages of results. If you mean by tail queries that there is very little repetition of queries, then why bother with a cache at all? If the hit ratio is going toward 0, it's not doing you enough good to matter. FWIW, Erick On Thu, Feb 20, 2014 at 1:58 PM, KNitin nitin.t...@gmail.com wrote: Hello I have a 4 node cluster running Solr cloud 4.3.1. I have a few large collections sharded 8 ways across all the 4 nodes (with 2 shards per node). The size of the shard for the large collections is around 600-700Mb containing around 250K+ documents. Currently the size of the query cache is around 512. We have a few jobs that run tail queries on these collections. The hit ratio of the cache drops to 0 when running these queries and also at the same time CPU spikes. The latencies are in the order of seconds in the above case. I verified GC behavior is normal (not killing cpu) The following are my questions 1. Is it a good practice to vary the Query Result Cache size based on the size of the collection (large collections have large cache)? 2. If most of your queries are tail queries, what is a good way to make your cache usage effective (higher hits) 3. If lets say all your queries miss the cache, it is an OK behavior if your CPU spikes (to 90+%) 4. Is there a recommended shard size (# of doc, size ) to use. A few of my collections are 100-200 Mb and the large ones are in teh order of 800-1Gb Thanks a lot in advance Nitin
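The window Erick mentions is configured in solrconfig.xml alongside the cache itself; a sketch with illustrative sizes:

```xml
<!-- Caches ordered doc-ID lists keyed on (q, fq, sort) -->
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>

<!-- Number of document IDs stored per cached query -->
<queryResultWindowSize>50</queryResultWindowSize>
```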
Re: Re: Can set a boost function as a default within requesthandler?
So I'd need to change the query syntax to load and pass *qq *(qq=connectivity) instead of *q* (q=connectivity) ? -- View this message in context: http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647p4118703.html Sent from the Solr - User mailing list archive at Nabble.com.
how many shards required to search data
Data size is 250 GB of small records; each record is around 0.3 KB, about 1 billion records in total. My index has 20 different fields. Queries will mostly be very simple or spatial queries, mainly on 2-3 fields. All 20 fields will be stored. Any suggestions on how many shards I will need to search this data? -- View this message in context: http://lucene.472066.n3.nabble.com/how-many-shards-required-to-search-data-tp4118715.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how many shards required to search data
Shard1 config: 32 GB RAM, 4 cores. Shard2 config: 32 GB RAM, 4 cores. -- View this message in context: http://lucene.472066.n3.nabble.com/how-many-shards-required-to-search-data-tp4118715p4118717.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud can't correctly create collection after zookeeper ensemble recovery
Hi all, this is my test procedure:

1. start a Zookeeper ensemble and a SolrCloud node
2. stop the Zookeeper ensemble
3. start the Zookeeper ensemble
4. fail to create a collection (with 1 shard and 1 replica) because of a timeout
5. restart the SolrCloud node
6. fail to create a collection with the same name as in step 4 because the collection already exists -- but the collection isn't assigned to any SolrCloud node.

I am using Solr 4.6.1 and Zookeeper 3.4.5. Thanks, Chia-Chun
Setting up solr on production server
Hi, I'm looking for some tips or guidelines for installing Solr on a production server. I am currently using Jetty in my dev environment. Is it recommended to use Tomcat on the production server? Are there any major advantages of using one over the other? Thanks, J
RE: Setting up solr on production server
You can go ahead with Tomcat by deploying the solr war in it. It is highly scalable. Thanks, SureshKumar.S From: Jay Potharaju jspothar...@gmail.com Sent: Friday, February 21, 2014 11:10 AM To: solr-user@lucene.apache.org Subject: Setting up solr on production server Hi, I 'm looking for some tips or guidelines to installing solr on the production server. I am currently using jetty in my dev environment. Is it recommended to use tomcat on the production server? Are there are major advantages of using one over another. Thanks J [Aspire Systems] This e-mail message and any attachments are for the sole use of the intended recipient(s) and may contain proprietary, confidential, trade secret or privileged information. Any unauthorized review, use, disclosure or distribution is prohibited and may be a violation of law. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
Setting up solr with IBM Websphere 7
Hi, I have a requirement to set up Solr on IBM WebSphere server 7.x. Has anybody done the same in your project? Is there any blog or link with a set of instructions for doing this? Please advise. Thanks, Prasi
Re: Setting up solr on production server
On 2/20/2014 10:40 PM, Jay Potharaju wrote:
> I'm looking for some tips or guidelines for installing Solr on a production server. I am currently using Jetty in my dev environment. Is it recommended to use Tomcat on the production server? Are there any major advantages of using one over the other?

The recommendation is to use the Jetty (version 8) that comes with Solr. It has had a number of unnecessary components removed, and its config has had a few things tweaked, but otherwise it is unchanged from what you get if you download the same version from www.eclipse.org.

Jetty is not a toy servlet container. It is a battle-tested, enterprise-ready system. It is also the only servlet container that gets officially tested with Solr. I have been using Solr under Jetty for nearly four years.

If you don't want to mess with creating your own startup scripts, you could use a packaged Jetty made for (or provided by) your operating system, but the configuration will probably need tweaking. At the very least you may need to increase maxThreads.

There's only one good reason I can think of to use something like Tomcat: when you already know another servlet container *really* well and know how to tune it for varying application requirements. Thanks, Shawn
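The maxThreads point above maps to Jetty 8's thread-pool configuration in etc/jetty.xml. Solr's bundled Jetty already ships with a high ceiling; it is OS-packaged Jetty installs that typically need raising. A sketch of the relevant section, with values taken as illustrative rather than prescriptive:

```
<!-- etc/jetty.xml: the request thread pool; OS packages often default
     maxThreads far lower than Solr needs under concurrent load -->
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">10000</Set>
  </New>
</Set>
```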
Converting solrdocument response into pojo
hi, I'm using Solr for searching names on the basis of their locations. I get a response whose docs list contains SolrDocuments like this: response = {docs=[{SolrDocument[name=abcd id=[6,6]], ...}, {SolrDocument[name=xyz id=435], ...}]}. When converting this response with response.getBeans(Pojo.class), it throws an exception on the id field: *id=[6,6]*. My POJO is: class Pojo { @Field("name") private String name; @Field("id") private Integer id; ... } How can I resolve this exception? Please help me ASAP. Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Converting-solrdocument-response-into-pojo-tp4118743.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how many shards required to search data
On 2/20/2014 9:24 PM, search engn dev wrote:
> Data size is 250 GB of small records; each record is around 0.3 KB, for around 1 billion records in total. My index has 20 different fields. Queries will mostly be very simple or spatial queries, mainly on 2-3 fields. All 20 fields will be stored. Any suggestions on how many shards I will need to search the data?

Your question is impossible to answer. I will tell you that this is a very big index, and it's going to take a lot of hardware. It's not the biggest I've heard of, but it is quite large. Any situation that would result in a performance issue on a small index is going to be far worse on a large index.

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Two machines with 32GB of RAM each are not going to be anywhere near enough. If you can't get more than 32GB of RAM in each server, you're probably going to need a lot of them. Since all your fields will be stored, the *minimum* size of your index will be approximately equal to the original data size after compression -- assuming you're using 4.1.0 or later, where compression was introduced. That will not be the end, though -- it doesn't take into account the size of the *indexed* data.

Although it is theoretically possible to look at a schema and the original data to calculate the size of the indexed data, in reality the only way to be SURE is to actually index a significant percentage of your real data with the same schema you would use in production. Once you know how big your index is actually going to be, you can begin to figure out how much total RAM you'll need across all the servers for a single copy of the index (no redundancy). If you want redundancy, the requirements will be at least twice what you calculate.
http://wiki.apache.org/solr/SolrPerformanceProblems The number of shards and replicas that you're going to need is going to depend on the query volume, the nature of the queries, and the nature of the data. Just like with index size, the only way to know is to try it with all your real data. If your query volume is large, you'll need multiple copies of the complete index, which means more servers. If you don't care how long each query takes and your query volume will be low, then your server requirements will be a LOT smaller. Thanks, Shawn
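A back-of-envelope check of the numbers in this thread (1 billion docs at ~0.3 KB each) can make the reasoning concrete. The 1.2x index-size multiplier and the ~24 GB of usable page cache per 32 GB server are illustrative assumptions, not measurements; as the reply stresses, only indexing real data tells you the truth:

```java
// Back-of-envelope index sizing, following the reasoning in the thread.
// The 1.2x overhead factor and per-server cache figure are assumptions.
public class SizingSketch {
    static final double KB_PER_GB = 1024.0 * 1024.0;

    // Raw data volume in GB for a given doc count and average doc size in KB.
    static double rawGb(long docs, double kbPerDoc) {
        return docs * kbPerDoc / KB_PER_GB;
    }

    // Servers needed to hold one full copy of the index in OS page cache,
    // given usable cache GB per server (total RAM minus heap and OS overhead).
    static long serversForOneCopy(double indexGb, double cacheGbPerServer) {
        return (long) Math.ceil(indexGb / cacheGbPerServer);
    }

    public static void main(String[] args) {
        double raw = rawGb(1_000_000_000L, 0.3);        // ~286 GB of raw data
        double index = raw * 1.2;                       // assumed stored+indexed overhead
        long servers = serversForOneCopy(index, 24.0);  // 32 GB box, ~24 GB usable cache
        System.out.printf("raw=%.0f GB, index=%.0f GB, servers(one copy)=%d%n",
                raw, index, servers);
    }
}
```

Doubling the server count for one redundant copy, as the reply suggests, follows directly from the last step.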
Re: Converting solrdocument response into pojo
Looks like at least one of your documents has multiple values in the id field (both valued as 6), but your POJO is expecting a single value. You may want to check your schema definition to ensure it does not allow multiple values, and also check your indexing process to identify how one got through. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Fri, Feb 21, 2014 at 6:02 PM, Navaa navnath.thomb...@xtremumsolutions.com wrote:
> hi, I'm using Solr for searching names on the basis of their locations. I get a response whose docs list contains SolrDocuments like response = {docs=[{SolrDocument[name=abcd id=[6,6]], ...}]}. When converting with response.getBeans(Pojo.class), it throws an exception on the id field: *id=[6,6]*.
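If the id field really should be single-valued, the usual fix is in schema.xml; if it legitimately holds multiple values, the POJO field should be a List<Integer> instead of Integer so SolrJ's bean binding can map it. The field type below is an assumption based on the values in the thread:

```
<!-- schema.xml: forbid multiple values so getBeans can map id to Integer -->
<field name="id" type="int" indexed="true" stored="true"
       required="true" multiValued="false"/>
```

After tightening the schema, the documents indexed with two id values will need to be re-indexed, since the existing stored values are not rewritten by a schema change.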
Re: Re: Can set a boost function as a default within requesthandler?
Hi Peter, Yes, you are correct. Send qq=connectivity with the following defaults section:

  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="fl">*,score</str>
    <str name="q">{!type=boost b=product(answeredStatus, articleType) v=$qq}</str>
  </lst>

Alternatively, you can switch to the edismax query parser. It has native/built-in boost and bf parameters. The former is a multiplicative boost, the latter an additive boost. Ahmet

On Friday, February 21, 2014 3:36 AM, Peter Dodd pd...@microsoft.com wrote: So I'd need to change the query syntax to load and pass *qq* (qq=connectivity) instead of *q* (q=connectivity)? -- View this message in context: http://lucene.472066.n3.nabble.com/Can-set-a-boost-function-as-a-default-within-requesthandler-tp4118647p4118703.html Sent from the Solr - User mailing list archive at Nabble.com.
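The edismax alternative mentioned above would look roughly like this in solrconfig.xml; the boost function is carried over from the thread, while the handler name is an assumption:

```
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- multiplicative boost, applied to whatever q the client sends -->
    <str name="boost">product(answeredStatus, articleType)</str>
  </lst>
</requestHandler>
```

With this in place, clients keep sending a plain q=connectivity and no qq indirection is needed.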