Re: Bad contentType for search handler :text/xml; charset=UTF-8
On Wed, Apr 22, 2015 at 4:17 PM, Yonik Seeley ysee...@gmail.com wrote:
> On Wed, Apr 22, 2015 at 11:00 AM, didier deshommes dfdes...@gmail.com wrote:
>> curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" -H 'Content-type:application/json'
>
> You're telling Solr the body encoding is JSON, but then you don't send any body. We could catch that error earlier perhaps, but it still looks like an error?
> -Yonik

Agreed, it's still an error. But the traceback looks like something horrible has happened to Solr, and it is not particularly informative to the user. An error message like "Empty request body" would help. I suspect that this issue about content-type may come up again in libraries that interact with Solr, since its behavior pre-5.1 was to just ignore empty-body requests and return the response anyway.
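Two invocations that do work, for comparison -- a sketch against a stock Solr 5.1 "gettingstarted" setup; the JSON body form assumes the JSON Request API introduced in 5.1:

  # plain GET, with no misleading Content-Type header:
  curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation"

  # or send an actual JSON body to match the declared type:
  curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true" \
       -H 'Content-type:application/json' -d '{"query":"foundation"}'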
Re: Bad contentType for search handler :text/xml; charset=UTF-8
A similar problem seems to happen when sending application/json to the search handler. Solr returns a NullPointerException for some reason:

  vagrant@precise64:~/solr-5.1.0$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" -H 'Content-type:application/json'
  {
    "responseHeader":{
      "status":500,
      "QTime":2,
      "params":{
        "indent":"true",
        "json":"",
        "q":"foundation",
        "wt":"json"}},
    "error":{
      "trace":"java.lang.NullPointerException\n\tat org.apache.solr.request.json.ObjectUtil$ConflictHandler.mergeMap(ObjectUtil.java:60)\n\tat org.apache.solr.request.json.ObjectUtil.mergeObjects(ObjectUtil.java:114)\n\tat org.apache.solr.request.json.RequestUtil.mergeJSON(RequestUtil.java:259)\n\tat org.apache.solr.request.json.RequestUtil.processParams(RequestUtil.java:176)\n\tat org.apache.solr.util.SolrPluginUtils.setDefaults(SolrPluginUtils.java:166)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:140)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat java.lang.Thread.run(Thread.java:745)\n",
      "code":500}}

On Wed, Apr 22, 2015 at 9:41 AM, Walter Underwood wun...@wunderwood.org wrote:
> text/xml is not a safe content-type, because of the way that HTTP handles charsets. Always use application/xml.
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> On Apr 22, 2015, at 3:01 AM, bengates benga...@aliceadsl.fr wrote:
>> Looks like Solarium hardcodes a default header "Content-Type: text/xml; charset=utf-8" if none is provided. Removing it solves the problem. It seems that Solr 5.1 doesn't support this content-type.
Re: solr cloud does not start with many collections
It would be a huge step forward if one could have several hundred Solr collections but only have a small portion of them open/loaded at the same time. This is similar to Elasticsearch's close index API, described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html . I opened an issue a few months ago to implement the same in Solr: https://issues.apache.org/jira/browse/SOLR-6399

On Thu, Mar 5, 2015 at 4:42 PM, Damien Kamerman dami...@gmail.com wrote:
> I've tried a few variations, with 3 x ZK, 6 x nodes, Solr 4.10.3, and Solr 5.0, without any success and no real difference. There is a tipping point at around 3,000-4,000 cores (it varies depending on hardware) from where I can restart the cloud OK within ~4 min, to the cloud not working, with continuous 'conflicting information about the leader of shard' warnings.
>
> On 5 March 2015 at 14:15, Shawn Heisey apa...@elyograg.org wrote:
>> On 3/4/2015 5:37 PM, Damien Kamerman wrote:
>>> I'm running on Solaris x86; I have plenty of memory and no real limits:
>>>
>>>   # plimit 15560
>>>   15560: /opt1/jdk/bin/java -d64 -server -Xss512k -Xms32G -Xmx32G -XX:MaxMetasp
>>>      resource              current    maximum
>>>      time(seconds)         unlimited  unlimited
>>>      file(blocks)          unlimited  unlimited
>>>      data(kbytes)          unlimited  unlimited
>>>      stack(kbytes)         unlimited  unlimited
>>>      coredump(blocks)      unlimited  unlimited
>>>      nofiles(descriptors)  65536      65536
>>>      vmemory(kbytes)       unlimited  unlimited
>>>
>>> I've been testing with 3 nodes, and that seems OK up to around 3,000 cores total. I'm thinking of testing with more nodes.
>>
>> I have opened an issue for the problems I encountered while recreating a config similar to yours, which I have been doing on Linux: https://issues.apache.org/jira/browse/SOLR-7191 It's possible that the only thing the issue will lead to is improvements in the documentation, but I'm hopeful that there will be code improvements too.
>> Thanks, Shawn
>
> -- Damien Kamerman
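For comparison, the Elasticsearch API mentioned above closes and reopens an index over plain HTTP (host and index name here are illustrative):

  curl -XPOST "http://localhost:9200/my_index/_close"
  curl -XPOST "http://localhost:9200/my_index/_open"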
Re: Unload collection in SolrCloud
I added a JIRA issue here: https://issues.apache.org/jira/browse/SOLR-6399

On Thu, May 22, 2014 at 4:16 PM, Erick Erickson erickerick...@gmail.com wrote:
> "Age out" in this context is just implementing an LRU cache for open cores. When the cache limit is exceeded, the oldest core is closed automatically.
> Best, Erick
>
> On Thu, May 22, 2014 at 10:27 AM, Saumitra Srivastav saumitra.srivast...@gmail.com wrote:
>> Erick, can you elaborate more on what you mean by "age out"?
Re: Unload collection in SolrCloud
On Thu, May 22, 2014 at 10:30 AM, Erick Erickson erickerick...@gmail.com wrote:
> If we manage to extend the lazy core loading from stand-alone to lazy collection loading in SolrCloud, would that satisfy the use-case? It still doesn't allow manual unloading of the collection, but the large collection would age out if it was truly not used all that much. That said, I don't know if there's active work in this direction right now.
> Best, Erick

This is a nice option to have, but the ability to manually load and unload a collection is still needed. For example, if you're doing analytics work and storing a day's data in a collection, you still want to be able to access day 32 and day 64 even if you keep only 30 days of data loaded in memory. I also think a manual option would give people more flexibility in how they manage the number of collections they keep loaded.

> On Thu, May 22, 2014 at 5:35 AM, Saumitra Srivastav saumitra.srivast...@gmail.com wrote:
>> Yes, that's what I am doing. IMO, in addition to search, Solr satisfies the needs of a lot of analytics applications as well, and on-demand loading is a common use case in analytics (to keep TCO low), so it would be nice to keep this supported.
>> Regards, Saumitra
>>
>> On Thu, May 22, 2014 at 5:37 PM, Shalin Shekhar Mangar wrote:
>>> Ah, I see. So if I understand it correctly, you are sharing the cluster with other collections which are more frequently used, and you want to keep resources available for them, so you keep your collection dormant most of the time until it is requested. No, we don't have such an API. It'd be cool to have a lazily loaded collection, though. Thank you for describing the use-case, because given the direction we're moving in (ZK as truth, etc.), the core admin APIs will gradually be phased out and satisfying your use-case would become impossible. Let me think more on this.
>>>
>>> On Thu, May 22, 2014 at 4:57 PM, Saumitra Srivastav wrote:
>>>> I don't want to delete the collection/shards. I just want to unload all shards/replicas of the collection temporarily. Let me explain my use case. I have a collection alias, say *collectionA*, which consists of n collections (n=5), each with 8 shards and 2 replicas, on a 16-machine cluster. *collectionA* is quite big and used very rarely, so we keep all of its shards/replicas unloaded most of the time. Only when a user requests it do we load it into memory. To load/unload the shards/replicas of the aliased *collectionA*, we use the CLUSTERSTATUS API to get the list of all shards/replicas in the aliased collection and then use the CORE ADMIN API to load/unload them. As you can see, there is a lot of manual work involved, so I want to know: is there an API to load/unload ALL shards/replicas of a collection? (See the sketch after this message.)
>>>> Regards, Saumitra
>>>>
>>>> On Thu, May 22, 2014 at 4:36 PM, Shalin Shekhar Mangar wrote:
>>>>> You can use the delete Collection API: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api6
>>>>>
>>>>> On Thu, May 22, 2014 at 3:56 PM, Saumitra Srivastav wrote:
>>>>>> Guys, any suggestions for this?
>>>>>
>>>>> -- Regards, Shalin Shekhar Mangar.
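A sketch of the manual workflow Saumitra describes (core names hypothetical; the CLUSTERSTATUS action assumes Solr 4.8 or later):

  # 1. list the shards/replicas behind the aliased collection
  curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=collectionA&wt=json"

  # 2. unload each core reported for it, one by one
  curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=collectionA_shard1_replica1"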
Re: Solrcloud - adding a node as a replica?
Thanks Furkan, that's exactly what I was looking for.

On Wed, Sep 18, 2013 at 4:21 PM, Furkan KAMACI furkankam...@gmail.com wrote:
> Are you looking for this: http://lucene.472066.n3.nabble.com/SOLR-Cloud-Collection-Management-quesiotn-td4063305.html
>
> On Wednesday, September 18, 2013, didier deshommes dfdes...@gmail.com wrote:
>> Hi, how do I add a node as a replica to a SolrCloud cluster? Here is my situation: some time ago, I created several collections with replicationFactor=2. Now I need to add a new replica. I thought just starting a new node re-using the same ZooKeeper instance would make it automatically a replica, but that isn't the case. Do I need to delete and re-create my collections with the right replicationFactor (3 in this case)? I am using Solr 4.3.0.
>> Thanks, didier
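For the archive: on Solr 4.x, before the ADDREPLICA collections API existed, the usual way to add a replica was to create a core on the new node that names the target collection and shard -- a sketch with illustrative host, core, and shard names:

  curl "http://newnode:8983/solr/admin/cores?action=CREATE&name=mycollection_shard1_replica3&collection=mycollection&shard=shard1"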
Solrcloud - adding a node as a replica?
Hi, how do I add a node as a replica to a SolrCloud cluster? Here is my situation: some time ago, I created several collections with replicationFactor=2. Now I need to add a new replica. I thought just starting a new node re-using the same ZooKeeper instance would make it automatically a replica, but that isn't the case. Do I need to delete and re-create my collections with the right replicationFactor (3 in this case)? I am using Solr 4.3.0. Thanks, didier
Re: Collection - loadOnStartup
For Solr 4.3.0, I don't think you can pass loadOnStartup to the Collections API, although the Cores API accepts it. That's been my experience, anyway.

On Mon, Aug 5, 2013 at 6:27 AM, Srivatsan ranjith.venkate...@gmail.com wrote:
> No errors in ZooKeeper or Solr. I'm using CloudSolrServer for creating collections, as said above. I just want to set loadOnStartup to false for cores in solr.xml; I don't want all cores to load on startup. Hence, when creating a collection, I try to set this parameter to false, but I still get the same value for loadOnStartup in solr.xml.
Re: transientCacheSize doesn't seem to have any effect, except on startup
Any idea on this? I still cannot get the combination of transient cores and transientCacheSize to work as I think it should: give me the ability to create a large number of cores, and automatically load and unload them for me based on a limit that I set. If anyone else is using this feature and it is working for you, let me know how you got it working!

On Fri, May 3, 2013 at 2:11 PM, didier deshommes dfdes...@gmail.com wrote:
> On Fri, May 3, 2013 at 11:18 AM, Erick Erickson erickerick...@gmail.com wrote:
>> The cores aren't loaded (or at least shouldn't be) for getting the status. The _names_ of the cores should be returned, but those are (supposed) to be retrieved from a list rather than from loaded cores. So are you sure that's not what you are seeing? How are you determining whether the cores are actually loaded or not?
>
> I'm looking at the output of:
>
>   $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status"
>
> Cores that are loaded have a startTime and an upTime value. Cores that are unloaded don't appear in the output at all. For example, I created 3 transient cores with transientCacheSize=2. When I asked for a list of all cores, all 3 cores were returned. I explicitly unloaded 1 core and got back 2 cores when I asked for the list again. It would be nice if cores had an isTransient and an isCurrentlyLoaded value, so that one could see exactly which cores are loaded.
>
>> That said, it's perfectly possible that the status command is doing something we didn't anticipate, but I took a quick look at the code (got to rush to a plane) and CoreAdminHandler _appears_ to be just returning whatever info it can about an unloaded core for status. I _think_ you'll get more info if the core has ever been loaded, though, even if it's been removed from the transient cache. Ditto for the create action. So let's figure out whether you're really seeing loaded cores or not, and then raise a JIRA if so... Thanks for reporting! Erick
>>
>> On Thu, May 2, 2013 at 1:27 PM, didier deshommes dfdes...@gmail.com wrote:
>>> Hi, I've been very interested in the transient core feature of Solr for managing a large number of cores. I'm especially interested in this use case, which the wiki lists at http://wiki.apache.org/solr/LotsOfCores (looks to be down now):
>>>
>>>   loadOnStartup=false, transient=true: This is really the use-case. There are a large number of cores in your system that are of short-duration use. You want Solr to load them as necessary, but unload them when the cache gets full, on an LRU basis.
>>>
>>> I'm creating 10 transient cores via core admin like so:
>>>
>>>   $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=CREATE&name=new_core2&instanceDir=collection1/&dataDir=new_core2&transient=true&loadOnStartup=false"
>>>
>>> and I have transientCacheSize=2 in my solr.xml file, which I take to mean I should have at most 2 transient cores loaded at any time. The problem is that these cores are still loaded when I ask Solr to list cores:
>>>
>>>   $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status"
>>>
>>> From the explanation in the wiki, it looks like Solr should manage loading and unloading transient cores for me without my having to worry about them, but this is not what's happening. The situation is different when I restart Solr; it does the right thing by loading at most the number of cores set by transientCacheSize. When I add more cores, the old behavior happens again, where all created transient cores are loaded in Solr. I'm using the development branch lucene_solr_4_3 to run my example. I can open a JIRA if need be.
Re: transientCacheSize doesn't seem to have any effect, except on startup
On Fri, May 3, 2013 at 11:18 AM, Erick Erickson erickerick...@gmail.com wrote:
> The cores aren't loaded (or at least shouldn't be) for getting the status. The _names_ of the cores should be returned, but those are (supposed) to be retrieved from a list rather than from loaded cores. So are you sure that's not what you are seeing? How are you determining whether the cores are actually loaded or not?

I'm looking at the output of:

  $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status"

Cores that are loaded have a startTime and an upTime value. Cores that are unloaded don't appear in the output at all. For example, I created 3 transient cores with transientCacheSize=2. When I asked for a list of all cores, all 3 cores were returned. I explicitly unloaded 1 core and got back 2 cores when I asked for the list again. It would be nice if cores had an isTransient and an isCurrentlyLoaded value, so that one could see exactly which cores are loaded.

> That said, it's perfectly possible that the status command is doing something we didn't anticipate, but I took a quick look at the code (got to rush to a plane) and CoreAdminHandler _appears_ to be just returning whatever info it can about an unloaded core for status. I _think_ you'll get more info if the core has ever been loaded, though, even if it's been removed from the transient cache. Ditto for the create action. So let's figure out whether you're really seeing loaded cores or not, and then raise a JIRA if so... Thanks for reporting! Erick
>
> On Thu, May 2, 2013 at 1:27 PM, didier deshommes dfdes...@gmail.com wrote:
>> Hi, I've been very interested in the transient core feature of Solr for managing a large number of cores. I'm especially interested in this use case, which the wiki lists at http://wiki.apache.org/solr/LotsOfCores (looks to be down now):
>>
>>   loadOnStartup=false, transient=true: This is really the use-case. There are a large number of cores in your system that are of short-duration use. You want Solr to load them as necessary, but unload them when the cache gets full, on an LRU basis.
>>
>> I'm creating 10 transient cores via core admin like so:
>>
>>   $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=CREATE&name=new_core2&instanceDir=collection1/&dataDir=new_core2&transient=true&loadOnStartup=false"
>>
>> and I have transientCacheSize=2 in my solr.xml file, which I take to mean I should have at most 2 transient cores loaded at any time. The problem is that these cores are still loaded when I ask Solr to list cores:
>>
>>   $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status"
>>
>> From the explanation in the wiki, it looks like Solr should manage loading and unloading transient cores for me without my having to worry about them, but this is not what's happening. The situation is different when I restart Solr; it does the right thing by loading at most the number of cores set by transientCacheSize. When I add more cores, the old behavior happens again, where all created transient cores are loaded in Solr. I'm using the development branch lucene_solr_4_3 to run my example. I can open a JIRA if need be.
transientCacheSize doesn't seem to have any effect, except on startup
Hi, I've been very interested in the transient core feature of Solr for managing a large number of cores. I'm especially interested in this use case, which the wiki lists at http://wiki.apache.org/solr/LotsOfCores (looks to be down now):

  loadOnStartup=false, transient=true: This is really the use-case. There are a large number of cores in your system that are of short-duration use. You want Solr to load them as necessary, but unload them when the cache gets full, on an LRU basis.

I'm creating 10 transient cores via core admin like so:

  $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=CREATE&name=new_core2&instanceDir=collection1/&dataDir=new_core2&transient=true&loadOnStartup=false"

and I have transientCacheSize=2 in my solr.xml file, which I take to mean I should have at most 2 transient cores loaded at any time. The problem is that these cores are still loaded when I ask Solr to list cores:

  $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status"

From the explanation in the wiki, it looks like Solr should manage loading and unloading transient cores for me without my having to worry about them, but this is not what's happening. The situation is different when I restart Solr; it does the right thing by loading at most the number of cores set by transientCacheSize. When I add more cores, the old behavior happens again, where all created transient cores are loaded in Solr. I'm using the development branch lucene_solr_4_3 to run my example. I can open a JIRA if need be.
Re: transientCacheSize not working
I've created an issue and patch here that make it possible to specify transient and loadOnStartup on core creation: https://issues.apache.org/jira/browse/SOLR-4631

On Wed, Mar 20, 2013 at 10:14 AM, didier deshommes dfdes...@gmail.com wrote:
> Thanks. Is there a way to pass loadOnStartup and/or transient as parameters to the core admin HTTP API? This doesn't seem to work:
>
>   curl "http://localhost:8983/solr/admin/cores?action=CREATE&transient=true&name=c1"
>
> On Tue, Mar 19, 2013 at 7:29 PM, Mark Miller markrmil...@gmail.com wrote:
>> I don't think SolrCloud works with the transient stuff. - Mark
>>
>> On Mar 19, 2013, at 8:04 PM, didier deshommes dfdes...@gmail.com wrote:
>>> Hi, I cannot get SolrCloud to respect transientCacheSize when creating multiple cores via the web API. I'm running Solr 4.2 like this:
>>>
>>>   java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=conf1 -DzkRun -DnumShards=1 -jar start.jar
>>>
>>> I'm creating multiple cores via the core admin HTTP API:
>>>
>>>   curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp1"
>>>   curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp2"
>>>   curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp3"
>>>
>>> My solr.xml looks like:
>>>
>>>   <?xml version="1.0" encoding="UTF-8" ?>
>>>   <solr persistent="true">
>>>     <cores transientCacheSize="2" adminPath="/admin/cores" shareSchema="true"
>>>            zkClientTimeout="${zkClientTimeout:15000}" hostPort="8983" hostContext="solr">
>>>     </cores>
>>>   </solr>
>>>
>>> When I list all cores currently loaded, via curl "http://localhost:8983/solr/admin/cores?action=status", I notice that all 3 cores are still running, even though transientCacheSize is 2. Can anyone tell me why that is? Also, is there a way to pass loadOnStartup and transient to the core admin HTTP API? Specifying these when creating a core doesn't seem to work:
>>>
>>>   curl "http://localhost:8983/solr/admin/cores?action=CREATE&transient=true"
>>>
>>> Thanks, didier
Re: transientCacheSize not working
Thanks. Is there a way to pass loadOnStartup and/or transient as parameters to the core admin HTTP API? This doesn't seem to work:

  curl "http://localhost:8983/solr/admin/cores?action=CREATE&transient=true&name=c1"

On Tue, Mar 19, 2013 at 7:29 PM, Mark Miller markrmil...@gmail.com wrote:
> I don't think SolrCloud works with the transient stuff. - Mark
>
> On Mar 19, 2013, at 8:04 PM, didier deshommes dfdes...@gmail.com wrote:
>> Hi, I cannot get SolrCloud to respect transientCacheSize when creating multiple cores via the web API. I'm running Solr 4.2 like this:
>>
>>   java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=conf1 -DzkRun -DnumShards=1 -jar start.jar
>>
>> I'm creating multiple cores via the core admin HTTP API:
>>
>>   curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp1"
>>   curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp2"
>>   curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp3"
>>
>> My solr.xml looks like:
>>
>>   <?xml version="1.0" encoding="UTF-8" ?>
>>   <solr persistent="true">
>>     <cores transientCacheSize="2" adminPath="/admin/cores" shareSchema="true"
>>            zkClientTimeout="${zkClientTimeout:15000}" hostPort="8983" hostContext="solr">
>>     </cores>
>>   </solr>
>>
>> When I list all cores currently loaded, via curl "http://localhost:8983/solr/admin/cores?action=status", I notice that all 3 cores are still running, even though transientCacheSize is 2. Can anyone tell me why that is? Also, is there a way to pass loadOnStartup and transient to the core admin HTTP API? Specifying these when creating a core doesn't seem to work:
>>
>>   curl "http://localhost:8983/solr/admin/cores?action=CREATE&transient=true"
>>
>> Thanks, didier
transientCacheSize not working
Hi, I cannot get SolrCloud to respect transientCacheSize when creating multiple cores via the web API. I'm running Solr 4.2 like this:

  java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=conf1 -DzkRun -DnumShards=1 -jar start.jar

I'm creating multiple cores via the core admin HTTP API:

  curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp1"
  curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp2"
  curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=tmp3"

My solr.xml looks like:

  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true">
    <cores transientCacheSize="2" adminPath="/admin/cores" shareSchema="true"
           zkClientTimeout="${zkClientTimeout:15000}" hostPort="8983" hostContext="solr">
    </cores>
  </solr>

When I list all cores currently loaded, via curl "http://localhost:8983/solr/admin/cores?action=status", I notice that all 3 cores are still running, even though transientCacheSize is 2. Can anyone tell me why that is? Also, is there a way to pass loadOnStartup and transient to the core admin HTTP API? Specifying these when creating a core doesn't seem to work:

  curl "http://localhost:8983/solr/admin/cores?action=CREATE&transient=true"

Thanks, didier
Re: Cache replication
Consider putting a cache (memcached, redis, etc.) *in front* of your Solr slaves. Just make sure to update it when replication occurs.

didier

On Tue, Aug 9, 2011 at 6:07 PM, arian487 akarb...@tagged.com wrote:
> I'm wondering if the caches on all the slaves are replicated across (such as queryResultCache). That is to say, if I hit one of my slaves and cache a result, and I later make a search that happens to hit a different slave, will that first cached result be available for use? This is pretty important because I'm going to have a lot of slaves, and if caches aren't shared I'd have a high chance of running a lot of uncached queries. Thanks :)
Re: QTime Solr Query
On Thu, Feb 10, 2011 at 4:08 PM, Stijn Vanhoorelbeke stijn.vanhoorelb...@gmail.com wrote:
> Hi, I've done some stress testing on my Solr system (running in the EC2 cloud). From what I've noticed during the tests, the QTime drops to just 1 or 2 ms (on an index of ~2 million documents). My first thought pointed me to the different Solr caches, so I disabled all of them. Yet QTime stays low. Then the Lucene internal FieldCache came into sight. This cache is hidden deep inside Lucene and is not configurable through Solr. To cope with this, I thought I would lower the memory allocated to Solr, so that a smaller cache is forced. But QTime still stays low.

When stress-testing Solr, I usually also flush the OS cache. This is the command to do it on Linux:

  # sync; echo 3 > /proc/sys/vm/drop_caches

didier

> Can Solr be so fast as to retrieve queries in just 1-2 ms, even if I only allocate 100 MB to Solr?
multiple cores, solr.xml and replication
Hi there, I noticed that the java-based replication does not make replication of multiple cores automatic. For example, if I have a master with 7 cores, any slave I set up has to explicitly know about each of the 7 cores to be able to replicate them. This information is stored in solr.xml, and since this file is outside the conf/ directory, it's impossible to have the java-based replication copy it to each slave. Is this by design? For those of you doing multicore replication, how do you handle it? Is overwriting solr.xml when persist=true is used thread-safe? What happens if I create 2 different cores at the same time? I ask because I have 7 cores total and I always end up with only 2 or 3 cores in my solr.xml after doing a bulk delta-import across cores. didier
Re: multiple cores, solr.xml and replication
On Thu, Oct 21, 2010 at 3:00 PM, Shawn Heisey s...@elyograg.org wrote:
> On 10/21/2010 1:42 PM, didier deshommes wrote:
>> I noticed that the java-based replication does not make replication of multiple cores automatic. For example, if I have a master with 7 cores, any slave I set up has to explicitly know about each of the 7 cores to be able to replicate them. This information is stored in solr.xml, and since this file is outside the conf/ directory, it's impossible to have the java-based replication copy it to each slave. Is this by design? For those of you doing multicore replication, how do you handle it?
>
> My slave replication handler looks like this, used for all cores. The solr.core.name parameter is dynamically replaced with the name of the current core:
>
>   <requestHandler name="/replication" class="solr.ReplicationHandler">
>     <lst name="slave">
>       <str name="masterUrl">http://HOST:8983/solr/${solr.core.name}/replication</str>
>       <str name="pollInterval">00:00:15</str>
>     </lst>
>   </requestHandler>
>
> Shawn

I use this configuration too, but doesn't this assume that solr.xml is the same on the master and the slave? What happens when the master creates a new core?

didier
Re: Jetty rerturning HTTP error code 413
Hi Alexandre, have you tried setting a higher headerBufferSize? Look in etc/jetty.xml and search for 'headerBufferSize'; I think it controls the maximum size of the request URL. By default it is 8192.

didier

On Wed, Aug 18, 2010 at 2:43 PM, Alexandre Rocco alel...@gmail.com wrote:
> Guys, we are facing an issue executing a very large query (~4000 bytes in the URL) in Solr. When we execute the query, Solr (probably Jetty) returns an HTTP 413 error (FULL HEAD). I guess this is related to the very big query being executed, and currently we can't make it shorter. Is there any configuration that needs to be tweaked on Jetty or another component to make this query work? Any advice is really appreciated. Thanks! Alexandre Rocco
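An alternative that side-steps URL and header limits entirely is to send the parameters as a POST body, which Solr's search handlers also accept -- a sketch with a placeholder query:

  curl "http://localhost:8983/solr/select" --data-urlencode "q=<your very long query>" --data-urlencode "wt=json"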
Re: help finding illegal chars in XML doc
For XML 1.1 documents, you can check whether any of your documents contain the restricted characters defined here: http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-RestrictedChar If they do, you'll have to remove them.

didier

On Sun, Jul 18, 2010 at 11:16 AM, robert mena robert.m...@gmail.com wrote:
> Hi, I am doing some tests with Solr 1.4.1. I've created an XML file with the documents I'd like to index. With a few items (1000) everything went fine. When I went to a more representative import (around 6) I got an error:
>
>   java -jar example/exampledocs/post.jar doc.xml
>   SimplePostTool: version 1.2
>   SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
>   SimplePostTool: POSTing files to http://localhost:8983/solr/update..
>   SimplePostTool: POSTing file add.xml
>   SimplePostTool: FATAL: Solr returned an error: Illegal_character_CTRLCHAR_code_27__at_rowcol_unknownsource_37022847
>
> I've tried to track down where this problem is located, without luck. Any ideas?
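A quick way to strip the offending control characters before posting -- a sketch that removes the ASCII control range (except tab, newline, and carriage return; CTRLCHAR code 27, ESC, falls in this range) and assumes the file name from the command above:

  # delete restricted control characters, then re-post the cleaned file
  tr -d '\000-\010\013\014\016-\037' < doc.xml > doc-clean.xml
  java -jar example/exampledocs/post.jar doc-clean.xml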
Re: how to get tf-idf values in solr
Have you taken a look at Solr's TermVectorComponent? It's probably what you want: http://wiki.apache.org/solr/TermVectorComponent

didier

On Tue, Jun 15, 2010 at 8:38 AM, sarfaraz masood sarfarazmasood2...@yahoo.com wrote:
> I am Sarfaraz, working on a search engine project based on Nutch and Solr. I am trying to implement a new search algorithm for this engine. Our search engine crawls the web and stores the documents as large strings in the database, indexed by their URLs. Now, to implement my algorithm, I need tf-idf values (0-1) for each document produced by the crawler, but I am unable to find any method in Solr or Lucene that serves my purpose. For my algorithm I need to maintain a relevance matrix of the following type:
>
>          term1  term2  term3  term4 ...
>   url1   0.7    0.8    0.3    0.1
>   url2   0.4    0.1    0.4    0.5
>   url3   ...
>
> For this purpose I need a core Java method/function in Solr that returns the tf-idf values for all terms in all documents in the available document list. Please help; I will be highly grateful to you all.
> -Sarfaraz Masood
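A sketch of querying the component -- assuming the "tvrh" request handler from the example solrconfig, which wires in TermVectorComponent:

  curl "http://localhost:8983/solr/select?qt=tvrh&q=*:*&tv=true&tv.tf=true&tv.df=true&tv.tf_idf=true"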
Re: Multiple Cores Vs. Single Core for the following use case
On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote:
> What I am trying to understand is the search/filter algorithm. If I have 1 core with all documents and I search for "Paris" for userId=123, is Lucene going to first search for all "Paris" documents and then apply a filter on the userId? If this is the case, then I am better off having a specific index for user 123, because that will be faster.

If you want to apply the filter on userId first, use filter queries (http://wiki.apache.org/solr/CommonQueryParameters#fq). This will filter by userId first, then search for "Paris".

didier

> --- On Wed, 1/27/10, Marc Sturlese marc.sturl...@gmail.com wrote:
>> In case you are going to use a core per user, take a look at this patch: http://wiki.apache.org/solr/LotsOfCores
>>
>> Trey-13 wrote:
>>> Hi Matt, in most cases you are going to be better off going with the userid method, unless you have a very small number of users and a very large number of docs per user. The userid method will likely be much easier to manage, as you won't have to spin up a new core every time you add a new user. I would start here and see if the performance is good enough for your requirements before you start worrying about it not being efficient. That being said, I really don't have any idea what your data looks like. How many users do you have? How many documents per user? Are any documents shared by multiple users? -Trey
>>>
>>> On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour matthieu_lab...@yahoo.com wrote:
>>>> Hi, shall I set up multiple cores or a single core for the following use case? I have X number of users. When I do a search, I always know which user I am searching for. Shall I set up X cores, one for each user? Or shall I set up one core and add a userId field to each document? If I choose the one-core solution, then I am concerned about performance. Let's say I search for "New York": if Lucene returns all "New York" matches for all users and then filters based on the userId, that is going to be less efficient than if I shard per user and send the "New York" request to that user's core. Thank you for your help. matt
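A minimal sketch of the filter-query form, assuming the default /select handler and the userId field from the thread:

  curl "http://localhost:8983/solr/select?q=Paris&fq=userId:123"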
Re: solr perf
Have you tried loading Solr cores as you need them and unloading those that are not being used? I wish I could help more; I don't know many people running that many cores.

didier

On Sun, Dec 20, 2009 at 2:38 PM, Matthieu Labour matth...@strateer.com wrote:
> Hi, I have a Solr instance in which I created 700 cores, one core per user of my application. The total size of the data indexed on disk is 35GB, with Solr cores ranging from 100KB and a few documents to 1.2GB and 50,000 documents. Searching seems very slow, and indexing as well. This is running on an EC2 extra-large instance (6 CPU, 15GB memory, RAID0 disk). I would appreciate it if anybody has tips, articles, etc. on what to do to understand and improve performance. Thank you
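A sketch of on-demand core management through the Core Admin API (core name and instanceDir are illustrative):

  # unload an idle user's core (by default the index files stay on disk)
  curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=user123"

  # re-register it when the user comes back
  curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=user123&instanceDir=user123"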
Re: question about merging indexes
On Sun, Oct 25, 2009 at 1:15 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
> : I need some help with the mergeindex command. I have 2 cores A and B
> : that I want to merge into a new index RES. A has 100 docs and B 10
> : docs. All of B's docs are from A, except that one attribute is
> : changed. The goal is to bring the updated attributes from B into A.
>
> that's not how mergeindex works ... merging two indexes is essentially just adding all the docs from one index into the other (but w/o the reindexing step - it works by copying the raw term info). There is no way to modify a doc once it's been indexed.

Oh, thanks. This is definitely not what I need, then. Thanks for the clarification!

didier

> : When I issue the mergeindexes command my RES core only has 10 docs. I
> : expect RES to have 100 or even 110 docs, but 10 is very puzzling. Am
> : I misunderstanding something about merging indexes?
>
> what exactly was the command you used to do the merge? you should have gotten 110 docs.
>
> -Hoss
question about merging indexes
Hi there, I need some help with the mergeindex command. I have 2 cores, A and B, that I want to merge into a new index, RES. A has 100 docs and B 10 docs. All of B's docs are from A, except that one attribute is changed. The goal is to bring the updated attributes from B into A. When I issue the mergeindexes command, my RES core only has 10 docs. I expect RES to have 100 or even 110 docs, so 10 is very puzzling. Am I misunderstanding something about merging indexes? What I really want to do is to be able to merge 2 cores, but it looks like this is still in the works (SOLR-1331). Thanks! didier
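For reference, the CoreAdmin invocation being discussed looks like this -- a sketch with illustrative index paths, merging two source index directories into an existing target core:

  curl "http://localhost:8983/solr/admin/cores?action=mergeindexes&core=RES&indexDir=/path/to/A/data/index&indexDir=/path/to/B/data/index"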
indexing frequently-changing fields
I am using Solr to index data from a SQL database. Most of the data doesn't change after the initial commit, except for a single boolean field that indicates whether an item is flagged as 'needing attention'. So I have a need_attention field in the database that I update whenever a user marks an item as needing attention in my UI. The problem is that I want to offer the ability to include need_attention in my users' queries, but I do not want to incur the expense of having to reindex whenever this flag changes on an individual document. I have thought about different solutions to this problem, including using multi-core and having a smaller core for recently-marked items that I am willing to do 'near-real-time' commits on. Are there any common solutions to this problem, which I have to imagine is common in this community?
OutOfMemoryError due to auto-warming
Hi there, we are running Solr with 1GB allocated to it, and we keep having OutOfMemoryErrors. We get messages like this:

  Error during auto-warming of key:org.apache.solr.search.queryresult...@c785194d:java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
        at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
        at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
        at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
        at org.apache.solr.search.MissingLastOrdComparator.setNextReader(MissingStringLastComparatorSource.java:181)
        at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252)
        at org.apache.lucene.search.Searcher.search(Searcher.java:173)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:51)
        at org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:332)
        at org.apache.solr.search.LRUCache.warm(LRUCache.java:194)
        at org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1154)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

And like this:

  Error during auto-warming of key:org.apache.solr.search.queryresult...@33cf792:java.lang.OutOfMemoryError: Java heap space

We've searched, and one suggestion was to reduce the size of the various caches that do sorting in solrconfig.xml (http://osdir.com/ml/solr-user.lucene.apache.org/2009-05/msg01043.html). Does this solution generally work? Can anyone think of any other cause for this problem?

didier
Re: OutOfMemoryError due to auto-warming
On Thu, Sep 24, 2009 at 5:40 PM, Francis Yakin fya...@liquid.com wrote:
> You can also increase the JVM heap size if you have enough physical memory; for example, if you have 4GB physical, give the JVM a heap size of 2GB or 2.5GB.
> Thanks, Francis

Thanks, we can definitely do that (we have 4GB available). I also forgot to add that we're running a development version of Solr (a git clone from ~3 weeks ago).

Thanks, didier

> -----Original Message-----
> From: didier deshommes [mailto:dfdes...@gmail.com]
> Sent: Thursday, September 24, 2009 3:32 PM
> To: solr-user@lucene.apache.org
> Cc: Andrew Montalenti
> Subject: OutOfMemoryError due to auto-warming
>
> [original message quoted in full above]
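For reference, with the stock Jetty start.jar setup from the Solr distribution, the heap is raised on the java command line -- a sketch assuming ~4GB of physical RAM, per the suggestion above:

  java -Xms2g -Xmx2g -jar start.jar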