Solr Indexing MAX FILE LIMIT
Hello guys,

I'm using Apache Solr 3.6.1 on Tomcat 7 for indexing CSV files using curl on a Windows machine.

My question is: what is the maximum CSV file size when doing an HTTP POST, or when using the following curl command?

    curl http://localhost:8080/solr/update/csv -F stream.file=D:\eighth.csv -F commit=true -F optimize=true -F encapsulate= -F keepEmpty=true

My requirement is quite large, because we have to index CSV files ranging between 8 and 10 GB.

Also, what would be the optimum settings for index parameters like commit, for better performance on a machine with 8 GB RAM?

Please guide me on it. Thanks in advance.
RE: Solr Indexing MAX FILE LIMIT
Hi - instead of trying to make the system ingest such large files, perhaps you can split the files into many small pieces.

-----Original message-----
From: mitra mitra.re...@ornext.com
Sent: Tue 13-Nov-2012 09:05
To: solr-user@lucene.apache.org
Subject: Solr Indexing MAX FILE LIMIT

> [...]
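For reference, a minimal sketch of the splitting approach suggested above, assuming the first line of the CSV is a header that every chunk needs to repeat (the path and chunk size are illustrative, and the naive line-based split assumes no embedded newlines inside quoted fields):

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;

    /** Splits a large CSV into smaller chunks, repeating the header in each chunk. */
    public class CsvSplitter {
        public static void main(String[] args) throws IOException {
            String input = "D:/eighth.csv";   // illustrative path
            int linesPerChunk = 500000;       // tune for your setup

            BufferedReader in = new BufferedReader(new FileReader(input));
            String header = in.readLine();    // assumes the first line is the CSV header
            String line;
            int lineCount = 0;
            int chunkCount = 0;
            BufferedWriter out = null;
            while ((line = in.readLine()) != null) {
                if (out == null || lineCount == linesPerChunk) {
                    if (out != null) out.close();
                    out = new BufferedWriter(new FileWriter(input + ".part" + chunkCount++));
                    out.write(header);
                    out.newLine();
                    lineCount = 0;
                }
                out.write(line);
                out.newLine();
                lineCount++;
            }
            if (out != null) out.close();
            in.close();
        }
    }

Each chunk can then be posted with the same curl command as before, pointing stream.file at the chunk.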
RE: Solr Indexing MAX FILE LIMIT
Thank you.

I understand that the default size for an HTTP POST in Tomcat is 2 MB. Can we change that somehow, so that I don't need to split the 10 GB CSV into 2 MB chunks?

    curl http://localhost:8080/solr/update/csv -F stream.file=D:\eighth.csv -F commit=true -F optimize=true -F encapsulate= -F keepEmpty=true

As I mentioned, I'm using the above command to post, rather than the format below:

    curl http://localhost:8080/solr/update/csv --data-binary @eighth.csv -H 'Content-type:text/plain; charset=utf-8'

My question: is the limit still applicable even when not using the data-binary format above?
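For reference, Tomcat's 2 MB default comes from the maxPostSize attribute on the HTTP connector in conf/server.xml (2097152 bytes if unspecified); a sketch of raising it, with the value illustrative - per the Tomcat 7 docs, a value of 0 or less disables the limit:

    <!-- conf/server.xml -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               maxPostSize="104857600" />   <!-- 100 MB, illustrative -->

Note also that with stream.file Solr reads the file from the server's local disk rather than from the request body, so the 10 GB file itself should not be subject to the POST limit; only the small form parameters travel in the request.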
Removing Shards from Zookeeper - no servers hosting shard
Hi all,

We've just updated to SOLR 4.0 production and Zookeeper 3.3.6 from a SOLR 4.0 development version circa November 2011. We keep 6 months of data online in our primary cluster, and archive off old stuff to a slower disk archive cluster. We used to remove SOLR cores with the following code, but everything has changed in Zookeeper now.

Old code to remove cores from Zookeeper:

    curl "http://127.0.0.1:8080/solr/admin/cores?action=UNLOAD&core=${SHARD}"
    echo "Removing indexes from all Zookeeper hosts"
    for (( i=0; i<${#ZK_HOSTS[*]}; i++ ))
    do
      $JAVA -cp .:/apps/zookeeper-3.3.5/zookeeper-3.3.5.jar:/apps/zookeeper-3.3.5/lib/jline-0.9.94.jar:/apps/zookeeper-3.3.5/lib/log4j-1.2.15.jar org.apache.zookeeper.ZooKeeperMain -server ${ZK_HOSTS[$i]} delete /collections/polecat/shards/solrenglish:8080_solr_$SHARD/$HOSTNAME:8080_solr_$SHARD
      $JAVA -cp .:/apps/zookeeper-3.3.5/zookeeper-3.3.5.jar:/apps/zookeeper-3.3.5/lib/jline-0.9.94.jar:/apps/zookeeper-3.3.5/lib/log4j-1.2.15.jar org.apache.zookeeper.ZooKeeperMain -server ${ZK_HOSTS[$i]} delete /collections/polecat/shards/solrenglish:8080_solr_$SHARD
    done
    curl "http://solrmaster01:8080/solr/admin/cores?action=RELOAD&core=master"

Now that we have migrated, I have tried removing cores from Zookeeper by removing the entries for the unloaded core in leaders and leader_elect, but for some reason SOLR keeps sending requests to the shard, and I end up with the "no servers hosting shard" error.

Does anyone know how to remove a SOLR core from a SOLR server, have Zookeeper updated, and have distributed queries still work? The only thing I know how to do now is stop Tomcat, stop Zookeeper, clear out the data directory and then restart both. This isn't really ideal for a process I'd like to have running each night, and surely it is something others have hit. I've tried Google searching, and what I find is references to the bug where Solr notifies Zookeeper on core unloads (which is marked as fixed), and people saying it doesn't work unless you run reloads on each core (which also doesn't work when I do it).

Regards,

Gilles Comeau
Re: Nested Join Queries
Gerald,

I wonder if you tried to approach BlockJoin for your problem? Can you afford less frequent updates?

On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck gerald.bla...@barometerit.com wrote:

Thank you Erick for your reply. I understand that search is not an RDBMS. Yes, we do have a huge combinatorial explosion if we de-normalize and duplicate data. In fact, I believe our use case is exactly what the Solr developers were trying to solve with the addition of the Join query. And while the example I gave illustrates the problem we are solving with the Join functionality, it is simplistic in nature compared to what we have in actuality. I am still looking for an answer here if someone can shed some light. Thanks.

On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson erickerick...@gmail.com wrote:

I'm going to go a bit sideways on you, partly because I can't answer the question... But every time I see someone doing what looks like substituting "core" for "table" and then trying to use Solr like a DB, I get on my soap-box and preach...

In this case, consider de-normalizing your DB so you can ask the query in terms of search rather than joins. E.g. make each document a combination of the author and the book, with an additional field author_has_written_a_bestseller. Now your query becomes a really simple search: author:name AND author_has_written_a_bestseller:true. True, this kind of approach isn't as flexible as an RDBMS, but it's a _search_ rather than a query. Yes, it replicates data, but unless you have a huge combinatorial explosion, that's not a problem.

And the join functionality isn't called "pseudo" for nothing. It was written for a specific use-case. It is often expensive, especially when the field being joined on has many unique values.

FWIW,
Erick

On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck gerald.bla...@barometerit.com wrote:

At a high level, I have a need to be able to execute a query that joins across cores, and that query during its joining may join back to the originating core. Example: find all Books written by an Author who has written a best-selling Book.

In Solr query syntax:

A) against the book core:
    bestseller:true
B) against the author core:
    {!join fromIndex=book from=id to=bookid}bestseller:true
C) against the book core:
    {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true

A returns results. B returns results. C does not return results.

Given that A and C use the same core, I started looking for join code that compares the originating core to the fromIndex, and found this in JoinQParserPlugin (line #159):

    if (info.getReq().getCore() == fromCore) {
      // if this is the same core, use the searcher passed in... otherwise we could be warming and
      // get an older searcher from the core.
      fromSearcher = searcher;
    } else {
      // This could block if there is a static warming query with a join in it, and if useColdSearcher is true.
      // Deadlock could result if two cores both had useColdSearcher and had joins that used eachother.
      // This would be very predictable though (should happen every time if misconfigured)
      fromRef = fromCore.getSearcher(false, true, null);

      // be careful not to do anything with this searcher that requires the thread local
      // SolrRequestInfo in a manner that requires the core in the request to match
      fromSearcher = fromRef.get();
    }

I found that if I were to modify the above code so that it always follows the logic in the else block, I get the results I expect. Can someone explain to me why the code is written as it is? And if we were to run with only the else block being executed, what type of adverse impacts might we have? Does anyone have other ideas on how to solve this issue? Thanks in advance.

-Gerald

--
Gerald Blanck
barometerIT
1331 Tyler Street NE, Suite 100
Minneapolis, MN 55413
612.208.2802
gerald.bla...@barometerit.com

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: The question about ConcurrentUpdateSolrServer
L'ubov',

Yes, it does. There were only two long requests with huge bodies, containing roughly 125K docs. You can also check, on the Solr side, LogUpdateProcessor's log messages regarding the number of requests and the docs passed in each.

On Wed, Nov 7, 2012 at 5:26 PM, Lyuba Romanchuk lyuba.romanc...@gmail.com wrote:

Hi,

If I run my application that uses the solrj API (ConcurrentUpdateSolrServer with buffer 10 and thread count 2), I get the logs below, with only two rows like "Status for: <uid> is 200". Does this mean that only two HTTP requests were sent? The application indexes 2,500,000 different documents, and this is the number of docs that I see in the web UI. But I thought that I should see a lot of rows like this, not only 2 - something like ~250,000.

    17:47:42,842 INFO org.apache.solr.client.solrj.impl.HttpClientUtil:102 - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
    17:47:43,122 INFO org.apache.solr.client.solrj.impl.HttpClientUtil:102 - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
    17:47:43,128 INFO org.apache.solr.client.solrj.impl.HttpClientUtil:102 - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
    17:47:43,539 INFO org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:121 - starting runner: org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@3bc8c52e
    17:47:43,539 INFO org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:121 - starting runner: org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@7a096dab
    17:50:46,257 INFO org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:200 - Status for: 5e41920f-b49b-4062-8f01-06e3d36926c9 is 200
    17:50:46,257 INFO org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:200 - Status for: 185b1dfd-d0b7-4f75-bfc5-1e38e89a05f2 is 200
    17:50:46,258 INFO org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:240 - finished: org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@3bc8c52e
    17:50:46,258 INFO org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:240 - finished: org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@7a096dab

Best regards,
Lyuba

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
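For reference, a minimal sketch of the client setup under discussion (the URL, queue size and thread count are illustrative). With a small queue and two runner threads, each runner streams many queued documents over one long-lived HTTP request, which is why so few "Status for: ... is 200" lines appear:

    import java.io.IOException;

    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
        public static void main(String[] args) throws IOException, SolrServerException {
            // queue size 10, 2 background threads - matching the setup described above
            ConcurrentUpdateSolrServer server =
                    new ConcurrentUpdateSolrServer("http://localhost:8080/solr", 10, 2);

            for (int i = 0; i < 2500000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                server.add(doc);              // queued; a runner thread streams it out
            }
            server.blockUntilFinished();      // wait for the runners to drain the queue
            server.commit();
            server.shutdown();
        }
    }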
Re: How to speed up Facet count (Big index) ??!!!!
Thanks Yonik. Should I consider sharding in this case (actually I have one big index with replication)? Or create 2 indexes (one for search and the other for faceting, on a different machine)?

Thanks folks. With love from Paris (it's raining today :(

On Tuesday 13 November 2012, Yonik Seeley wrote:

On Mon, Nov 12, 2012 at 8:39 PM, Aeroox Aeroox aero...@gmail.com wrote:

Hi folks,

I have a Solr index with up to 50M documents. A document contains 62 fields (docid, name, location...). The facet count took 1 to 2 minutes with these params:

    http://.../select/?q=solr&version=2.2&start=0&rows=0&facet=true&facet.limit=6&facet.mincount=1&mm=3-1&facet.field=schoolname_hl&facet.method=fc

It should hopefully just take that long the first time? How much time does it take to facet on the same field subsequent times?

And my cache policy:

    <filterCache class="solr.FastLRUCache" size="4096" initialSize="4096" autowarmCount="4096"/>
    <queryResultCache class="solr.LRUCache" size="5000" initialSize="5000" autowarmCount="5000"/>

These are relatively big caches - consider reducing them if you can. Especially the filter cache, depending on what percent of the entries are bitsets. Worst case would be 50M / 8 * 4096 = 25GB of bitsets.

* I'm using Solr 1.4 (LUCENE_36)
* 64GB RAM (with 60GB allocated to java/tomcat6)

Reduce this if you can - it doesn't leave enough memory for the OS to cache the index files, and can contribute to slowness (more disk IO).

-Yonik
http://lucidworks.com
Re: how to sort the solr suggester's result
Could you just sort the suggestions at the app level? That is, read them all into a list and sort before presenting them to the user?

Best
Erick

On Sun, Nov 11, 2012 at 10:52 PM, 徐郑 eyun...@gmail.com wrote:

Following is my config; it suggests words well. I want to get a sorted result when it suggests, so I added a transformer that appends a tab(\t)-separated float weight string to the end of the Suggestion field, but the suggestion result still doesn't sort correctly.

My suggest result (note the float number at the end is the weight):

    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="我">
          <int name="numFound">10</int>
          <int name="startOffset">1</int>
          <int name="endOffset">2</int>
          <arr name="suggestion">
            <str>我脑中的橡皮擦 2.12</str>
            <str>我老婆是大佬3 2.07</str>
            <str>我老婆是大佬2 2.12</str>

schema.xml:

    <field name="Suggestion" type="string" indexed="true" stored="true"/>

solrconfig.xml:

    <searchComponent class="solr.SpellCheckComponent" name="suggest">
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="field">Suggestion</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <!-- <float name="threshold">0.0001</float> -->
        <str name="spellcheckIndexDir">spellchecker</str>
        <str name="comparatorClass">freq</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

    <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
      <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.collate">true</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>

--
eyun
The truth, whether or not
Q:276770341
G+: eyun...@gmail.com
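A minimal sketch of Erick's app-level sorting, assuming each suggestion string carries its weight after the last tab character, as described in the config above (the descending order and the fallback weight are illustrative choices):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class SuggestionSorter {
        /** Sorts suggestion strings of the form "text\tweight" by weight, descending. */
        public static List<String> sortByWeight(List<String> suggestions) {
            List<String> sorted = new ArrayList<String>(suggestions);
            Collections.sort(sorted, new Comparator<String>() {
                public int compare(String a, String b) {
                    return Double.compare(weightOf(b), weightOf(a)); // descending
                }
            });
            return sorted;
        }

        private static double weightOf(String suggestion) {
            int tab = suggestion.lastIndexOf('\t');
            if (tab < 0) return 0.0;          // no weight appended
            try {
                return Double.parseDouble(suggestion.substring(tab + 1).trim());
            } catch (NumberFormatException e) {
                return 0.0;
            }
        }
    }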
Re: How to speed up Facet count (Big index) ??!!!!
I'd say you are at a point where sharding may well help. But, as others have suggested, you have other issues to consider first - less memory for Solr, and an upgrade to a more modern Solr.

Also, if, as Yonik asks, only the first query is slow, you can set up a newSearcher query in your solrconfig.xml to run this first query on every commit, meaning your users will always get faster queries.

Upayavira

On Tue, Nov 13, 2012, at 11:16 AM, Aeroox Aeroox wrote:

> [...]
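A sketch of the warming hook Upayavira describes, for solrconfig.xml; the query shown is illustrative and should mirror the real slow facet query so the field's caches are populated before users hit it:

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">solr</str>
          <str name="facet">true</str>
          <str name="facet.field">schoolname_hl</str>
          <str name="facet.method">fc</str>
        </lst>
      </arr>
    </listener>

A matching firstSearcher listener covers the very first searcher after startup, which the newSearcher event does not.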
Re: Unable to run two multicore Solr instances under Tomcat
At a guess you have leftover jars from your earlier installation in your classpath that are being picked up. I've always found that figuring out how _that_ happened is... er... interesting...

Best
Erick

On Mon, Nov 12, 2012 at 7:44 AM, Adam Neal an...@mass.co.uk wrote:

Hi,

I have been running two multicore Solr instances under Tomcat using a nightly build of 4.0 from September 2011. This has been running fine, but when I try to update these instances to the release version of 4.0 I'm hitting problems when the second instance starts up. If I have one instance on the release version and one on the nightly build, it also works fine. It's running on a Solaris 10 box using Tomcat 6.0.26 and Java 1.6.0_20.

I can run up either instance on its own and it works fine; it's just when starting both together, so I'm pretty sure my configs aren't the issue. A snippet from the log is below - please note that I have had to type this out, so there may be some typos, hopefully not! Any ideas?

Adam

12-Nov-2012 09:58:50 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: /conf_solr/instance2
12-Nov-2012 09:58:50 org.apache.solr.core.SolrResourceLoader <init>
INFO: new SolrResourceLoader for deduced Solr Home: '/conf_solr/instance2/'
12-Nov-2012 09:58:52 org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
12-Nov-2012 09:58:52 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home /conf_solr/instance2
12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: /conf_solr/instance2/solr.xml
12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer <init>
INFO: New CoreContainer 15471347
12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer load
INFO: Loading CoreContainer using Solr Home: '/conf_solr/instance2/'
12-Nov-2012 09:58:52 org.apache.solr.core.SolrResourceLoader <init>
INFO: new SolrResourceLoader for directory: '/conf_solr/instance2/'
12-Nov-2012 09:58:52 org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start Solr. Check solr/home property and the logs
12-Nov-2012 09:58:52 org.apache.solr.common.SolrException log
SEVERE: null:java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration
    at org.apache.xerces.parsers.DOMParser.<init>(Unknown Source)
    at org.apache.xerces.parsers.DOMParser.<init>(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.<init>(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.createDocument(SAX2DOM.java:324)
    at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.<init>(SAX2DOM.java:84)
    at com.sun.org.apache.xalan.internal.xsltc.runtime.output.TransletOutputHandlerFactory.getSerializationHandler(TransletOutputHandlerFactory.java:187)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getOutputHandler(TransformerImpl.java:392)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:298)
    at org.apache.solr.core.CoreContainer.copyDoc(CoreContainer.java:551)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:381)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4488)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:546)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
Re: Admin Permissions
Slap them firmly on the wrist if they do? The Solr admin is really designed with trusted users in mind; there are no provisions that I know of for securing some of the functions.

Your developers have access to the Solr server through the browser, right? Then they can do all of that via URL - see http://wiki.apache.org/solr/CoreAdmin - they don't need to use the admin server at all. So unless you're willing to put a lot of effort into it, I don't think you really can lock it down.

If you really don't trust them not to do bad things, set up a dev environment and lock them out of your production servers totally?

Best
Erick

On Mon, Nov 12, 2012 at 12:41 PM, Michael Long ml...@bizjournals.com wrote:

I really like the new admin in Solr 4.0, but specifically I don't want developers to be able to unload, rename, swap, reload, optimize, or add cores. Any ideas on how I could still give access to the rest of the admin without giving access to these? It is very helpful for them to have access to Query, Analysis, etc.
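If container-level protection is worth the effort anyway, one partial measure is a standard servlet security constraint in the Solr webapp's web.xml; a sketch, assuming BASIC auth and a "solr-admin" role defined in Tomcat's user realm (all names illustrative). This only guards the URL path shown - it is not a substitute for the separate dev environment Erick suggests:

    <security-constraint>
      <web-resource-collection>
        <web-resource-name>core-admin</web-resource-name>
        <url-pattern>/admin/cores</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>solr-admin</role-name>
      </auth-constraint>
    </security-constraint>
    <login-config>
      <auth-method>BASIC</auth-method>
      <realm-name>Solr core admin</realm-name>
    </login-config>
    <security-role>
      <role-name>solr-admin</role-name>
    </security-role>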
RE: Unable to run two multicore Solr instances under Tomcat
Hi Erick,

Thanks for the info. I figured out that it was a jar problem earlier today, but I don't think it is an old jar. Both of the instances I ran included the extraction libraries, and it appears that the problem is due to xercesImpl-2.9.1.jar. If I remove the extraction tool jars from one of the instances, or even just that specific jar, then everything works as normal. Fortunately I only need the extraction tools in one of my instances, so this workaround is good for now.

I can't see any old jars that would interfere. I will try to test this at some point on a clean install of 4.0 and see if the same problem occurs.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tue 13/11/2012 12:05
To: solr-user@lucene.apache.org
Subject: Re: Unable to run two multicore Solr instances under Tomcat

> [...]
Re: java.io.IOException: Map failed :: OutOfMemory
Have you tried the really simple solution of giving your JVM more memory (the -Xmx option)?

Best
Erick

On Tue, Nov 13, 2012 at 2:38 AM, uwe72 uwe.clem...@exxcellent.de wrote:

Version is 3.6.1 of Solr.
Re: Solr Indexing MAX FILE LIMIT
Have you considered writing a small SolrJ (or other client) program that processes the rows in your huge file and sends them to Solr in sensible chunks? That would give you much finer control over how the file is processed, how many docs are sent to Solr at a time, and what to do with errors. You could even run N simultaneous programs to increase throughput...

FWIW,
Erick

On Tue, Nov 13, 2012 at 3:42 AM, mitra mitra.re...@ornext.com wrote:

> [...]
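A minimal sketch of that kind of SolrJ client, assuming a simple comma-separated file with a header row naming the fields (the URL, batch size and the naive split are illustrative; a real CSV parser is needed if fields can contain quoted commas):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CsvIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new HttpSolrServer("http://localhost:8080/solr");
            int batchSize = 1000;                          // docs per update request

            BufferedReader in = new BufferedReader(new FileReader("D:/eighth.csv"));
            String[] fields = in.readLine().split(",");    // header row names the fields
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(batchSize);
            String line;
            while ((line = in.readLine()) != null) {
                String[] values = line.split(",");         // naive: no quoted commas
                SolrInputDocument doc = new SolrInputDocument();
                for (int i = 0; i < fields.length && i < values.length; i++) {
                    doc.addField(fields[i], values[i]);
                }
                batch.add(doc);
                if (batch.size() == batchSize) {
                    solr.add(batch);                       // one HTTP request per chunk
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) solr.add(batch);
            solr.commit();                                 // single commit at the end
            in.close();
            solr.shutdown();
        }
    }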
Re: java.io.IOException: Map failed :: OutOfMemory
Thanks Erick.

We are using:

    export JAVA_OPTS="-XX:MaxPermSize=400m -Xmx2000m -Xms200M -Dsolr.solr.home=/home/connect/ConnectPORTAL/preview/solr-home"

We have around 5 million documents. The index size is around 50 GB.

Before we add a document we delete the same id in the cache, no matter whether the doc exists or not. We use the SolrJ functionality to delete a list of ids. The error always occurs during this deletion.
Re: Is leading wildcard search turned on by default in Solr 3.6.1?
Just a quick comment from our experience: since we have quite a lot of data indexed in our Solr, we take some extra measures to ensure no bogus wildcard queries are accepted by the system (for instance *, **, ***, etc.), and that is done in the QueryParser. I wanted to mention this approach as one way of handling simple query security checks.

--
Dmitry

On Tue, Nov 13, 2012 at 6:22 AM, Jack Krupansky j...@basetechnology.com wrote:

Be sure to realize that even with reverse wildcard support, the user can add a trailing wildcard as well (a double-ended wildcard) and then you are back in the same boat.

The overall idea is that: 1) hardware is much faster than just 3 or 4 years ago, and 2) even though document counts are getting much larger, the number of unique terms (which is all that matters for wildcard performance) does not tend to grow as fast as the document count grows. And some fields have a much more limited vocabulary (unique terms), so a leading wildcard is not necessarily a big performance hit.

Technology advances. We should permit our mindsets to advance as well.

-- Jack Krupansky

-----Original Message-----
From: François Schiettecatte
Sent: Monday, November 12, 2012 2:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1?

John

You can still use leading wildcards even if you don't have the ReversedWildcardFilterFactory in your analysis, but it means you will be scanning the entire dictionary when the search is run, which can be a performance issue. If you do use ReversedWildcardFilterFactory you won't have that performance issue, but you will increase the overall size of your index. It's a tradeoff. When I looked into it for a site I built, I decided that the tradeoff was not worth it (after benchmarking), given how few leading wildcard searches it was getting.

Best regards

François

On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote:

Hi,

I'm migrating from Solr 1.2 to 3.6.1. I used the same analyzer as before, and re-indexed my data. I did not add solr.ReversedWildcardFilterFactory to my index analyzer, but yet leading wildcards are working!! Does this mean it's turned on by default? If so, how do I turn it off, and what are the implications of leaving it ON? Won't my searches be slower and consume more memory?

Thanks,

--MJ

--
Regards,
Dmitry Kan
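A minimal sketch of the kind of guard Dmitry describes, written here as a plain pre-check in client code rather than inside a custom QueryParser (the rule shown - rejecting terms made only of wildcard characters - is illustrative):

    public class WildcardGuard {
        /** Rejects query terms that consist only of wildcard characters, e.g. "*", "**", "*?*". */
        public static boolean isBogus(String userQuery) {
            for (String term : userQuery.trim().split("\\s+")) {
                if (!term.isEmpty() && term.replaceAll("[*?]", "").isEmpty()) {
                    return true;   // the term was nothing but wildcards
                }
            }
            return false;
        }
    }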
Re: Removing Shards from Zookeeper - no servers hosting shard
Odd... the unload command should be enough...

On Tue, Nov 13, 2012 at 5:26 AM, Gilles Comeau gilles.com...@polecat.co wrote:

> [...]

--
- Mark
Re: Role/purpose of Overseer?
The Overseer isn't mentioned much because it's an implementation detail that the user doesn't really have to consider.

The Overseer first came about to handle writing the clusterstate.json file, as a suggestion by Ted Dunning. Originally, each node would try to update the clusterstate.json file itself, using optimistic locking and retries. We decided that a cleaner method was to have an Overseer, and let new nodes register themselves and their latest state as part of a list. The Overseer then watches this list, and when things change, publishes a new clusterstate.json - no optimistic locking and retries needed. All the other nodes watch clusterstate.json and are notified to re-read it when it changes.

Since then, the Overseer has picked up a few other duties where it makes sense. For example, it handles the shard assignments if a user does not specify them. It also does the work for the collections API - eventually this will be beneficial in that it will use a distributed work queue and be able to resume operations that fail before completing. I think over time there are lots of useful applications for the Overseer.

It is elected in the same manner as a leader for a shard - if the Overseer goes down, someone simply takes its place. I don't think the Overseer is going away any time soon.

- Mark

On Mon, Nov 12, 2012 at 9:48 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi,

I was looking at http://wiki.apache.org/solr/SolrCloud and noticed that while the Overseer is mentioned a handful of times, there is nothing there that explains what exactly the Overseer does. This 8-word Javadoc is the best I could find: http://search-lucene.com/jd/solr/solr-core/org/apache/solr/cloud/Overseer.html

The first diagram on http://wiki.apache.org/solr/SolrCloud shows 1 Overseer. Does that make it a SPOF? If not, what happens when it goes down? Also, is the Overseer here to stay? The other day I saw an issue in JIRA questioning its use, or something along those lines.

Thanks,
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
Re: Nested Join Queries
Please find reference materials:

http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
http://blog.griddynamics.com/2012/08/block-join-query-performs.html

On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck gerald.bla...@barometerit.com wrote:

Thank you. I've not heard of BlockJoin. I will look into it today. Thanks.

On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Replied. Pls check the mailing list.

On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Gerald,

I wonder if you tried to approach BlockJoin for your problem? Can you afford less frequent updates?

> [...]

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
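For readers following the BlockJoin pointers above, a minimal Lucene-level sketch (Lucene 4.x join module; the field names and the parent-marker filter are illustrative) of indexing an author together with their books as one block, and querying children joined up to parents:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.CachingWrapperFilter;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.join.ScoreMode;
    import org.apache.lucene.search.join.ToParentBlockJoinQuery;

    public class BlockJoinSketch {

        /** Index an author and their books as one contiguous block: children first, parent last. */
        static void addAuthorBlock(IndexWriter writer) throws Exception {
            List<Document> block = new ArrayList<Document>();
            Document book = new Document();
            book.add(new StringField("bestseller", "true", Store.NO));
            block.add(book);                                          // child document(s)
            Document author = new Document();
            author.add(new StringField("type", "author", Store.NO));  // marks parent docs
            block.add(author);                                        // parent must come last
            writer.addDocuments(block);
        }

        /** Match (parent) authors having a (child) book with bestseller:true. */
        static Query authorsWithBestseller() {
            Filter parents = new CachingWrapperFilter(
                    new QueryWrapperFilter(new TermQuery(new Term("type", "author"))));
            Query childQuery = new TermQuery(new Term("bestseller", "true"));
            return new ToParentBlockJoinQuery(childQuery, parents, ScoreMode.None);
        }
    }

The tradeoff Mikhail hints at: parent and children must be indexed (and re-indexed) together as one block, hence the question about affording less frequent updates.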
Re: java.io.IOException: Map failed :: OutOfMemory
Kernel: 2.6.32.29-0.3-default #1 SMP 2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux
SUSE Linux Enterprise Server 11 SP1 (x86_64)
Physical memory: 4 GB

portadm@smtcax0033:/srv/connect/tomcat/instances/SYSTEST_Portal_01/bin> java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)
Re: solr4.0 problem zkHost with multiple hosts throws out of range exception
On Tue, Nov 13, 2012 at 12:22 AM, deniz denizdurmu...@gmail.com wrote:

so do we need to add one of the servers from the -DzkHost string to -DzkRun? should it look like -DzkRun=host1:port -DzkHost=host:port,host1:port,host2:port in the start up command?

Yeah, something to that effect.

and will the wiki page be updated? because the example there still leads into the error that was mentioned here nearly a month ago...

Yeah, it would be nice if the wiki pointed this out. It shouldn't necessarily be required for the example, because it should work with localhost with just zkRun - but that does set you up for failure when you move to multiple machines, so the wiki should point it out.

--
- Mark
Re: solr4.0 problem zkHost with multiple hosts throws out of range exception
On Tue, Nov 13, 2012 at 12:22 AM, deniz denizdurmu...@gmail.com wrote:

so do we need to add one of the servers from the -DzkHost string to -DzkRun?

By the way - not just any of the servers has to be added to zkRun, but the address of the current server - that is, the server you are running the command on. This is so we know which of the zk addresses belongs to the localhost. It lets us handle some ZooKeeper setup for you (specifying a myid for each node).
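A sketch of what that start command might look like on the first of three hosts (the host names and the conventional 9983 embedded-ZooKeeper port are illustrative):

    java -DzkRun=host1:9983 \
         -DzkHost=host1:9983,host2:9983,host3:9983 \
         -jar start.jar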
RE: Removing Shards from Zookeeper - no servers hosting shard
When I do the unload through the UI, I see the below messages in the Solr log, and nothing in the Zookeeper log. Then right after, I try:

    http://217.147.83.124:9090/solr/experiment_master/select?q=*%3A*&wt=xml&distrib=true

and get <str name="msg">no servers hosting shard:</str>. Also, I still see the shard being referenced in the Cloud tab in the UI.

Does this work for anyone else using SOLR 4.0 production with external Zookeeper and distributed queries, and if so, can you let me know exactly what versions and steps you take to not get this error? ☺ Anyone else have any problems getting this to work? My setup is pretty basic: local external Zookeeper 3.3.6, Solr 4.0 with the three cores seen above.

Regards,
Gilles

INFO: [02_10_2012_experiment] CLOSING SolrCore org.apache.solr.core.SolrCore@11e3c2c6
13-Nov-2012 16:19:13 org.apache.solr.core.SolrCore closeSearcher
INFO: [02_10_2012_experiment] Closing main searcher on request.
13-Nov-2012 16:19:13 org.apache.solr.search.SolrIndexSearcher close
FINE: Closing Searcher@7cd47880 main
    fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=7,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
    filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=1,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
    queryResultCache{lookups=4,hits=3,hitratio=0.75,inserts=2,evictions=0,size=2,warmupTime=0,cumulative_lookups=4,cumulative_hits=3,cumulative_hitratio=0.75,cumulative_inserts=1,cumulative_evictions=0}
    documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
13-Nov-2012 16:19:13 org.apache.solr.core.CachingDirectoryFactory close
FINE: Closing: CachedDir<<org.apache.lucene.store.MMapDirectory@/solr2/cores/02_10_2012/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@717757ad;refCount=1;path=/solr2/cores/02_10_2012/data/index;done=false>>
13-Nov-2012 16:19:13 org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing DirectUpdateHandler2{commits=0,autocommits=0,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}
13-Nov-2012 16:19:13 org.apache.solr.update.DefaultSolrCoreState decref
INFO: SolrCoreState ref count has reached 0 - closing IndexWriter
13-Nov-2012 16:19:13 org.apache.solr.update.DefaultSolrCoreState decref
INFO: Closing SolrCoreState - canceling any ongoing recovery
13-Nov-2012 16:19:13 org.apache.solr.core.CoreContainer persistFile
INFO: Persisting cores config to /solr2/solr.xml
13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal
FINE: null solr/cores/@adminPath=/admin/cores
13-Nov-2012 16:19:13 org.apache.solr.core.Config getNode
FINE: null missing optional solr/cores/@shareSchema
13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal
FINE: null solr/cores/@hostPort=9090
13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal
FINE: null solr/cores/@zkClientTimeout=1
13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal
FINE: null solr/cores/@hostContext=solr
13-Nov-2012 16:19:13 org.apache.solr.core.Config getNode
FINE: null missing optional solr/cores/@leaderVoteWait
13-Nov-2012 16:19:13 org.apache.solr.core.SolrXMLSerializer persistFile
INFO: Persisting cores config to /solr2/solr.xml
13-Nov-2012 16:19:13 org.apache.solr.common.cloud.ZkStateReader updateClusterState
INFO: Updating cloud state from ZooKeeper...
13-Nov-2012 16:19:13 org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change has occurred - updating...

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 13 November 2012 14:13
To: solr-user@lucene.apache.org
Subject: Re: Removing Shards from Zookeeper - no servers hosting shard

> [...]
RE: Removing Shards from Zookeeper - no servers hosting shard
Sorry, forgot - pictures are no good. From clusterstate.json, the same information: the core I unloaded sticks around as a shard:

    "solrexperiment:8080_solr_experiment_02_10_2012":{"replicas":{

Do I need a special command to delete the shard or something? I've never seen a command that does that.

Regards,
Gilles

    "experiment":{
      "solrexperiment:8080_solr_experiment_master":{"replicas":{
        "IS-17093:9090_solr_experiment_master":{
          "shard":"solrexperiment:8080_solr_experiment_master",
          "roles":null,
          "state":"active",
          "core":"experiment_master",
          "collection":"experiment",
          "node_name":"IS-17093:9090_solr",
          "base_url":"http://IS-17093:9090/solr",
          "leader":"true"}}},
      "solrexperiment:8080_solr_experiment_01_10_2012":{"replicas":{
        "IS-17093:9090_solr_01_10_2012_experiment":{
          "shard":"solrexperiment:8080_solr_experiment_01_10_2012",
          "roles":null,
          "state":"active",
          "core":"01_10_2012_experiment",
          "collection":"experiment",
          "node_name":"IS-17093:9090_solr",
          "base_url":"http://IS-17093:9090/solr",
          "leader":"true"}}},
      "solrexperiment:8080_solr_experiment_02_10_2012":{"replicas":{

From: Gilles Comeau [mailto:gilles.com...@polecat.co]
Sent: 13 November 2012 16:29
To: solr-user@lucene.apache.org; markrmil...@gmail.com
Subject: RE: Removing Shards from Zookeeper - no servers hosting shard

> [...]
Re: Testing Solr Cloud with ZooKeeper
https://issues.apache.org/jira/browse/SOLR-3993 has been resolved. Just a few questions: is it in trunk? I mean, is it in the main distribution downloadable from the main Solr site? Because I have downloaded it and still get the same behaviour while running the first instance... or the second shard.
Re: java.io.IOException: Map failed :: OutOfMemory
Today the same exception:

INFO: [] webapp=/solr path=/update params={waitSearcher=true&commit=true&wt=javabin&waitFlush=true&version=2} status=0 QTime=1009
Nov 13, 2012 2:02:27 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
    commit{dir=/net/smtcax0033/connect/Portal/solr-home/data/index,segFN=segments_3gm,version=1352803609067,generation=4486,filenames=[_21c.fdt, _4mv.tis, _4mh.fnm, _1si.fdt, _4n0.fdx, _4mx.nrm, _1si.fdx, _2n0.nrm, _2n0.prx, _4mv.tii, _3ii.fnm, _4mz.tvd, _4mv.nrm, ... , segments_3gm, ... , _308.nrm, _2ie.fdt]
Nov 13, 2012 2:02:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1352803609067
Nov 13, 2012 2:02:27 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[SingleCoreWireImpl@3005994, SingleCoreWireImpl@3005997, SingleCoreWireImpl@3005996, SingleCoreWireImpl@3005999, SingleCoreWireImpl@3005998, SingleCoreWireImpl@3005985, SingleCoreWireImpl@3005984, SingleCoreWireImpl@3005987, ... (500 adds)]} 0 85
Nov 13, 2012 2:02:27 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=85
Nov 13, 2012 2:02:27 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException:
SolrCloudServer and SolrServerException No live SolrServers available
Hi, I'm using Solr 4 (4.0.0.2012.03.17.15.05.35) with the cloud architecture and I would like to use CloudSolrServer from SolrJ, but I receive a SolrServerException:

    org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
        at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:322)
        at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:257)

This is my JUnit test code:

    CloudSolrServer server = new CloudSolrServer(myEndPointZkHost);
    Map<String, String> map = new HashMap<String, String>();
    map.put("collection", "myCollectionName");
    map.put("q", "*:*");
    SolrParams q = new MapSolrParams(map);
    SolrRequest request = new QueryRequest(q);
    NamedList responseList = server.request(request);

and this is the ZkStatus:

live nodes: [search01:8889_solr, search02:8889_solr, search01:_solr, search02:_solr]
collections: { myCollectionName={
  shard1=shard1:{ search01:_solr_myCollectionName:{ shard:shard1, leader:true, state:active, core:myCollectionName, collection:myCollectionName, node_name:search01:_solr, base_url:http://search01:/solr}, search02:8889_solr_myCollectionName:{ shard:shard1, state:active, core:myCollectionName, collection:myCollectionName, node_name:search02:8889_solr, base_url:http://search02:8889/solr}, replicas:{}},
  shard2=shard2:{ search01:8889_solr_myCollectionName:{ shard:shard2, leader:true, state:active, core:myCollectionName, collection:myCollectionName, node_name:search01:8889_solr, base_url:http://search01:8889/solr}, search02:_solr_myCollectionName:{ shard:shard2, state:active, core:myCollectionName, collection:myCollectionName, node_name:search02:_solr, base_url:http://search02:/solr}, replicas:{}} }

Am I doing something wrong? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloudServer-and-SolrServerException-No-live-SolrServers-available-tp4020091.html Sent from the Solr - User mailing list archive at Nabble.com.
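For comparison, a minimal sketch of the same query issued through CloudSolrServer's higher-level helpers - an illustration assuming stock SolrJ 4.0, with the ZooKeeper address and collection name as placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CloudQueryExample {
        public static void main(String[] args) throws Exception {
            // placeholder ZooKeeper ensemble address
            CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
            // tell the client which collection to route requests to,
            // instead of passing a raw "collection" parameter
            server.setDefaultCollection("myCollectionName");
            QueryResponse rsp = server.query(new SolrQuery("*:*"));
            System.out.println("hits: " + rsp.getResults().getNumFound());
            server.shutdown();
        }
    }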
AW: java.io.IOException: Map failed :: OutOfMemory
I just saw that you are running on SUSE 11 - unlike RHEL, for example, it does not have the virtual memory limit set to unlimited by default. Please check the virtual memory limit (ulimit -v) for the operating system user that runs Tomcat/Solr. Since 3.1, Solr maps the index files into virtual memory, so if your index files are larger than the allowed virtual memory, it is likely that you get these kinds of exceptions. Regards, André

From: uwe72 [uwe.clem...@exxcellent.de] Sent: Tuesday, 13 November 2012 17:58 To: solr-user@lucene.apache.org Subject: Re: java.io.IOException: Map failed :: OutOfMemory

[snip: full quote of the log output from the original message]
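For reference, a minimal sketch of checking and lifting that limit - this assumes a bash shell; the limits.conf entry is an assumption about a PAM-based setup, and the user name is a placeholder:

    # show the current virtual memory (address space) limit for this shell
    ulimit -v

    # lift it for the current shell, then start Tomcat/Solr from the same shell
    ulimit -v unlimited

    # to make it persistent, an entry like this in /etc/security/limits.conf
    # (assuming Tomcat runs as user "tomcat"):
    #   tomcat  -  as  unlimited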
Re: AW: java.io.IOException: Map failed :: OutOfMemory
Thanks André! In parallel I also found this thread: http://grokbase.com/t/lucene/solr-user/117m8e9n8t/solr-3-3-exception-in-thread-lucene-merge-thread-1 - they are talking about the same issue. We just started the importer again with the unlimited flag (ulimit -v unlimited), then we will see. -- View this message in context: http://lucene.472066.n3.nabble.com/java-io-IOException-Map-failed-OutOfMemory-tp4019802p4020134.html Sent from the Solr - User mailing list archive at Nabble.com.
Searchers, threads and performance
We're getting close to deploying our Solr search solution, and we're doing performance testing, and we've run into some questions and concerns.

Our number one problem: doing a commit from loading records, which can happen throughout the day, makes all queries stop for 5-7 seconds. This is a showstopper for deployment. Here's what we've observed: upon commit, Solr finishes processing queries in flight, starts up a new searcher, warms it, shuts down the old searcher and puts the new searcher into effect. Does the old searcher stop taking requests before the new searcher is warmed or after? How wide is the window of time wherein Solr is not serving requests? For us, it's about five seconds and we need to drop that dramatically. In general, what is the difference between accepting the delay of waiting for warming vs. accepting the delay of running useColdSearcher=true?

Is there any such thing as/any sense in running more than one searcher in our scenario? What are the benefits of multiple searchers? Erick Erickson posted in 2012: "Unless you have warming happening, there should only be a single searcher open at any given time. Except: If your queries run across several commits you'll get multiple searchers open." Not sure if this is a general observation, or specific to the particular poster's situation.

Finally, what do people mean when they blog that they have Solr set up for n threads? Is that the same thing as saying that Solr can be processing n requests simultaneously?

Thanks for any insight or even links to relevant pages. We've been Googling all over and haven't found answers to the above. Thanks, Andy -- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
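For context, a rough sketch of the solrconfig.xml knobs this discussion revolves around - the element names are stock Solr; the warming query itself is a placeholder:

    <query>
      <!-- keep serving from the old searcher until the new one is warmed -->
      <useColdSearcher>false</useColdSearcher>
      <!-- cap concurrent warming searchers so frequent commits cannot pile up -->
      <maxWarmingSearchers>2</maxWarmingSearchers>
      <!-- run representative queries against each new searcher before it goes live -->
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst><str name="q">placeholder warming query</str></lst>
        </arr>
      </listener>
    </query>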
URL parameters to use FieldAnalysisRequestHandler
Hello, I would like to send a request to the FieldAnalysisRequestHandler. The javadoc lists the parameter names such as analysis.field, but sending those as URL parameters does not seem to work:

    mysolr.umich.edu/analysis/field?analysis.name=title&q=fire-fly

Leaving out the "analysis." prefix doesn't work either:

    mysolr.umich.edu/analysis/field?name=title&q=fire-fly

No matter what field I specify, the analysis returned is for the default field. (See response excerpt below.) Is there a page somewhere that shows the correct syntax for sending GET requests to the FieldAnalysisRequestHandler? Tom

    <lst name="analysis">
      <lst name="field_types"/>
      <lst name="field_names">
        <lst name="ocr">
Re: URL parameters to use FieldAnalysisRequestHandler
I think the UI uses this behind the scenes, as in no more analysis.jsp like before? So maybe try using something like Burp Suite and just using the analysis UI in your browser to see what requests it's sending.

On Tue, Nov 13, 2012 at 11:00 AM, Tom Burton-West tburt...@umich.edu wrote: [snip: original message quoted in full]
Re: Searchers, threads and performance
Andy,

Solr is supposed to keep serving requests from the old searcher for a while. If the pause lasts a few seconds you can take a thread dump and see clearly what it is waiting for. Just a guess: if you have many threads configured in the servlet container pool and push a high load, then warming can significantly impact your search latency - try to limit the acceptable load by reducing the number of concurrent requests. What are the CPU utilization and IO stats during your test? Do you watch the GC log? To me a GC spike is more probable than a warming problem. Are you sure that you use MMapDirectory, and on which OS? Once again:
- thread dump?
- io/vm-stat dump?
- gc log?
- thread pool size in the servlet container config?
- directory impl and OS?

On Tue, Nov 13, 2012 at 7:20 PM, Andy Lester a...@petdance.com wrote: [snip: original message quoted in full]

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
RE: sort by function error
Hi Yonik, I will give the latest 4.0 release a try. Thanks anyway. Cheers Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley [yo...@lucidworks.com] Sent: Tuesday, November 13, 2012 2:04 PM To: solr-user@lucene.apache.org Subject: Re: sort by function error

I can't reproduce this with the example data. Here's an example of what I tried:

    http://localhost:8983/solr/query?q=*:*&sort=geodist(store,-32.123323,108.123323)+asc&group.field=inStock&group=true

Perhaps this is an issue that's since been fixed. -Yonik http://lucidworks.com

On Mon, Nov 12, 2012 at 11:19 PM, Kuai, Ben ben.k...@sensis.com.au wrote: Hi Yonik, thanks for the reply. My sample query:

    q=cafe&sort=geodist(geoLocation,-32.123323,108.123323)+asc&group.field=familyId

    <field name="geoLocation" type="latLon" indexed="true" stored="false" />
    <field name="familyId" type="string" indexed="true" stored="false" />

As long as I remove the group field, the query works. BTW, I just found out that the version of Solr we are using is an old copy of a 4.0 snapshot from before the alpha release. Could that be the problem? We have some customized parsers, so it will take quite some time to upgrade. Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley [yo...@lucidworks.com] Sent: Tuesday, November 13, 2012 6:46 AM To: solr-user@lucene.apache.org Subject: Re: sort by function error

On Mon, Nov 12, 2012 at 5:24 AM, Kuai, Ben ben.k...@sensis.com.au wrote: more information, the problem only happens when I have both sort-by-function and grouping in the query. I haven't been able to duplicate this with a few ad-hoc queries. Could you give your complete request (or at least all of the relevant grouping and sorting parameters), as well as the field type you are grouping on? -Yonik http://lucidworks.com
Re: URL parameters to use FieldAnalysisRequestHandler
Thanks Robert, Somehow I read the doc but still entered the params wrong. It should have been analysis.fieldname instead of analysis.name. Works fine now. Tom

On Tue, Nov 13, 2012 at 2:11 PM, Robert Muir rcm...@gmail.com wrote: [snip: reply and original message quoted in full]
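For anyone finding this thread later, a working request then looks roughly like this (host and field name follow the examples above; analysis.fieldname and analysis.fieldvalue are the parameter names from the FieldAnalysisRequestHandler javadoc):

    mysolr.umich.edu/analysis/field?analysis.fieldname=title&analysis.fieldvalue=fire-fly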
Re: Solr v4: Synonyms... better at index time or query time?
Don't use query time synonyms. Explanation here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory wunder

On Nov 13, 2012, at 1:25 PM, dm_tim wrote: I'm looking at the sample docs for Solr v4 and I noted something in the schema.xml file: the field type uses the SynonymFilterFactory in the query section but has it commented out in the index section. What would the trade-offs be of using the SynonymFilterFactory in the index section instead? I assume that it would be pointless to use it in both sections. Example below:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
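Applying that advice to the example above, the usual change is a sketch like this - the same filters, with the synonym filter moved into the index analyzer:

    <analyzer type="index">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <!-- expand synonyms once, at index time -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>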
Re: Solr GC issues - Too many BooleanQuery BooleanClause objects in heap
We do have a custom query parser that is responsible for expanding the user input query into a bunch of prefix, phrase and regular boolean queries in a manner similar to that done by DisMax. Analyzing the heap with jhat/YourKit is on my list of things to do but I haven't gotten around to it yet. Our big heap size (13G) makes it a little difficult to do a full-blown heap dump analysis. Thanks a ton for the reply Otis! Prasanna

On Mon, Nov 12, 2012 at 5:42 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I've never seen this. You don't have a custom query parser or anything else custom, do you? Have you tried dumping and analyzing the heap? YourKit has a 7-day eval, or you can use things like jhat, which may be included on your machine already (see http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html). Otis -- Performance Monitoring - http://sematext.com/spm/index.html

On Mon, Nov 12, 2012 at 8:35 PM, Prasanna R plistma...@gmail.com wrote: We have been using Solr in a custom setup where we generate results for user queries by expanding them to a large boolean query consisting of multiple prefix queries. There have been some GC issues recently with the Old/tenured generation becoming nearly 100% full, leading to near-constant full GC cycles. We are running Solr 3.1 on servers with 13G of heap. The jmap live object histogram is as follows:

     num   #instances      #bytes  class name
    ------------------------------------------
       1:    27441222  1550723760  [Ljava.lang.Object;
       2:    23546318   879258496  [C
       3:    23813405   762028960  java.lang.String
       4:    22700095   726403040  org.apache.lucene.search.BooleanQuery
       5:    27431515   658356360  java.util.ArrayList
       6:    22911883   549885192  org.apache.lucene.search.BooleanClause
       7:    21651039   519624936  org.apache.lucene.index.Term
       8:     6876651   495118872  org.apache.lucene.index.FieldsReader$LazyField
       9:    11354214   363334848  org.apache.lucene.search.PrefixQuery
      10:     4281624   137011968  java.util.HashMap$Entry
      11:     3466680    83200320  org.apache.lucene.search.TermQuery
      12:     1987450    79498000  org.apache.lucene.search.PhraseQuery
      13:      631994    70148624  [Ljava.util.HashMap$Entry;
      ...

I have looked at the Solr cache settings multiple times but am not able to figure out how/why the high number of BooleanQuery and BooleanClause object instances stay alive. These objects are live and do not get collected even when the traffic is disabled and a manual GC is triggered, which indicates that someone is holding onto references. Can anyone provide more details on the circumstances under which these objects stay alive and/or cached? If they are cached, then is the caching configurable? Any and all tips/suggestions/pointers will be much appreciated. Thanks, Prasanna
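For reference, a minimal sketch of how such a histogram is captured without a full dump - this assumes a Sun/Oracle JDK and that PID is the Solr JVM's process id; note that :live forces a full GC first:

    # top 20 classes by live instance count and bytes
    jmap -histo:live PID | head -n 20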
Custom Solr indexer/searcher
Suppose I have a special data search type (something different than a string or numeric value) that I want to integrate into the Solr server. For example, suppose I wanted to implement a KD-tree as a filter that would integrate with standard Solr filters and queries. I might want to say find all of the documents in the index with the word 'tree' in them that are within a certain distance of a particular document in the KD-tree. Let me add that I'm not really looking for a KD-Tree implementation for Solr; I just assume that a fair number of people will know what a KD-tree is and so, have some idea that I'm talking about adding a new data type (different than string, long, etc.) that Solr will need to be able to index and search with. It's important that the new data type should integrate with the existing standard Solr data types for searching purposes. First, is there a way to build and specify a plugin that provides Solr both the indexer and search interfaces and therefore hides the internal details of what's going on in the search from Solr so it just thinks it's another search type? Or, would I have to hack Solr in a lot of places to add my custom data type in? Second, if the interface(s) exists to add in a new data type, is there documentation (tutorial, examples, etc.) anywhere on how to do this. Or, is my only option to dig into the Solr code? Mostly, I'm looking for some links or suggestions on where to start looking. I doubt this subject is simple enough to fit into an email post (though I'd be happy to be surprised :) ). You can assume Solr 4.0 if that makes things easier. You can also assume that I have some familiarity with Lucene (though I haven't hacked that code either). Hopefully, I've explained this well enough so that people know what I'm looking for. Cheers Scott
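For what it's worth, the plugin seam for this is the field type: schema.xml can reference any class on the classpath that extends solr.FieldType. A sketch of the registration side only - com.example.KDTreeFieldType is hypothetical, and the actual indexing/search behaviour would live inside that class:

    <!-- hypothetical custom type; the class name is illustrative only -->
    <fieldType name="kdtree" class="com.example.KDTreeFieldType"/>
    <field name="position" type="kdtree" indexed="true" stored="true"/>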
Re: Testing Solr Cloud with ZooKeeper
It looks like the first Solr instance does respond once the timeout has elapsed - I was simply not waiting long enough. Is it possible to reduce this timeout value? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4020190.html Sent from the Solr - User mailing list archive at Nabble.com.
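If the wait in question is the ZooKeeper client session timeout, it is configurable in solr.xml - a sketch along the lines of the stock 4.0 example, where 15000 ms is the usual default:

    <cores adminPath="/admin/cores" defaultCoreName="collection1"
           host="${host:}" hostPort="${jetty.port:}"
           zkClientTimeout="${zkClientTimeout:15000}">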
Has anyone HunspellStemFilterFactory working?
If so, would you be willing to share the .dic and .aff files with me? When I try to load a dictionary file, Solr complains that:

    java.lang.RuntimeException: java.io.IOException: Unable to load hunspell data! [dictionary=en_GB.dic,affix=en_GB.aff]
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:116)
        ...
    Caused by: java.text.ParseException: The first non-comment line in the affix file must be a 'SET charset', was: 'FLAG num'
        at org.apache.lucene.analysis.hunspell.HunspellDictionary.getDictionaryEncoding(HunspellDictionary.java:306)
        at org.apache.lucene.analysis.hunspell.HunspellDictionary.<init>(HunspellDictionary.java:130)
        at org.apache.lucene.analysis.hunspell.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:103)
        ... 46 more

When I change the first line to 'SET charset' it is still not happy. I got the dictionary files from the OpenOffice website. I'm using Solr 4.0 (but had the same problem with 3.6). - Rob
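For comparison, the loader in that Lucene version expects the affix file to begin roughly like this - SET must name an actual charset rather than the literal word 'charset', and the FLAG num directive used by the OpenOffice en_GB files does not appear to be supported by that parser generation:

    SET UTF-8
    TRY esianrtolcdugmphbyfvkwz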
Solr 4.0 Dismax woes (2 specifically)
Heck, I originally started using the default query parser but gave up on it because all of my search results are equally important and idf was messing up my results pretty badly. So I discovered the DisMax query parser, which doesn't use idf. I was elated until I started testing. My initial results looked good, but when I cut down the query string from clothes to clot I got zero results. I've been reading about how DisMax is supposed to do fuzzy searches but I can't make it work at all. To complicate matters, I discovered that all of my search words are being used against all of the query fields. I had previously assumed that each search word would only be applied to an individual query field. So for example my q is: clothe 95 And my qf is: tag cid So I believe that the words clothe and 95 are being searched on both fields (tag and cid), which is not what I wanted. I was hoping to have clothe applied only to the tag field and 95 applied only to the cid field. I really don't have it in me to write my own query parser, so I'm hoping to find a way to do a fuzzy search without scores being screwed up by idf. Is there a way to achieve my desired results with existing code? Regards, (A tired) Tim -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Dismax-woes-2-specifically-tp4020197.html Sent from the Solr - User mailing list archive at Nabble.com.
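One possible workaround, sketched under the assumption that the fields really are named tag and cid as above: route each term to its own field explicitly and use a wildcard for the prefix match. This needs a parser that honours fielded syntax (the lucene default or edismax - plain dismax escapes field and wildcard operators):

    q=tag:clot* AND cid:95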
Re: Solr v4: Synonyms... better at index time or query time?
Good to know. Thanks. T -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-v4-Synonyms-better-at-index-time-or-query-time-tp4020179p4020198.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr4.0 / SolrCloud queries
Thanks Mark. I meant ConcurrentMergeScheduler and ramBufferSizeMB (not maxBuffer). These are my settings for merging:

    <ramBufferSizeMB>960</ramBufferSizeMB>
    <mergeFactor>40</mergeFactor>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

--Shreejay

Mark Miller-3 wrote: On Nov 9, 2012, at 1:20 PM, shreejay (shreejayn@...) wrote: Instead of doing an optimize, I have now changed the merge settings by keeping a maxBuffer = 960, a mergeFactor = 40 and ConcurrentMergePolicy. Don't you mean ConcurrentMergeScheduler? Keep in mind that if you use the default TieredMergePolicy, mergeFactor will have no effect. You need to use maxMergeAtOnce and segmentsPerTier as sub-args to the merge policy config (see the commented-out example in solrconfig.xml). Also, it's probably best to avoid using maxBufferedDocs at all. - Mark

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr4-0-SolrCloud-queries-tp4016825p4020200.html Sent from the Solr - User mailing list archive at Nabble.com.
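For reference, the commented-out example Mark mentions looks roughly like this in the stock solrconfig.xml (the values shown are the defaults; tune them rather than mergeFactor when TieredMergePolicy is in use):

    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicy>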
Re: Nested Join Queries
Thank you Mikhail. Unfortunately BlockJoinQuery is not an option we can leverage:
- We have modeled our document types as different indexes/cores.
- Our relationships which we are attempting to join across are not single-parent to many-children relationships. They are in fact many to many.
- Additionally, memory usage is a concern.

FYI, after making the code change I mentioned in my original post, we have completed a full test cycle and did not experience any adverse impacts from the change. And our join query functionality returns the results we wanted. I would still be interested in hearing an explanation as to why the code is written as it is in v4.0.0. Thanks.

On Tue, Nov 13, 2012 at 8:31 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Please find reference materials http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html http://blog.griddynamics.com/2012/08/block-join-query-performs.html

On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck gerald.bla...@barometerit.com wrote: Thank you. I've not heard of BlockJoin. I will look into it today. Thanks.

On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Replied. pls check maillist.

On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Gerald, I wonder if you tried to approach BlockJoin for your problem? Can you afford less frequent updates?

On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck gerald.bla...@barometerit.com wrote: Thank you Erick for your reply. I understand that search is not an RDBMS. Yes, we do have a huge combinatorial explosion if we de-normalize and duplicate data. In fact, I believe our use case is exactly what the Solr developers were trying to solve with the addition of the Join query. And while the example I gave illustrates the problem we are solving with the Join functionality, it is simplistic in nature compared to what we have in actuality. Am still looking for an answer here if someone can shed some light. Thanks.

On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson erickerick...@gmail.com wrote: I'm going to go a bit sideways on you, partly because I can't answer the question G... But, every time I see someone doing what looks like substituting core for table and then trying to use Solr like a DB, I get on my soap-box and preach... In this case, consider de-normalizing your DB so you can ask the query in terms of search rather than joins. E.g. make each document a combination of the author and the book, with an additional field author_has_written_a_bestseller. Now your query becomes a really simple search: author:name AND author_has_written_a_bestseller:true. True, this kind of approach isn't as flexible as an RDBMS, but it's a _search_ rather than a query. Yes, it replicates data, but unless you have a huge combinatorial explosion, that's not a problem. And the join functionality isn't called pseudo for nothing. It was written for a specific use-case. It is often expensive, especially when the field being joined has many unique values. FWIW, Erick

On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck gerald.bla...@barometerit.com wrote: At a high level, I have a need to be able to execute a query that joins across cores, and that query during its joining may join back to the originating core. Example: Find all Books written by an Author who has written a best selling Book.
In Solr query syntax:
A) against the book core - bestseller:true
B) against the author core - {!join fromIndex=book from=id to=bookid}bestseller:true
C) against the book core - {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true

A - returns results
B - returns results
C - does not return results

Given that A and C use the same core, I started looking for join code that compares the originating core to the fromIndex and found this in JoinQParserPlugin (line #159):

    if (info.getReq().getCore() == fromCore) {
      // if this is the same core, use the searcher passed in... otherwise we could be warming and
      // get an older searcher from the core.
      fromSearcher = searcher;
    } else {
      // This could block if there is a static warming query with a join in it, and if useColdSearcher is true.
      // Deadlock could result if two cores both had useColdSearcher and had joins that used each other.
      // This would be very predictable though (should happen every time if misconfigured)
      fromRef = fromCore.getSearcher(false, true, null);
      // be careful not to do anything with this searcher that requires the thread local
      // SolrRequestInfo in a manner that
Re: Solr 4.0 - distributed updates without zookeeper?
Yes, basically I want to at least avoid leader election and the other dynamic behaviors. I don't have any experience with ZK, and a lot of magic behavior seems baked in now, so I'm concerned I'd need to dig into ZK to debug or monitor what's really happening as we scale out. We also have a somewhat non-typical use case of lots of small cores/indexes on the same server, rather than large indexes that might need multiple shards. We have master servers that have persistent (but sometimes slower) storage, and slaves with faster non-persistent disk. My colleague noticed that there is a param to flag a server as eligible to be a shard leader, so I guess we could enable that for only the preferred master? I'm also having trouble understanding config handling from the docs. Even browsing the Java code I don't see whether Solr is creating the instance dirs, or somehow just linking to config files. It sounds as though if I create a core using core admin, it would get associated with a collection of the same name. -Peter

On Mon, Nov 12, 2012 at 9:41 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Peter, Not sure I have the answer for you, but are you looking to avoid using ZK for some reason? Or are you OK with ZK per se, but just don't want any leader re-election and any other dynamic/cloudy behaviour? Could you not simply treat one node as the master to which you send all your updates and let SolrCloud distribute that to the rest of the cluster? Is your main/only worry around what happens if this one node that you designated as the master goes down? What would you like to happen? You'd like indexing to start failing, while the search functionality remains up? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html

On Sun, Nov 11, 2012 at 7:42 PM, Peter Wolanin peter.wola...@acquia.com wrote: Looking at how we could upgrade some of our infrastructure to Solr 4.0 - I would really like to take advantage of distributed updates to get NRT, but we want to keep our fixed master and slave server roles since we use different hardware appropriate to the different roles. Looking at the Solr 4.0 distributed update code, it seems really hard-coded and bound to ZooKeeper. Is there a way to have a Solr master distribute updates without using ZK, or a way to mock the ZK interface to provide a fixed cluster topography that will work when sending updates just to the master? To be clear, if the master goes down I don't want a slave promoted, nor do I want most of the other SolrCloud features - we have already built out a system for managing groups of servers. Thanks, Peter -- Peter M. Wolanin, Ph.D. : Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com : 781-313-8322 Get a free, hosted Drupal 7 site: http://www.drupalgardens.com
Re: Solr GC issues - Too many BooleanQuery BooleanClause objects in heap
Hi, Yeah, a large heap can be problematic like that. :) But if there is some sort of a leak - and if I had to bet, knowing what I know about this situation, I'd put my money on your custom QP - you could also start Solr with a much smaller heap and grab a heap snapshot as soon as you see some number of those objects appearing towards the top of jmap; that should be enough to trace them to their roots. Otis -- Solr Performance Monitoring - http://sematext.com/spm/index.html

On Tue, Nov 13, 2012 at 5:18 PM, Prasanna R plistma...@gmail.com wrote: [snip: earlier messages in this thread quoted in full]
Re: Run multiple instances of solr using single data directory
Hi, If you have a high query rate, running multiple instances of Solr on the same server doesn't typically make sense. I'd stop and rethink. :) Otis -- Solr Performance Monitoring - http://sematext.com/spm/index.html

On Tue, Nov 13, 2012 at 5:46 PM, Rohit Harchandani rhar...@gmail.com wrote: Hi All, I am currently using Solr 4.0. The application I am working on requires a high rate of queries per second. Currently, we have set up a single master and a single slave on a production machine. We want to bring up multiple instances of Solr (slaves). Are there any problems when bringing them up on different ports but using the same data directory? These will only be serving queries, and all the indexing will take place on the master machine. Also, if I have multiple instances on the same data directory and I perform replication, would that re-open searchers on all the instances? Thanks, Rohit
Re: Searchers, threads and performance
Hello Andy,

On Tue, Nov 13, 2012 at 1:26 PM, Andy Lester a...@petdance.com wrote: [snip] Does the old searcher stop taking requests before the new searcher is warmed or after? How wide is the window of time wherein Solr is not serving requests? For us, it's about five seconds and we need to drop that dramatically. In general, what is the difference between accepting the delay of waiting for warming vs. accepting the delay of running useColdSearcher=true?

The old searcher is used while the new one is being warmed up. Solr should always be serving requests - it's not designed to have a moment when it's not serving them because of a searcher swap. Don't use a cold searcher - the first unlucky user will pay the price and will likely block all other queries for a while. Your queries probably don't actually stop for 5-7 seconds, they just slow down. Look at your system's performance metrics during this period. GC high? Disk IO high? CPU high?

Is there any such thing as/any sense in running more than one searcher in our scenario? What are the benefits of multiple searchers? [snip] Finally, what do people mean when they blog that they have Solr set up for n threads? Is that the same thing as saying that Solr can be processing n requests simultaneously?

Not sure what they mean :) It is the servlet container that deals with threads, not Solr. Maybe this is referring to indexing with N threads in parallel to speed up indexing (in pre-Solr 4.0 days).

Thanks for any insight or even links to relevant pages. We've been Googling all over and haven't found answers to the above.

Try http://search-lucene.com

If you are doing performance testing and warming is a concern, one of the reports in SPM for Solr shows where warming time is being spent - on which caches or on the searcher, and how much time goes on each. Oh, which reminds me - it is also possible that your cache settings are such that they require a lot of warming, and it is possible that your warmup queries are too heavy or numerous.

Otis -- Solr Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html
Using CJK analyzer
Hi, Using Solr 1.2.0, the following works (and I get hits searching on Chinese text):

    <fieldType name="text" class="solr.TextField">
      <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
      <analyzer type="query" class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
    </fieldType>

But it won't work using Solr 3.6.1. Any idea what I might be missing? Yes, I also tried (in Solr 3.6.1):

    <!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- normalize width before bigram, as e.g. half-width dakuten combine -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- for any non-CJK -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CJKBigramFilterFactory"/>
      </analyzer>
    </fieldType>

and it won't work. I run it through the analyzer and I see this (I hope the table will show up fine on the mailing list):

    Index Analyzer: org.apache.lucene.analysis.cn.ChineseAnalyzer {}
    position:    1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
    term text:   去 除 商 品 操 作 在 订 购 单 中 留 下 空 白 行
    startOffset: 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
    endOffset:   1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16

    Query Analyzer: org.apache.lucene.analysis.cn.ChineseAnalyzer {} (same output as above)

--MJ
Re: Nested Join Queries
Gerald, Nice to hear that your problem is solved. Can you contribute a test case to reproduce this issue? FWIW, my team successfully deals with many-to-many in BlockJoin. It works, but the solution is a little bit immature yet.

On Wed, Nov 14, 2012 at 5:59 AM, Gerald Blanck gerald.bla...@barometerit.com wrote: [snip: earlier messages in this thread quoted in full]