Re: SolrCloud setup - any advice?
Hi Neil, Although you haven't mentioned it, just wanted to confirm - do you have soft commits enabled? Also, which version of Solr are you using for the SolrCloud setup? 4.0.0 had lots of memory- and ZooKeeper-related issues. What's the warmup time for your caches? Have you tried disabling the caches? Is this a static index, or are documents added continuously? The answers to these questions might help us pinpoint the issue...

On Thursday, September 19, 2013, Neil Prosser wrote:

Apologies for the giant email. Hopefully it makes sense.

We've been trying out SolrCloud to solve some scalability issues with our current setup and have run into problems. I'd like to describe our current setup, our queries and the sort of load we see, and am hoping someone might be able to spot the massive flaw in the way I've been trying to set things up.

We currently run Solr 4.0.0 in the old-style master/slave replication. We have five slaves, each running CentOS with 96GB of RAM and 24 cores, with 48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs) but aren't slow either. Our GC parameters aren't particularly exciting, just -XX:+UseConcMarkSweepGC. Java version is 1.7.0_11.

Our index size ranges between 144GB and 200GB (when we optimise it back down, since we've had bad experiences with large cores). We've got just over 37M documents; some are smallish but most range between 1000-6000 bytes. We regularly update documents, so large portions of the index will be touched, leading to a maxDocs value of around 43M.

Query load ranges between 400req/s and 800req/s across the five slaves throughout the day, increasing and decreasing gradually over a period of hours rather than bursting. Most of our documents have upwards of twenty fields. We use different fields to store territory-variant values (we have around 30 territories) and also boost based on the values in some of these fields (integer ones).
So an average query can do a range filter by two of the territory-variant fields, filter by a non-territory-variant field, facet by a field or two (maybe territory-variant), bring back the values of 60 fields, boost query on field values of a non-territory-variant field, boost by values of two territory-variant fields, and run a dismax query on up to 20 fields (with boosts) with a phrase boost on those fields too. They're pretty big queries. We don't do any index-time boosting; we try to keep things dynamic so we can alter our boosts on the fly. Another common query is to list documents with a given set of IDs, and to select documents with a common reference and order them by one of their fields.

Auto-commit every 30 minutes. Replication polls every 30 minutes.

Document cache:
* initialSize - 32768
* size - 32768

Filter cache:
* autowarmCount - 128
* initialSize - 8192
* size - 8192

Query result cache:
* autowarmCount - 128
* initialSize - 8192
* size - 8192

After a replicated core has finished downloading (probably while it's warming) we see requests which usually take around 100ms taking over 5s. GC logs show concurrent mode failure.

I was wondering whether anyone can help with sizing the boxes required to split this index down into shards for use with SolrCloud, and roughly how much memory we should be assigning to the JVM. Everything I've read suggests that running with a 48GB heap is way too high, but every attempt I've made to reduce the cache sizes seems to wind up causing out-of-memory problems. Even dropping all cache sizes by 50% and reducing the heap by 50% caused problems.
I've already tried SolrCloud with 10 shards (around 3.7M documents per shard, each with one replica) and kept the cache sizes low:

Document cache:
* initialSize - 1024
* size - 1024

Filter cache:
* autowarmCount - 128
* initialSize - 512
* size - 512

Query result cache:
* autowarmCount - 32
* initialSize - 128
* size - 128

Even when running on six machines in AWS with SSDs, a 24GB heap (out of 60GB memory), and four shards on two boxes and three on the rest, I still see concurrent mode failure. This looks like it's causing ZooKeeper to mark the node as down and things begin to struggle. Is concurrent mode failure just something that will inevitably happen, or is it avoidable by dropping the CMSInitiatingOccupancyFraction?

If anyone has anything that might shove me in the right direction I'd be very grateful. I'm wondering whether our set-up will just never work and maybe we're expecting too much.

Many thanks,

Neil
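One hedged starting point for the CMS question above (not from the thread itself; every value here is an assumption to be tuned against your own GC logs): telling CMS to start its concurrent cycle at a fixed, earlier occupancy often avoids concurrent mode failures, at the cost of more frequent collections.

```shell
# Hypothetical JVM options for a Solr 4.x node on CMS. The specific
# numbers are guesses to experiment with, not recommendations.
JAVA_OPTS="-Xms8g -Xmx8g \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/solr/gc.log"
java $JAVA_OPTS -jar start.jar
```

Starting CMS at a fixed 70% occupancy (UseCMSInitiatingOccupancyOnly disables the adaptive trigger) leaves headroom for the allocation spikes that happen during searcher warming, which is exactly when the concurrent mode failures described above tend to appear.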
Re: What do you use for solr's logging analysis?
There are a lot of tools out there with varying degrees of functionality (and ease of setup). We also have multiple Solr servers in production (both cloud and single nodes) and we have decided to use Loggly (http://loggly.com/). We will probably be setting it up for all our servers in the next few weeks. There are plenty of other such log analysis tools; it all depends on your particular use case. --Shreejay

On Sunday, August 11, 2013, adfel70 wrote:

Hi, I'm looking for a tool that could help me perform Solr logging analysis. I use SolrCloud on multiple servers, so the tool should be able to collect logs from multiple servers. Any tool you use and can advise? Thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/What-do-you-use-for-solr-s-logging-analysis-tp4083809.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Shreejay Nair
Sent from my mobile device. Please excuse brevity and typos.
Re: commit vs soft-commit
Yes, a new searcher is opened with every soft commit. It's still considered faster because it does not write to disk, which is a slow I/O operation and might take a lot more time.

On Sunday, August 11, 2013, tamanjit.bin...@yahoo.co.in wrote:

Hi, Some confusion in my head. http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 says that /A soft commit is much faster since it only makes index changes visible and does not fsync index files or write a new index descriptor./ So this means that even with every soft commit a new searcher opens, right? If it does, isn't it still very heavy?

-- View this message in context: http://lucene.472066.n3.nabble.com/commit-vs-soft-commit-tp4083817.html

-- Shreejay Nair
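For reference, the two commit styles are usually configured side by side in solrconfig.xml. A sketch (the interval values are illustrative assumptions, not from this thread): hard commits flush and fsync segments to disk, while soft commits only make changes visible to a new searcher.

```xml
<!-- solrconfig.xml (illustrative values) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: writes segments to disk; openSearcher=false keeps it cheap -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes changes searchable without an fsync -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Opening a searcher is not free (cache autowarming runs each time), which is why very short soft-commit intervals can still be heavy - "faster" is relative to the fsync cost, not zero.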
Re: Need help on Solr
org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id'

You might have defined an id field in the schema file. The out-of-the-box schema file already contains an id field. -- Shreejay

On Thursday, June 20, 2013 at 9:16, Abhishek Bansal wrote:

Hello, I am trying to index a PDF file on Solr. I am currently running Solr on Apache Tomcat 6. When I try to index it I get the error below. Please help; I was not able to rectify this error with the help of the internet.

ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer; Unable to create core: collection1
org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
  at org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
  at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
  at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)

ERROR - 2013-06-20 20:43:41.551; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: collection1
  at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
  at org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
  at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
  at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
  ... 10 more

INFO - 2013-06-20 20:43:41.553; org.apache.solr.servlet.SolrDispatchFilter; user.dir=C:\Program Files\Apache Software Foundation\Tomcat 6.0
INFO - 2013-06-20 20:43:41.553; org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init() done
ERROR - 2013-06-20 20:43:41.820; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
  at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1212)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155
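A minimal illustration of the fix suggested above (a sketch; the field attributes mirror the stock example schema and may differ from yours): schema.xml must declare each field name exactly once, so remove the hand-added duplicate and keep a single id definition.

```xml
<!-- schema.xml: keep exactly one definition of the unique key field -->
<fields>
  <field name="id" type="string" indexed="true" stored="true"
         required="true" omitNorms="true"/>
  <!-- other fields ... -->
</fields>
<uniqueKey>id</uniqueKey>
```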
Re: ngroups does not show correct number of groups when used in SolrCloud
Hi Markus, For ngroups to work in a cloud environment, you have to make sure that all docs belonging to a group reside on the same shard. Custom hashing was introduced in recent versions of SolrCloud; you might want to look into that: https://issues.apache.org/jira/browse/SOLR-2592

All queries on SolrCloud are run individually on each shard, and then the results are merged. When you run a group query, SolrCloud runs the query on each shard, and when the results are merged the ngroups from each shard are added up. This is why ngroups is incorrect when using SolrCloud. -- Shreejay

On Friday, June 14, 2013 at 5:11, Markus.Mirsberger wrote:

Hi, I just noticed (after a long time testing and finally looking into the docs :p) that the ngroups parameter does not show the correct number of groups when used in anything other than a single-shard environment (in my case SolrCloud). Is there another way to get the number of all groups without iterating through a lot of result sets? I don't need the values of the grouping, just the complete number of groups. Or can this be done with facets maybe? I don't need to use grouping, but as far as I know I can't get the complete number of facets without iterating through the result sets, so this seemed the only way to achieve something equal to a distinct count in SQL. Any ideas how this can be done with Solr? Thanks, Markus
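One way to co-locate a group on a single shard (a sketch; this assumes the compositeId routing that grew out of SOLR-2592, available in later 4.x releases, and the host, collection, and field names are hypothetical): prefix the document id with the grouping key and a `!` separator, so every document sharing the prefix hashes to the same shard.

```shell
# Hypothetical example: docs for group "acme" share the "acme!" id prefix,
# so they land on the same shard and the merged ngroups adds up correctly.
curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[
        {"id": "acme!doc1", "group_field": "acme"},
        {"id": "acme!doc2", "group_field": "acme"}
      ]'
```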
Re: How spell checker used if indexed document is containing misspelled words
Hi, Have you tried this? http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular Of course, this assumes that your corpus has correct words occurring more frequently than incorrect ones! -- Shreejay

On Friday, June 14, 2013 at 2:49, venkatesham.gu...@igate.com wrote:

My data is picked from social media sites, and misspelled words are very frequent in social text because of the informal mode of communication. The spellchecker does not work here because the misspelled words are present in the text corpus and not in the search query. Finding documents with all the different misspelled forms of a given word is not possible using the spellchecker; how should I go ahead with this?

-- View this message in context: http://lucene.472066.n3.nabble.com/How-spell-checker-used-if-indexed-document-is-containing-misspelled-words-tp4070463.html
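A sketch of the suggested query (host, collection, and query term are placeholders, and a spellcheck component must already be configured): with onlyMorePopular=true, the component suggests index terms more frequent than the query term, which can surface the common correct spelling even when misspellings exist in the corpus.

```shell
# Hypothetical request against a core with a configured spellcheck component.
curl 'http://localhost:8983/solr/collection1/select?q=informaton&spellcheck=true&spellcheck.onlyMorePopular=true&wt=json'
```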
Re: out of memory during indexing do to large incoming queue
A couple of things:

1) Can you give some more details about your setup? Like whether it's cloud or a single instance, how many nodes if it's cloud, and the hardware - memory per machine, JVM options, etc.

2) Any specific reason for using 4.0 beta? The latest version is 4.3. I used 4.0 for a few weeks and there were a lot of bugs related to memory and communication between nodes (ZooKeeper).

3) If you haven't seen it already, please go through this wiki page. It's an excellent starting point for troubleshooting memory and indexing issues, especially sections 3 to 7: http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

-- Shreejay

On Sunday, June 2, 2013 at 7:16, Yoni Amir wrote:

Hello, I am receiving an OutOfMemoryError during indexing, and after investigating the heap dump I am still missing some information, so I thought this might be a good place for help. I am using Solr 4.0 beta, and I have 5 threads that send update requests to Solr. Each request is a bulk of 100 SolrInputDocuments (using SolrJ), and my goal is to index around 2.5 million documents. Solr is configured to do a hard commit every 10 seconds, so initially I thought that it could only accumulate 10 seconds' worth of updates in memory, but that's not the case. I can see in a profiler how it accumulates memory over time, even with 4 to 6 GB of memory. It is also configured to optimize with mergeFactor=10.

At first I thought that optimization is a blocking, synchronous operation. It is, in the sense that the index can't be updated during optimization. However, it is not synchronous in the sense that the update request coming from my code is not blocked - Solr just returns an OK response, even while the index is optimizing. This indicates that Solr has an internal queue of inbound requests, and that the OK response just means that the request is in the queue. I got confirmation of this from a friend who is a Solr expert (or so I hope).

My main question is: how can I put a bound on this internal queue, and make update requests synchronous in case the queue is full? Put another way, I need to know if Solr is really ready to receive more requests, so that I don't overload it and cause an OOME. I performed several tests, with slow and fast disks, and on the really fast disks the problem didn't occur. However, I can't demand such fast disks from all the clients, and even with a fast disk the problem will occur eventually when I try to index 10 million documents. I also tried to perform indexing with optimization disabled, but it didn't help. Thanks, Yoni

Confidentiality: This communication and any attachments are intended for the above-named persons only and may be confidential and/or legally privileged. Any opinions expressed in this communication are not necessarily those of NICE Actimize. If this communication has come to you in error you must take no action based on it, nor must you copy or show it to anyone; please delete/destroy and inform the sender by e-mail immediately. Monitoring: NICE Actimize may monitor incoming and outgoing e-mails. Viruses: Although we have taken steps toward ensuring that this e-mail and attachments are free from any virus, we advise that in keeping with good computing practice the recipient should ensure they are actually virus free.
Re: Highlighting fields
Are the fields you are trying to highlight stored? If yes, can you show the exact query you are using? Which version of Solr? And which highlighter? (You can paste the relevant highlight section from the Solr config file.) -- Shreejay

On Thursday, May 30, 2013 at 22:56, Sagar Chaturvedi wrote:

Sorry for wrong subject. Corrected it.

-Original Message- From: Sagar Chaturvedi [mailto:sagar.chaturv...@nectechnologies.in] Sent: Friday, May 31, 2013 11:25 AM To: solr-user@lucene.apache.org Subject: RE: Support for Mongolian language

Hi, On the Solr admin UI, in a query I am trying to highlight some fields. I have set hl=true and given the names of comma-separated fields in hl.fl, but the fields are not getting highlighted. Any insights? Regards, Sagar

DISCLAIMER: --- The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately. ---
Re: Highlighting fields
Didn't notice the original message. Sorry about that. -- Shreejay

On Friday, May 31, 2013 at 5:55, Jack Krupansky wrote:

Please do not respond to hijacked message threads, other than to encourage the sender to start a new message thread. -- Jack Krupansky

-Original Message- From: Shreejay Sent: Friday, May 31, 2013 5:10 AM To: solr-user@lucene.apache.org Subject: Re: Highlighting fields

Are the fields you are trying to highlight stored? If yes, can you show the exact query you are using? Which version of Solr? And which highlighter? (You can paste the relevant highlight section from the Solr config file.) -- Shreejay

On Thursday, May 30, 2013 at 22:56, Sagar Chaturvedi wrote:

Sorry for wrong subject. Corrected it.

-Original Message- From: Sagar Chaturvedi [mailto:sagar.chaturv...@nectechnologies.in] Sent: Friday, May 31, 2013 11:25 AM To: solr-user@lucene.apache.org Subject: RE: Support for Mongolian language

Hi, On the Solr admin UI, in a query I am trying to highlight some fields. I have set hl=true and given the names of comma-separated fields in hl.fl, but the fields are not getting highlighted. Any insights? Regards, Sagar
Re: Tool to read Solr4.2 index
This might help: http://wiki.apache.org/solr/LukeRequestHandler -- Shreejay Nair

On Wednesday, May 22, 2013 at 13:47, gpssolr2020 wrote:

Hi All, We can use lukeall 4.0 for reading a Solr 3.x index. Is there anything like that to read a Solr 4.x index? Please help. Thanks.

-- View this message in context: http://lucene.472066.n3.nabble.com/Tool-to-read-Solr4-2-index-tp4065448.html
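The Luke request handler is queried over HTTP rather than installed as a separate tool; a sketch (host and core name are placeholders):

```shell
# Index overview: field types, document counts, index version.
curl 'http://localhost:8983/solr/collection1/admin/luke?wt=json'
# Per-field detail for one field, including its top indexed terms.
curl 'http://localhost:8983/solr/collection1/admin/luke?fl=id&numTerms=10&wt=json'
```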
Re: seeing lots of autowarming messages in log during DIH indexing
Every time a commit is done, a new searcher is opened. In the Solr config file, caches are defined with a parameter called autowarmCount. Autowarming basically tries to copy the cache values from the previous searcher into the current one. If you are doing a bulk update and do not care about searching until your indexing is over, you can specify openSearcher=false while doing a commit. That should speed up your indexing a lot. -- Shreejay Nair

On Monday, May 20, 2013 at 7:16, geeky2 wrote:

hello, we are tracking down some performance issues with our DIH process. not sure if this is related - but i am seeing tons of the messages below in the logs during re-indexing of the core. what do these messages mean?

2013-05-18 19:37:30,623 INFO [org.apache.solr.update.UpdateHandler] (pool-11-thread-1) end_commit_flush
2013-05-18 19:37:30,623 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,624 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming result for Searcher@5b8d745 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,624 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,625 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming result for Searcher@5b8d745 main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,625 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=1,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming result for Searcher@5b8d745 main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=3,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming Searcher@5b8d745 main from Searcher@1fb355af main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-05-18 19:37:30,628 INFO [org.apache.solr.search.SolrIndexSearcher] (pool-10-thread-1) autowarming result for Searcher@5b8d745 main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

thx mark

-- View this message in context: http://lucene.472066.n3.nabble.com/seeing-lots-of-autowarming-messages-in-log-during-DIH-indexing-tp4064649.html
Re: seeing lots of autowarming messages in log during DIH indexing
geeky2 wrote: you mean i would add this switch to my script that kicks off the dataimport? example: OUTPUT=$(curl -v http://${SERVER}.intra.searshc.com:${PORT}/solrpartscat/${CORE}/dataimport -F command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F optimize=${OPTIMIZE} -F openSearcher=false)

Yes, that's correct.

geeky2 wrote: what needs to be done _AFTER_ the DIH finishes (if anything)? eg, does this need to be turned back on after the DIH has finished?

Yes. You need to open a searcher to be able to search. Just run another commit with openSearcher=true once your indexing process finishes.

-- View this message in context: http://lucene.472066.n3.nabble.com/seeing-lots-of-autowarming-messages-in-log-during-DIH-indexing-tp4064649p4064768.html
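A sketch of that follow-up commit (host and core name are placeholders): the update handler accepts commit and openSearcher as request parameters, so after a bulk load a single request makes the new documents visible.

```shell
# During bulk indexing: commit to disk without opening a new searcher.
curl 'http://localhost:8983/solr/collection1/update?commit=true&openSearcher=false'
# After indexing finishes: commit again, opening a searcher so docs become visible.
curl 'http://localhost:8983/solr/collection1/update?commit=true&openSearcher=true'
```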
Re: replication without automated polling, just manual trigger?
You can disable polling so that the slave never polls the master (in Solr 4.3 you can disable it from the admin interface), and you can trigger a replication using the HTTP API: http://wiki.apache.org/solr/SolrReplication#HTTP_API Or, again, use the admin interface to trigger a manual replication.

On Wed, May 15, 2013 at 12:47 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

I want to set up Solr replication between a master and slave, where no automatic polling every X minutes happens; instead the slave only replicates on command. [1] So the basic question is: what's the best way to do that? But I'll provide what I've been doing etc., for anyone interested.

Until recently, my application was running on Solr 1.4. I had a setup that was working to accomplish this in Solr 1.4, but as I work on moving it to Solr 4.3, it's unclear to me if it can/will work the same way. In Solr 1.4, on the slave, I supplied a masterUrl but did NOT supply any pollInterval at all. I did NOT supply an enable=false on the slave, because I think that would have prevented even manual replication. This seemed to result in the slave never polling, although I'm not sure if that was just an accident of the Solr implementation or not. Can anyone say if the same thing would happen in Solr 4.3? If I look at the admin screen for my slave set up this way in Solr 4.3, it does say polling enabled, but I realize that doesn't necessarily mean any polling will take place, since I've set no pollInterval.

In Solr 1.4 under this setup, I could go to the slave's admin/replication page, and there was a "replicate now" button that I could use for manually triggered replication. This button seems to no longer be there in the 4.3 replication admin screen, although I suppose I could still, somewhat less conveniently, issue a `replication?command=fetchindex` to the slave to manually trigger a replication. Thanks for any advice or ideas.

[1]: Why, you ask? The master is actually my 'indexing' server. Due to business needs, indexing only happens in bulk/mass indexing, and only happens periodically -- sometimes nightly, sometimes less. So I index on the master at a periodic schedule, and then, when indexing is complete and verified, tell the slave to replicate. I don't want the slave accidentally replicating in the middle of the bulk indexing process either, when the index might be in an unfinished state.
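The replication HTTP API calls referred to above can be scripted; a sketch (slave hostname and core name are placeholders):

```shell
# Stop the slave from ever polling the master.
curl 'http://slave:8983/solr/collection1/replication?command=disablepoll'
# Later, once the master's bulk index is complete and verified, pull the index once.
curl 'http://slave:8983/solr/collection1/replication?command=fetchindex'
```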
Request to be added to ContributorsGroup
Hello Wiki Admins, Request you to please add me to the ContributorsGroup. I have been using Solr for a few years now and I would like to contribute back by adding more information to the wiki pages. Wiki User Name: Shreejay --Shreejay
Re: Frequent OOM - (Unknown source in logs).
We ended up using Solr 4.0 (now 4.2) without the cloud option, and it seems to be holding up well.

-- View this message in context: http://lucene.472066.n3.nabble.com/Frequent-OOM-Unknown-source-in-logs-tp4029361p4061945.html
Re: Frequent OOM - (Unknown source in logs).
Hi Otis, Following is the setup:

6 individual Solr servers (VMs) running on Jetty. 3 shards, each shard with a leader and a replica.

*Solr Version*: Solr 4.0 (with a patch from SOLR-2592).
*OS*: CentOS release 5.8 (Final)
*Java*: java version 1.6.0_32, Java(TM) SE Runtime Environment (build 1.6.0_32-b05), Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)
*Memory*: 4 servers have 32 GB, 2 have 30 GB.
*Disk space*: 500 GB on each server.
*Queries*: Usual select queries with up to 6 filters; facets on around 8 fields (returning only the top 20).
*Java options while starting the server*:
JAVA_OPTIONS=-Xms15360m -Xmx15360m -DSTOP.PORT=1234 -DSTOP.KEY= -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/ABC/LOGFOLDER -XX:-TraceClassUnloading -Dbootstrap_confdir=./solr/collection123/conf -Dcollection.configName=123conf -DzkHost=ZooKeeper001:,ZooKeeper002:,SGAZZooKeeper003: -DnumShards=3 -jar start.jar LOG_FILE=/ABC/LOGFOLDER/solrlogfile.log

I run a *commit* using a curl command every 30 minutes via a cron job:

curl --silent "http://11.111.111.111:1234/solr/collection123/update/?commit=true&openSearcher=false"

In my solrconfig file I have these *commit settings*:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>0</maxDocs>
    <maxTime>0</maxTime>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>0</maxTime>
  </autoSoftCommit>
  <openSearcher>false</openSearcher>
  <waitSearcher>false</waitSearcher>
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>

Please let me know if you would like more information. I am not indexing any documents right now, and I again got an OOM around an hour back on one of the nodes. Let's call it Node1. The node is in recovery right now and keeps erroring with this message:

SEVERE: Error while trying to recover: org.apache.solr.common.SolrException: Server at http://NODE2:8983/solr/collection1 returned non ok status: 500, message: Server Error

Although it's still showing as recovering, it is serving queries according to the log file. The other instance in this shard became the leader and is up and running properly (serving queries).

-- View this message in context: http://lucene.472066.n3.nabble.com/Frequent-OOM-Unknown-source-in-logs-tp4029361p4029459.html
Frequent OOM - (Unknown source in logs).
Hello, I am seeing frequent OOMs for the past 2 days on a SolrCloud cluster (Solr 4.0 with a patch from SOLR-2592) setup (3 shards, each shard with 2 instances; each instance is running CentOS with 30 GB memory and 500 GB disk space), with a separate ZooKeeper ensemble of 3. Here is the stacktrace: http://pastebin.com/cV5DxD4N I also saw there is a Jira issue which looks similar, the difference being that in the stacktrace I get, I cannot see which process is trying to do an expandCapacity: java.lang.AbstractStringBuilder.expandCapacity(Unknown Source) whereas the stacktrace mentioned in that issue (https://issues.apache.org/jira/browse/SOLR-3881) is: at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100) Has anyone seen this issue before? Any fixes for it?
Commit and OpenSearcher not working as expected.
Hello. I am running a commit on a SolrCloud collection using a cron job. The command is as follows: aa.aa.aa.aa:8983/solr/ABCCollection/update?commit=true&opensearcher=false But when I look at the logs I see that the commit has been called with openSearcher=true. The DirectUpdateHandler2 section in my solrconfig file looks like this:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>0</maxDocs>
    <maxTime>0</maxTime>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>0</maxTime>
  </autoSoftCommit>
  <openSearcher>false</openSearcher>
  <waitSearcher>false</waitSearcher>
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>

And these are the logs: http://pastebin.com/bGh2GRvx I am not sure why openSearcher is being called. I am indexing a ton of documents right now, and am not using search at all. I also read in the wiki that keeping openSearcher=false is recommended for SolrCloud: http://wiki.apache.org/solr/SolrConfigXml#Update_Handler_Section Is there some place else where openSearcher has to be set while calling a commit? --Shreejay
Re: Commit and OpenSearcher not working as expected.
Hi Mark, That was a typo in my post. I am using openSearcher only. But I still see the same log entries. /update/?commit=true&openSearcher=false
Re: Commit and OpenSearcher not working as expected.
Ok, this is very surprising. I just ran the curl command: curl --silent http://xx.xx.xx.xx:8985/solr/collectionABC/update/?commit=true&openSearcher=false And in the Solr log file I can see these messages:

Dec 16, 2012 10:44:14 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Dec 16, 2012 10:44:25 PM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solrfolder/example/solr/collection1/data/index.2012121519458 lockFactor

*After the searcher is opened I see these:*

Dec 16, 2012 10:44:28 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Dec 16, 2012 10:44:28 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [resumesnew] webapp=/solr path=/update params={waitSearcher=true&commit=true&commit_end_point=true&expungeDeletes=false&wt=javabin&softCommit=false&version=2} {commit=} 0 14155

I can see that openSearcher is still being called, because a new searcher is created according to the log files. I am not sure why it is being called. Does SolrCloud ignore openSearcher=false? --Shreejay

shreejay wrote: Hi Mark, That was a typo in my post. I am using openSearcher only. But I still see the same log files. /update/?commit=true&openSearcher=false
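One thing worth ruling out before blaming Solr itself: if the curl URL above was not quoted in the shell, the `&` acts as a control operator and nothing after it ever reaches the server. A quick sketch of what curl actually receives in that case (hypothetical placeholder host, same as above):

```shell
#!/bin/sh
# With an unquoted URL the shell backgrounds curl at the first '&';
# 'openSearcher=false' is then parsed as a separate, harmless assignment.
url='http://xx.xx.xx.xx:8985/solr/collectionABC/update/?commit=true&openSearcher=false'
what_curl_sees=${url%%&*}   # everything before the first '&'
echo "$what_curl_sees"
```

If that is what happened, Solr would log the default openSearcher=true exactly as shown above, because the parameter was never sent.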
A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene41' does not exist.
I am getting the following error:

Caused by: org.apache.solr.common.SolrException: Error opening new searcher
  at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1326)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1438)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:700)
  ... 45 more
Caused by: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene41' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene40, Lucene3x]
  at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:104)
  at org.apache.lucene.codecs.Codec.forName(Codec.java:95)
  at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:299)
  at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
  at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
  at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
  at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
  at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:119)
  at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:130

I was using a patched version of branch_4x for indexing, but due to some issues I am now reverting back to 4.0 (with one of the patches for SOLR-2592). I am using the data directories from my previous instance. Would just adding the codecs folder lucene41 under the folder lucene/core/src/java/org/apache/lucene/codecs and compiling it suffice? -- Shreejay
SolrCloud OOM heap space
Hi All, I am getting constant OOM errors on a SolrCloud instance (3 shards, 2 Solr instances in each shard, each server with 22 GB of memory and Xmx = 12 GB for Java). Here is an error log: http://pastie.org/private/dcga3kfatvvamslmtvrp0g As of now I am not indexing any more documents. The total size of the index on each server is around 36-40 GB. -Xmx12288m -DSTOP.PORT=8079 -DSTOP.KEY=ABC -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log -Djetty.port=8983 -DzkHost=ZooKeeperServer001:2181 -jar start.jar If anyone has faced similar issues please let me know. --Shreejay
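Not a fix, but a sketch of the extra HotSpot flags (valid on the Java 6/7 JVMs used in this thread) that could be added alongside the startup options above so the GC activity leading up to the OOM gets logged. The log path is just an example, not a requirement:

```shell
#!/bin/sh
# GC logging flags; /var/log/solr-gc.log is an example path.
GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/solr-gc.log"
# Appended to (a subset of) the existing startup options:
JAVA_OPTIONS="-Xmx12288m -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC ${GC_OPTS}"
echo "$JAVA_OPTIONS"
```

The resulting log shows whether the OOM follows a concurrent mode failure, a promotion failure, or simply a heap that fills despite full GCs.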
RE: SolrCloud OOM heap space
Thanks Markus. Is this issue only on the 4.x and 5.x branches? I am currently running a very recent build of the 4.x branch with an applied patch. I just want to make sure that this is not an issue with 4.0, in which case I can think of applying my patch to 4.0 instead of 4x or 5x. --Shreejay
RE: SolrCloud OOM heap space
Thanks Markus. I will apply the patch to the 4x branch I have, and report back.
Solr Nightly build server down ?
Hi All, Is the server hosting nightly builds of Solr down? https://builds.apache.org/job/Solr-Artifacts-4.x/lastSuccessfulBuild/artifact/solr/package/ If anyone knows an alternate link to download the nightly build please let me know. --Shreejay
Solr 4.0 ngroups issue workaround
Hi All, I have a SolrCloud instance with 6 million documents. We are using the ngroups feature in a few places, and I am aware that this is still an open JIRA issue with work in progress (and some patches). Apart from using the patch here https://issues.apache.org/jira/browse/SOLR-2592 and re-indexing data so that all documents with the same group field are on the same server, has anyone else tried or used any alternate methods? I wanted to see if there would be any other options before re-indexing. Thanks. --Shreejay
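For context, a sketch of the kind of grouped query involved (the host, collection, and group field here are hypothetical placeholders); group.ngroups=true is the count that comes back wrong when documents sharing a group value are spread across shards:

```shell
#!/bin/sh
# Hypothetical grouped query; groupId stands in for the real grouping field.
params='q=*:*&group=true&group.field=groupId&group.ngroups=true&rows=20'
query_url="http://localhost:8983/solr/collection1/select?${params}"
echo "$query_url"
```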
Re: Solr4.0 / SolrCloud queries
Hi all, I have managed to successfully index around 6 million documents, but while indexing (and even now, after the indexing has stopped), I am running into a bunch of errors. The most common error I see is: null:org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://ABC:8983/solr/xyzabc I have made sure that the servers are able to communicate with each other using the same names. Another error I keep getting is that the leader stops recovering and goes red / recovery failed: Error while trying to recover. core=ABC123:org.apache.solr.common.SolrException: We are not the leader The servers intermittently go offline, taking down one of the shards and in turn stopping all search queries. The configuration I have:

Shard1: Server1 - Memory - 22 GB, JVM - 8 GB; Server2 - Memory - 22 GB, JVM - 10 GB (this one is on recovery-failed status, but still acting as a leader).
Shard2: Server1 - Memory - 22 GB, JVM - 8 GB (this one is on recovery-failed status, but still acting as a leader); Server2 - Memory - 22 GB, JVM - 8 GB.
Shard3: Server1 - Memory - 22 GB, JVM - 10 GB; Server2 - Memory - 22 GB, JVM - 8 GB.

While typing this post I did a Reload from the Core Admin page, and both servers (Shard1-Server2 and Shard2-Server1) came back up again. Has anyone else encountered these issues? Any steps to prevent them? Thanks. --Shreejay
Re: Solr4.0 / SolrCloud queries
Thanks Mark. I meant ConcurrentMergeScheduler and ramBufferSizeMB (not maxBuffer). These are my settings for merging:

<ramBufferSizeMB>960</ramBufferSizeMB>
<mergeFactor>40</mergeFactor>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

--Shreejay

Mark Miller-3 wrote: On Nov 9, 2012, at 1:20 PM, shreejay <shreejayn@> wrote: Instead of doing an optimize, I have now changed the Merge settings by keeping a maxBuffer = 960, a merge Factor = 40 and ConcurrentMergePolicy. Don't you mean ConcurrentMergeScheduler? Keep in mind that if you use the default TieredMergePolicy, mergeFactor will have no effect. You need to use maxMergeAtOnce and segmentsPerTier as sub args to the merge policy config (see the commented-out example in solrconfig.xml). Also, it's probably best to avoid using maxBufferedDocs at all. - Mark
Re: Solr4.0 / SolrCloud queries
Thanks Erick. I will try optimizing after indexing everything. I was doing it after every batch, since it was taking way too long to optimize (which was expected), but it was not finishing merging down to a smaller number of segments (1 segment). Instead of doing an optimize, I have now changed the merge settings by keeping a maxBuffer = 960, a merge factor = 40 and ConcurrentMergePolicy. I am also going to check the infoStream option so I can see how the indexing is going. Thanks for your inputs. --Shreejay
Re: Solr4.0 / SolrCloud queries
Thanks everyone. As Shawn mentioned, it was a memory issue. I reduced the amount allocated to Java to 6 GB, and it's been working pretty well. I am re-indexing one of the SolrClouds. I was having trouble optimizing the data when I indexed last time; I am hoping optimizing will not be an issue this time due to the memory changes. I will post more info once I am done. Thanks once again. --Shreejay
Solr4.0 / SolrCloud queries
Hi All, I am trying to run two SolrClouds with 3 and 2 shards respectively (let's say Cloud3shards and Cloud2shards). All servers are identical, with 18 GB RAM (16 GB assigned to Java). I am facing a few issues on both clouds and would be grateful if anyone else has seen / solved these.

1) Every now and then, Solr takes one of the servers out (it either shows as recovering (orange) or is taken offline completely). The Logging tab on the Admin page shows these errors for Cloud3shards: Error while trying to recover:org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://xxx:8983/solr/xxx and Error while trying to recover. core=xxx:org.apache.solr.common.SolrException: I was asked to wait on state recovering for xxx:8983_solr but I still do not see the request state. I see state: recovering live:false On Cloud2shards I also see similar messages. I have noticed it happens more while indexing documents, but I have also seen it happening while only querying Solr. Both SolrClouds are managed by the same ZooKeeper ensemble (a set of 3 ZK servers).

2) I am able to commit, but optimize never seems to work. Right now I have an average of 30 segments on every Solr server. Has anyone else faced this issue? I have tried optimize from the admin page and as an HTTP POST request. Both of them fail. It's not because of disk space, since my index size is less than 50 GB and I have 500 GB of space on each server.

3) If I try to query Solr with rows = 5000 or more for Cloud2 (for Cloud1 it's around 20,000 documents), I get: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[http://ABC1:8983/solr/aaa, http://ABC2:8983/solr/aaa].

4) I have also noticed that ZK switches leaders every now and then. I am attributing it to point 1 above: as soon as the leader is down, another server takes its place. My concern is the frequency with which this switch happens.
I guess this is completely related to point 1, and if that is solved, I will not have this issue either. --Shreejay
NewSearcher old cache
Hello everyone, I was configuring a Solr installation and had a few queries about newSearcher. As I understand it, a newSearcher event will be triggered if there is an already existing registered searcher. Q1) As soon as a new searcher is opened, the caches begin populating from the older caches. What happens if the newSearcher event has queries defined in it? Do these queries ignore the old cache altogether and load only the results of the queries defined in the listener event? Or do these get added after the new caches have been warmed from the old caches? Q2) I am running edismax queries on the Solr server. Can I specify these queries in newSearcher and firstSearcher also? Or are the queries supposed to be simple queries? Thanks. --Shreejay