solrinitialisationerrors: Error during shutdown of writer.
I don't know why I suddenly started getting this error. This is the screenshot: http://lucene.472066.n3.nabble.com/file/n4104874/Untitled.png

I thought there might be some problem with Tomcat, so I uninstalled it, but I still get the same error. I have no idea why this is happening; initially it worked really well. In the Tomcat Java options the home variable is: -Dsolr.solr.home=C:\solr

I am using the initial solr.xml only; I have created two cores and the folder structure is as desired. My folder structure is:

1) C:\solr\contract\conf
2) C:\solr\document\conf
3) C:\solr\lib

These are my config files:

solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8080" hostContext="solr">
    <core loadOnStartup="true" instanceDir="document\" transient="false" name="document"/>
    <core loadOnStartup="true" instanceDir="contract\" transient="false" name="contract"/>
  </cores>
</solr>

This I got after I re-installed Tomcat:

INFO: closing IndexWriter with IndexWriterCloser
Dec 04, 2013 2:09:30 PM org.apache.solr.update.DefaultSolrCoreState closeIndexWriter
SEVERE: Error during shutdown of writer.
java.lang.NoClassDefFoundError: org/apache/solr/request/LocalSolrQueryRequest
at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:682)
at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:69)
at org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:278)
at org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:73)
at org.apache.solr.core.SolrCore.close(SolrCore.java:972)
at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:771)
at org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:134)
at org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:311)
at org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4660)
at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5442)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
at org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:1001)
at org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1272)
at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1450)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:295)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1338)
at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1496)
at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1506)
at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1485)
at java.lang.Thread.run(Unknown Source)

Please help me; after implementing so much, this error has really thrown me. Thanks in advance.
Faceting Query in Solr
Hi, I indexed data into Solr using 5 categories. Each category is differentiated by categoryId. Now I have a situation where I need to show the results based on facets. Ex:

[]-category1
[]-category2
[]-category3
[]-category4
[]-category5

If the user checks category1, it has to show the results for categoryId 1. If the user checks 2 categories, it has to show the results from the two categories the user checked. If the user checks 3 categories, it has to show the results from three categories, and so on. However many categories the user checked, I have to show results from the checked categories.

My schema is the following:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="categoryId" type="int" indexed="true" stored="false" required="true"/>
<field name="url" type="string" indexed="true" stored="true" required="true"/>
<field name="content" type="string" indexed="false" stored="true" multiValued="true" required="true"/>

Can anyone help me with how I can achieve this?

Regards,
Kumar
how to increase each index file size
Hi, I'm using SolrCloud integrated with HDFS, and I found there are lots of small files. So I'd like to increase the index file size while doing a DIH full-import. Any suggestion on how to achieve this goal?

Regards.
Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent
I'm using the following query to do a fuzzy search on Solr 4.5.1 and am getting an empty result:

qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2) +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO 2013-12-04T00:23:00Z] -endDate:[* TO 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id

If I change it to a non-fuzzy query by simply dropping the tildes from the terms (see below), then it returns the expected result! Is this a bug? Shouldn't the fuzzy version of a query always return a superset of its non-fuzzy equivalent?

qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming) +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO 2013-12-04T00:23:00Z] -endDate:[* TO 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
Solr Suggester ranked by boost
I want to implement a Solr Suggester (http://wiki.apache.org/solr/Suggester) that ranks suggestions by document boost factor. As I understand the documentation, the following config should work:

solrconfig.xml:

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">7</str>
    <str name="spellcheck.onlyMorePopular">true</str>
  </lst>
  <arr name="last-components">
    <str>suggest</str>
  </arr>
</requestHandler>

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">suggesttext</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

schema.xml:

<field name="suggesttext" type="text" indexed="true" stored="true" multiValued="true"/>
<fieldType name="text" class="solr.TextField" omitNorms="false"/>

I added three documents with a document boost:

{
  "add": { "commitWithin": 5000, "overwrite": true, "boost": 3.0, "doc": { "id": 1, "suggesttext": "text bb" } },
  "add": { "commitWithin": 5000, "overwrite": true, "boost": 2.0, "doc": { "id": 2, "suggesttext": "text cc" } },
  "add": { "commitWithin": 5000, "overwrite": true, "boost": 1.0, "doc": { "id": 3, "suggesttext": "text aa" } }
}

A query to the suggest handler (with spellcheck.q=te) gives the following response:

{
  "responseHeader": { "status": 0, "QTime": 6 },
  "command": "build",
  "response": { "numFound": 3, "start": 0, "docs": [
    { "id": 1, "suggesttext": ["text bb"] },
    { "id": 2, "suggesttext": ["text cc"] },
    { "id": 3, "suggesttext": ["text aa"] }] },
  "spellcheck": { "suggestions": [
    "te", {
      "numFound": 3,
      "startOffset": 0,
      "endOffset": 2,
      "suggestion": ["text aa", "text bb", "text cc"] }]}}

The search results are ranked by boost as expected. However, the suggestions are not ranked by boost (they are ranked alphabetically instead). I also tried the TSTLookup and FSTLookup implementations with the same result. Any ideas what I'm missing?

Thanks,
Mirko
Re: Automatically build spellcheck dictionary on replicas
Ok, thanks for pointing that out!

2013/12/3 Kydryavtsev Andrey werde...@yandex.ru

Yep, sorry, it doesn't work for file-based dictionaries: "In particular, you still need to index the dictionary file once by issuing a search with spellcheck.build=true on the end of the URL; if your system doesn't update that dictionary file, then this only needs to be done once. This manual step may be required even if your configuration sets build=true and reload=true." http://wiki.apache.org/solr/FileBasedSpellChecker

03.12.2013, 21:27, Mirko idonthaveenoughinformat...@googlemail.com:

Yes, I have that, but it doesn't help. It seems Solr still needs the query with the spellcheck.build parameter to build the spellchecker index.

2013/12/3 Kydryavtsev Andrey werde...@yandex.ru

Did you try to add the <str name="buildOnCommit">true</str> parameter to your slave's spellcheck configuration?

03.12.2013, 12:04, Mirko idonthaveenoughinformat...@googlemail.com:

Hi all, we use a Solr SpellcheckComponent with a file-based dictionary. We run a master and some replica slave servers. To update the dictionary, we copy the dictionary txt file to the master, from where it is automatically replicated to all slaves. However, it seems we need to run the spellcheck.build query on each server individually. Is there a way to automatically build the spellcheck dictionary on all servers without calling spellcheck.build on each slave individually? We use Solr 4.0.0.

Thanks,
Mirko
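For reference, the manual build step described above is just an ordinary request against the spellcheck-enabled handler with the build flag appended; the host, port, and core name below are illustrative:

http://localhost:8983/solr/collection1/select?q=*:*&spellcheck=true&spellcheck.build=true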
RE: SolrCloud FunctionQuery inconsistency
Hi Raju, "collection" is a SolrCloud concept, while "core" belongs to standalone mode. So in standalone mode you can create multiple cores, not collections.
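For what it's worth, extra cores in standalone mode can also be created at runtime through the CoreAdmin API rather than by editing solr.xml by hand; the host, port, and names here are illustrative, and the instanceDir must already contain a conf directory:

http://localhost:8983/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2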
Re: post filtering for boolean filter queries
Thanks Yonik. For our use case, we would like to skip caching for only one particular filter query, yet give it a high cost to make sure it executes last of all the filter queries. So the rest of the fqs will execute and cache as usual.

On Tue, Dec 3, 2013 at 9:58 PM, Yonik Seeley yo...@heliosearch.com wrote:

On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan solrexp...@gmail.com wrote:

ok, we were able to confirm the behavior regarding not caching the filter query. It works as expected: it does not cache with {!cache=false}. We are still looking into clarifying the cost assignment, i.e. whether it works as expected for long boolean filter queries.

Yes, filters should be ordered by cost (cheapest first) whenever you use {!cache=false}

-Yonik
http://heliosearch.com -- making solr shine

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent
Chances are you're not getting those fuzzy terms analyzed as you'd like. See the debug (debug=true) output to be sure. Most likely the fuzzy terms are not being lowercased. See http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this applies to fuzzy terms, not just wildcards).

Erik

On Dec 4, 2013, at 4:46 AM, Mhd Wrk mhd...@gmail.com wrote:
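To illustrate the point: if the index-side analyzer lowercases terms but the fuzzy terms bypass that analysis, pre-lowercasing them in the query should make the fuzzy variant match again. Whether that is the actual cause here is only a guess, so check the parsed query in the debug output first:

qt=standard&q=+(field1|en_CA|:swimming~2 field1|en|:swimming~2) +(field1|en_CA|:goggle~1 field1|en|:goggle~1) ...&debug=true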
Re: What type of Solr plugin do I need for filtering results?
Thanks a lot for both of your answers! The QParserPlugin is probably what I meant, but join queries also look interesting and like they could maybe solve my use case, too, without any custom code. However, since this would (I think) make it impossible to have a score for the results, but I do want to do fulltext searches on the returned field set (with score), it will probably not be enough. Anyway, I'll look into both of your suggestions. Thanks a lot again!

On 2013-12-02 05:39, Ahmet Arslan wrote:

It depends on your use case: what your custom criterion is, how it is stored, etc. For example, I had two tables, let's say items and permissions. The permissions table was holding itemId,userId pairs, meaning userId can see this itemId. My initial effort was to index items and add a multivalued field named WhoCanSeeMe, and filter-query on that field using the current user. After some time indexing became troublesome; it was slowing down. I switched to two cores, one for each table, and used a query-time join (JoinQParser) as an fq. I didn't have any plugin for the above. By the way, here is an example of the post filter Joel advises: http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/

On Monday, December 2, 2013 5:14 AM, Joel Bernstein joels...@gmail.com wrote:

What you're looking for is a QParserPlugin. Here is an example:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_6_0/solr/core/src/java/org/apache/solr/search/FunctionRangeQParserPlugin.java?revision=1544545&view=markup

You probably want to implement the QParserPlugin as a PostFilter.

On Sun, Dec 1, 2013 at 3:46 PM, Thomas Seidl re...@gmx.net wrote:

Hi, I'm currently looking at writing my first Solr plugin, but I could not really find any overview information about how a Solr request works internally, what the control flow is, and what kinds of plugins are available to customize it at which point. The Solr wiki page on plugins [1], in my opinion, already assumes too much knowledge and is too terse in its descriptions.

[1] http://wiki.apache.org/solr/SolrPlugins

If anyone knows of any good resources to get me started, that would be awesome! However, also pretty helpful would be just to know what kind of plugin I should create for my use case, as I could then at least try to find information specific to that. What I want to do is filter the search results (at the time fq filters are applied, so before sorting, faceting, range selection, etc. take place) by some custom criterion (passed in the URL). The plan is to add the data needed for that custom filter as a separate set of documents to Solr and look them up from the Solr index when filtering the query. Basically the thing discussed in [2], at 29:07.

[2] http://www.youtube.com/watch?v=kJa-3PEc90g&feature=youtu.be&t=29m7s

So, the question is, what kind of plugin would I use (and how would it have to be configured)? I first thought it'd have to be a SearchComponent, but I think with that I'd only get the results after they are sorted and trimmed to the range, right?

Thanks a lot in advance,
Thomas Seidl
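For what it's worth, a sketch of the join-as-filter approach Ahmet describes, assuming a separate permissions core holding itemId,userId pairs as in his example (core, field, and user names are illustrative). The filter keeps only the main-core documents whose id appears in a permissions document for the current user:

fq={!join fromIndex=permissions from=itemId to=id}userId:12345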
Solr cuts highlighted sentences
Hi guys, when searching for a phrase I get results and would like to show highlighting. The highlightings being shown begin somewhere in the sentence, starting with a comma or something else. I'd like to get highlightings beginning with a sentence. How do I manage this? I've tried so many things found on the internet, but nothing helped. Example:

query.setHighlight(true);
query.setParam("hl.useFastVectorHighlighter", "true");
query.setParam("hl.fragsize", "500");
query.setParam("hl.fragmenter", "regex");
query.setParam("hl.regex.slop", "0.8");
query.setParam("hl.regex.pattern", "[\\w][^.!?]{400,600}[.!?]"); // \w[^\.!\?]{400,600}[\.!\?]
query.setParam("hl.bs.type", "SENTENCE");

etc. etc. What's wrong with this?

Thx
Re: Using the flexible query parser in Solr instead of classic
Hi Jack Krupansky, hi folks,

We could recreate the edismax QueryParser on the flexible framework instead of the classic one. But is this a need for anyone else?

In long text: ExtendedDismaxQParser uses ExtendedSolrQueryParser. ExtendedSolrQueryParser is derived from SolrQueryParser, so it is based on org.apache.solr.parser.QueryParser.jj, which is a slightly changed org.apache.lucene.queryparser.classic.QueryParser.jj. If SolrQueryParser switches to the Lucene flexible QueryParser, ExtendedSolrQueryParser will be a good example of how to generate subclasses without the classic logic of overriding the methods getFuzzyQuery, getPrefixQuery, getWildcardQuery, ... (using instead subclasses of FuzzyQueryNodeProcessor, WildcardQueryNodeProcessor, ...).

So again: is this a need for anyone else?

Best regards,
Karsten
RE: json update moves doc to end
Hi Erick

Here are the last 2 results from a search, and I am not understanding why the last one with the boost editorschoice^200 isn't at the top. By the way, can I also give a substantial boost to results that contain the whole search request and not just 3 or 4 letters (tokens)?

<str name="dms:1003">
-Infinity = (MATCH) sum of:
  0.013719446 = (MATCH) max of:
    0.013719446 = (MATCH) sum of:
      2.090396E-4 = (MATCH) weight(plain_text:ber in 841) [DefaultSimilarity], result of:
        2.090396E-4 = score(doc=841,freq=8.0 = termFreq=8.0), product of:
          0.009452709 = queryWeight, product of:
            1.3343692 = idf(docFreq=611, maxDocs=855)
            0.0070840283 = queryNorm
          0.022114253 = fieldWeight in 841, product of:
            2.828427 = tf(freq=8.0), with freq of: 8.0 = termFreq=8.0
            1.3343692 = idf(docFreq=611, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      0.0012402858 = (MATCH) weight(plain_text:eri in 841) [DefaultSimilarity], result of:
        0.0012402858 = score(doc=841,freq=9.0 = termFreq=9.0), product of:
          0.022357063 = queryWeight, product of:
            3.1559815 = idf(docFreq=98, maxDocs=855)
            0.0070840283 = queryNorm
          0.05547624 = fieldWeight in 841, product of:
            3.0 = tf(freq=9.0), with freq of: 9.0 = termFreq=9.0
            3.1559815 = idf(docFreq=98, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      5.0511415E-4 = (MATCH) weight(plain_text:ric in 841) [DefaultSimilarity], result of:
        5.0511415E-4 = score(doc=841,freq=1.0 = termFreq=1.0), product of:
          0.024712078 = queryWeight, product of:
            3.4884217 = idf(docFreq=70, maxDocs=855)
            0.0070840283 = queryNorm
          0.020439971 = fieldWeight in 841, product of:
            1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
            3.4884217 = idf(docFreq=70, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      8.721528E-4 = (MATCH) weight(plain_text:ich in 841) [DefaultSimilarity], result of:
        8.721528E-4 = score(doc=841,freq=12.0 = termFreq=12.0), product of:
          0.017446788 = queryWeight, product of:
            2.4628344 = idf(docFreq=197, maxDocs=855)
            0.0070840283 = queryNorm
          0.049989305 = fieldWeight in 841, product of:
            3.4641016 = tf(freq=12.0), with freq of: 12.0 = termFreq=12.0
            2.4628344 = idf(docFreq=197, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      7.725705E-4 = (MATCH) weight(plain_text:cht in 841) [DefaultSimilarity], result of:
        7.725705E-4 = score(doc=841,freq=4.0 = termFreq=4.0), product of:
          0.021610687 = queryWeight, product of:
            3.050621 = idf(docFreq=109, maxDocs=855)
            0.0070840283 = queryNorm
          0.035749465 = fieldWeight in 841, product of:
            2.0 = tf(freq=4.0), with freq of: 4.0 = termFreq=4.0
            3.050621 = idf(docFreq=109, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      0.0010287998 = (MATCH) weight(plain_text:beri in 841) [DefaultSimilarity], result of:
        0.0010287998 = score(doc=841,freq=1.0 = termFreq=1.0), product of:
          0.035267927 = queryWeight, product of:
            4.978513 = idf(docFreq=15, maxDocs=855)
            0.0070840283 = queryNorm
          0.029170973 = fieldWeight in 841, product of:
            1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
            4.978513 = idf(docFreq=15, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      0.0010556461 = (MATCH) weight(plain_text:eric in 841) [DefaultSimilarity], result of:
        0.0010556461 = score(doc=841,freq=1.0 = termFreq=1.0), product of:
          0.035725117 = queryWeight, product of:
            5.0430512 = idf(docFreq=14, maxDocs=855)
            0.0070840283 = queryNorm
          0.02954913 = fieldWeight in 841, product of:
            1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
            5.0430512 = idf(docFreq=14, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      5.653785E-4 = (MATCH) weight(plain_text:rich in 841) [DefaultSimilarity], result of:
        5.653785E-4 = score(doc=841,freq=1.0 = termFreq=1.0), product of:
          0.02614473 = queryWeight, product of:
            3.6906586 = idf(docFreq=57, maxDocs=855)
            0.0070840283 = queryNorm
          0.021624953 = fieldWeight in 841, product of:
            1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
            3.6906586 = idf(docFreq=57, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      0.0010596104 = (MATCH) weight(plain_text:icht in 841) [DefaultSimilarity], result of:
        0.0010596104 = score(doc=841,freq=3.0 = termFreq=3.0), product of:
          0.027196141 = queryWeight, product of:
            3.8390784 = idf(docFreq=49, maxDocs=855)
Solr Doubts
Hi Team,

I am new to Solr. I am trying to index a 7GB CSV file. My questions:

1. How do I index without using a uniqueKey? I tried

<uniqueKey required="false">id</uniqueKey>

and I got: "Document is missing mandatory uniqueKey field: id". I am using this query to update from CSV:

localhost:9050/solr-4.5.1/collection1/update/csv?stream.file=D:\Solr\comma15_Id.csv&commit=true&header=false&fieldnames=ORD,ORC,SBN,BNA,POB,NUM,DST,STM,DDL,DLO,PTN,PCD,CTA,CTP,CTT

2. How do I increase the JVM heap space for Solr? Since my file is too large, I am getting a Java heap space error. I am not interested in splitting my large file into batches; I need to complete indexing with the 7GB CSV file as a whole.

Please assist me in indexing my CSV file.

With regards,
Jiyas

"Problems are only opportunities with thorns on them."
Fwd: [Solr Wiki] Your wiki account data
Hello, we've recently launched a job search engine using Solr, and would like to add it here: https://wiki.apache.org/solr/PublicServers. Would it be possible to allow me to be part of the publishing group? Thank you for your help.

Kind Regards,
Mehdi Burgy
New Job Search Engine: www.jobreez.com

-- Forwarded message --
From: Apache Wiki wikidi...@apache.org
Date: 2013/12/4
Subject: [Solr Wiki] Your wiki account data
To: Apache Wiki wikidi...@apache.org

Somebody has requested to email you a password recovery token. If you lost your password, please go to the password reset URL below or go to the password recovery page again and enter your username and the recovery token.

Login Name: madeinch
Re: [Solr Wiki] Your wiki account data
Sure. Unfortunately we had a problem a while ago with spam bots creating pages, so we had to lock it down. Done; you should be able to edit the Solr Wiki.

Erick

On Wed, Dec 4, 2013 at 8:06 AM, Mehdi Burgy gla...@gmail.com wrote:
Re: Deleting and committing inside a SearchComponent
I agree with Upayavira. This seems architecturally questionable. In your example, the crux of the matter is "only differ by one field". Figuring that out is going to be expensive; are you burdening searches with this kind of logic? Why not create a custom update processor that does this and use such a component? Or build it into your updates when you ingest the docs? Or build a signature field and issue a delete-by-query on that when you update?

Best,
Erick

On Tue, Dec 3, 2013 at 9:48 PM, Peyman Faratin peymanfara...@gmail.com wrote:

On Dec 3, 2013, at 8:41 PM, Upayavira u...@odoko.co.uk wrote:

On Tue, Dec 3, 2013, at 03:22 PM, Peyman Faratin wrote:

Hi, is it possible to delete and commit updates to an index inside a custom SearchComponent? I know I can do it with SolrJ, but due to several business logic requirements I need to build the logic inside the search component. I am using Solr 4.5.0.

That just doesn't make sense. Search components are read-only.

I can think of many situations where it makes sense. For instance, you search for a document and your index contains many duplicates that only differ by one field, such as the time they were indexed (think news feeds from multiple sources). So after the search we want to delete the duplicate documents that satisfy some policy (here date, but it could be some other policy).

What are you trying to do? What stuff do you need to change? Could you do it within an UpdateProcessor?

The solution I am working with:

UpdateRequestProcessorChain processorChain =
    rb.req.getCore().getUpdateProcessingChain(rb.req.getParams().get(UpdateParams.UPDATE_CHAIN));
UpdateRequestProcessor processor = processorChain.createProcessor(rb.req, rb.rsp);
...
docId = f();
...
DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
cmd.setId(docId.toString());
processor.processDelete(cmd);

Upayavira
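A minimal SolrJ sketch of the signature variant Erick mentions, assuming documents carry a precomputed signature field named sig (a hypothetical name) and that older duplicates are deleted just before the fresh copy is added:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrInputDocument;

public class DedupBySignature {
  // Replace any previously indexed duplicates sharing this signature with the fresh document.
  public static void replace(HttpSolrServer server, String signature, SolrInputDocument freshDoc) throws Exception {
    server.deleteByQuery("sig:" + ClientUtils.escapeQueryChars(signature)); // "sig" is a hypothetical field name
    server.add(freshDoc);
    server.commit();
  }
}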
Re: solrinitialisationerrors: Error during shutdown of writer.
The crux is: java.lang.NoClassDefFoundError. Usually this means your classpath is wrong and the JVM can't find the jars, or you have multiple jars from different versions on your classpath. It's pretty tedious to track down, but that's where I'd start. In your log, you'll see a bunch of lines like this:

2794 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/Users/Erick/apache/4x/solr/contrib/clustering/lib/jackson-mapper-asl-1.7.4.jar' to classloader

showing you exactly where Solr is trying to load jars from; that'll help.

Best,
Erick

On Wed, Dec 4, 2013 at 4:08 AM, Nutan nutanshinde1...@gmail.com wrote:
Re: Faceting Query in Solr
The standard way of handling this kind of thing is with filter queries. For multi-select, you have to put in some JavaScript or something to build an OR clause when the user checks the boxes, so your query contains

fq=categoryId:(1 OR 2 OR 3)

rather than

fq=categoryId:1&fq=categoryId:2&fq=categoryId:3

Best,
Erick

On Wed, Dec 4, 2013 at 4:36 AM, kumar pavan2...@gmail.com wrote:
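A minimal SolrJ sketch of building that multi-select filter from whatever checkboxes the user ticked; the field name matches Kumar's schema, everything else is illustrative:

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class CategoryFilter {
  // Turns checked IDs [1, 2, 3] into fq=categoryId:(1 OR 2 OR 3).
  public static SolrQuery withCategories(SolrQuery query, List<Integer> checkedIds) {
    if (!checkedIds.isEmpty()) {
      StringBuilder fq = new StringBuilder("categoryId:(");
      for (int i = 0; i < checkedIds.size(); i++) {
        if (i > 0) fq.append(" OR ");
        fq.append(checkedIds.get(i));
      }
      query.addFilterQuery(fq.append(')').toString());
    }
    return query;
  }
}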
Solr Performance Issue
I have almost 5 to 6 crores of indexed documents in Solr, and whenever I change anything in the configuration file, the Solr server goes down. As a new Solr user I am not able to find the exact reason for the server going down.

I am using caches in the following way:

<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
<queryResultCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>

and I am not using any documentCache or fieldValueCache. Can this lead to a performance issue that takes the server down?

And I am seeing logging in the server showing an exception in the following way:

Servlet.service() for servlet [default] in context with path [/solr] threw exception [java.lang.IllegalStateException: Cannot call sendError() after the response has been committed] with root cause

Can anybody help me solve this problem?

Kumar.
Re: post filtering for boolean filter queries
OK, so cache=false and cost=100 should do it; see: http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/

Best,
Erick

On Wed, Dec 4, 2013 at 5:56 AM, Dmitry Kan solrexp...@gmail.com wrote:
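For reference, the local-params form that combination takes on a request (the field and values here are illustrative). Non-cached filters are ordered by cost, and filter queries whose parser supports post-filtering run after the main query and all other filters once cost >= 100:

fq={!cache=false cost=100}acl:(group1 OR group2)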
Re: how to increase each index file size
Why do you want to do this? Are you seeing performance problems? If not, I'd just ignore this problem; premature optimization and all that.

If you _really_ want to do this: your segment files are closed every time you do a commit; openSearcher=true|false doesn't matter. BUT the longer you go between commits, the bigger your transaction log will be, which may lead to other issues, particularly on restart. See:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The key is the section on truncating the tlog. And note the sizes of these segments will change as they're merged anyway.

Best,
Erick

On Wed, Dec 4, 2013 at 4:42 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote:
Re: Solr Performance Issue
You need to give us more of the exception trace; the real cause is often buried down the stack with some text like "Caused by...". But at a glance, your cache sizes and autowarm counts are far higher than they should be. Try reducing the autowarmCount in particular down to, say, 16 or so; it's actually rare that you really need very many. I'd actually go back to the defaults to start with, to test whether this is the problem.

Further, we need to know exactly what you mean by "change anything in the configuration file". Change what? Details matter. Of course the last thing you changed before you started seeing this problem is the most likely culprit.

Best,
Erick

On Wed, Dec 4, 2013 at 8:31 AM, kumar pavan2...@gmail.com wrote:
Re: json update moves doc to end
Well, both have a score of -Infinity. So they're equal, and the tiebreaker is the internal Lucene doc ID. Now this is not helpful, since the question is where -Infinity comes from. This looks suspicious:

-Infinity = (MATCH) FunctionQuery(log(int(clicks))), product of:
  -Infinity = log(int(clicks)=0)

Not much help I know, but...

Erick

On Wed, Dec 4, 2013 at 7:24 AM, Andreas Owen a...@conx.ch wrote:
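A common guard for that, assuming clicks is the field feeding the boost function: shift the argument by one so a zero click count contributes log(1) = 0 instead of -Infinity, i.e. boost by

log(sum(clicks,1))

rather than log(clicks).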
Re: Solr Doubts
bq: <uniqueKey required="false">id</uniqueKey>

This isn't correct; there's no "required" param for uniqueKey. Just remove the entire <uniqueKey> node AND make the field definition required="false". I.e. you should have something like:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

Set required="false" there.

To increase memory, you just specify -Xmx when you start, something like:

java -Xmx2G -Xms2G -jar start.jar

But interested or not in splitting the CSV file, working with 7G input files is going to be painful no matter what. You may find yourself having to split it up for expediency's sake.

Best,
Erick

On Wed, Dec 4, 2013 at 7:46 AM, Jiyas Basha H jiyasbas...@mobiusservices.in wrote:
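If splitting does become unavoidable, here is a minimal SolrJ sketch of streaming the file to the same /update/csv handler in batches; the batch size is arbitrary, and the URL, path, and field list are taken from the original post:

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

public class CsvBatchIndexer {
  static final String FIELDNAMES = "ORD,ORC,SBN,BNA,POB,NUM,DST,STM,DDL,DLO,PTN,PCD,CTA,CTP,CTT";

  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:9050/solr-4.5.1/collection1");
    BufferedReader in = new BufferedReader(new FileReader("D:\\Solr\\comma15_Id.csv"));
    StringBuilder batch = new StringBuilder();
    String line;
    int rows = 0;
    while ((line = in.readLine()) != null) {
      batch.append(line).append('\n');
      if (++rows % 100000 == 0) {      // flush every 100k rows; tune to your heap
        post(solr, batch.toString());
        batch.setLength(0);
      }
    }
    if (batch.length() > 0) post(solr, batch.toString());
    solr.commit();                     // one commit at the end
    in.close();
  }

  static void post(HttpSolrServer solr, String csv) throws Exception {
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
    req.addContentStream(new ContentStreamBase.StringStream(csv));
    req.setParam("header", "false");   // same params as the original URL
    req.setParam("fieldnames", FIELDNAMES);
    solr.request(req);
  }
}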
Programmatically upload configuration into ZooKeeper
What is the best way to upload Solr configuration files into ZooKeeper programmatically, i.e. from within Java code? I know that there are cloud-scripts for this, but in the end they must use some Java client library, don't they? This question arose because we use a special (Java-based) configuration system to store all configuration files (not only Solr's), and it would be cool if we could export modified files into ZooKeeper when applying changes. We would then reload collections remotely via the REST API. I've dug a little into the ZkCli class, and it seems that SolrZkClient can do something along the lines above. Is it the right tool for the job? Any hints would be appreciated.

Regards,
Artem.
Questions about commits and OOE
Hi all, let me first explain our situation. We have:

- two virtual servers, each with:
  - 4x Solr 4.4.0 on Tomcat 6 (with mod_cluster 1.2.0); each JVM has -Xms2048m -Xmx2048m -XX:MaxPermSize=384m
  - 1x ZooKeeper 3.4.5 (only one of the two ZooKeepers is active)
  - CentOS 6.4
  - Sun JDK 1.6.0-31
  - 16 GB of RAM
  - 4 vCPU
- only one core and one shard
- ~25 docs and 50-100 MB of index size
- two load balancers (Apache + mod_cluster), both connected to the 8 Solr nodes
- 1 VIP pointing to these two LBs

The commit configuration is:

- every update request does a soft commit (i.e. param softCommit=true in the HTTP request)
- autoSoftCommit disabled
- autoCommit enabled every 15 seconds

The client application is a Java app with a SolrJ client using the previous VIP as an endpoint. We need NearRealTime modifications visible to the end users. During the day, the client uses Solr with about 80% select requests and 20% update requests. Every morning, the client sends a massive bunch of updates (about 1 in a few minutes). During this massive update, we sometimes have a peak of active threads exceeding the limit of 8192 processes authorized for the user running the Tomcat and ZooKeeper processes. When this happens, every hard commit fails with an "OutOfMemory: unable to create native thread" message.

Now, I have some questions:

- Why are there so many threads created? Is it the soft commit on every update that opens a new thread?
- Once an OOE occurs, every hard commit will be broken, even if the number of threads opened on the system is low. Is there any way to free the JVM? The only solution we have found is to restart all the JVMs.
- When the OOE occurs, the SolrCloud console shows the leader node as active and the others as recovering:
  - is the replication working at that moment?
  - as all the hard commits are failing but the soft commits are not, can I be sure that I will not lose some updates when restarting all the nodes?

By the way, we are planning to:

- disable the softCommit parameter on the client side and enable autoSoftCommit instead
- create another server and make a 3-ZooKeeper quorum instead of a single ZooKeeper master
- skip the use of load balancers and let ZooKeeper decide which node will respond to the requests

Any help would be appreciated!

Metin OSMAN
Re: Using Payloads as a Coefficient For Score At a Custom QParser That extends ExtendedDismaxQParser
Sounds great Furkan,

Do you have permission to donate this code? It would be great if you could create a Jira ticket.

Thanks,
Joel

On Tue, Dec 3, 2013 at 3:26 PM, Furkan KAMACI furkankam...@gmail.com wrote:

I've implemented what I want. I can add the payload score into the document score. I've modified ExtendedDismaxQParser, and I can use all the abilities of edismax in my case. I will explain what I did on my blog.

Thanks;
Furkan KAMACI

2013/12/1 Furkan KAMACI furkankam...@gmail.com

Hi;

I use Solr 4.5.1. I have a case: when a user searches for some specific keywords, some documents should be listed much higher than their usual score. I mean, I have probabilities of which documents the user may want to see for given keywords. I have come up with this idea: I can put a new field in my schema. This field holds the keyword and the probability as a payload. When a user searches for a keyword, I will calculate the usual document score for the given fields, also search on the payloaded field, and multiply the total score by that payload.

I followed this example: http://sujitpal.blogspot.com/2013/07/porting-payloads-to-solr4.html#! However, that example extends QParser directly, and I want to use the capabilities of edismax. So I found this example: http://digitalpebble.blogspot.com/2010/08/using-payloads-with-dismaxqparser-in.html This one extends dismax, but I could not use payloads in that example. The first solution has this case:

@Override
public Similarity get(String name) {
  if ("payloads".equals(name) || "cscores".equals(name)) {
    return new PayloadSimilarity();
  } else {
    return new DefaultSimilarity();
  }
}

However, dismax behaves differently, i.e. when you search for cscores:A it changes that into:

+((text:cscores:y text:cscores text:y text:cscoresy)) ()

When I debug it, the name is "text" instead of "cscores", and it does not work. My idea is to combine the two examples by extending edismax. Do you have any idea how to extend it for edismax, or what to do for my case?

PS: I've sent the same question to the Lucene user list too. I'm asking it here to get an idea from the Solr perspective as well.

Thanks;
Furkan KAMACI

--
Joel Bernstein
Search Engineer at Heliosearch
Re: Programmatically upload configuration into ZooKeeper
Hi Artem,

This question (or one very like it) has been asked on this list before, so there's some prior art you could modify to suit your needs. Taken from Timothy Potter thelabd...@gmail.com:

**
public static void updateClusterstateJsonInZk(CloudSolrServer cloudSolrServer, CommandLine cli) throws Exception {
  String updateClusterstateJson = cli.getOptionValue("updateClusterstateJson");
  ZkStateReader zkStateReader = cloudSolrServer.getZkStateReader();
  SolrZkClient zkClient = zkStateReader.getZkClient();
  File jsonFile = new File(updateClusterstateJson);
  if (!jsonFile.isFile()) {
    System.err.println(jsonFile.getAbsolutePath() + " not found.");
    return;
  }
  byte[] clusterstateJson = readFile(jsonFile);
  // validate that the user is passing valid JSON
  InputStreamReader bytesReader = new InputStreamReader(new ByteArrayInputStream(clusterstateJson), "UTF-8");
  JSONParser parser = new JSONParser(bytesReader);
  parser.toString();
  zkClient.setData("/clusterstate.json", clusterstateJson, true);
  System.out.println("Updated /clusterstate.json with data from " + jsonFile.getAbsolutePath());
}
**

You should be able to modify that or use it as a basis for uploading the changed files in your config.

Thanks,
Greg

On Dec 4, 2013, at 8:36 AM, Artem Karpenko gooy...@gmail.com wrote:
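On the original question of uploading a whole conf directory (rather than clusterstate.json), here is a minimal sketch of what the upconfig cloud-script does internally, assuming the Solr 4.x ZkController helper from solr-core is on the classpath; the host, timeout, paths, and config name are illustrative:

import java.io.File;
import org.apache.solr.cloud.ZkController;
import org.apache.solr.common.cloud.SolrZkClient;

public class UploadConfig {
  public static void main(String[] args) throws Exception {
    SolrZkClient zkClient = new SolrZkClient("localhost:2181", 30000);
    try {
      // Pushes every file under conf/ to /configs/myconf in ZooKeeper.
      ZkController.uploadConfigDir(zkClient, new File("/path/to/conf"), "myconf");
    } finally {
      zkClient.close();
    }
  }
}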
Re: Solr Performance Issue
On 12/4/2013 6:31 AM, kumar wrote:
> I have almost 5 to 6 crores of indexed documents in Solr, and whenever
> I change anything in the configuration file, the Solr server goes down.

If you mean crore and not core, then you are talking about 50 to 60 million documents. That's a lot. Solr is perfectly capable of handling that many documents, but you do need to have very good hardware. Even if they are small, your index is likely to be many gigabytes in size. If the documents are large, that might be measured in terabytes. Large indexes require a lot of memory for good performance. This will be discussed in more detail below.

> As a new Solr user I am not able to find the exact reason for the server
> going down. I am using caches in the following way:
>
> <filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
> <queryResultCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
>
> and I am not using any documentCache or fieldValueCache.

As Erick said, these cache sizes are HUGE. In particular, your autowarmCount values are extremely high.

> Can this lead to a performance issue that takes the server down?

Another thing that Erick pointed out is that you haven't really told us what's happening. When you say that the server goes down, what EXACTLY do you mean?

> And I am seeing logging in the server showing an exception in the
> following way:
>
> Servlet.service() for servlet [default] in context with path [/solr] threw
> exception [java.lang.IllegalStateException: Cannot call sendError() after
> the response has been committed] with root cause

This message comes from your servlet container, not Solr. You're probably using Tomcat, not the included Jetty. There is some indirect evidence that this can be fixed by increasing the servlet container's setting for the maximum number of request parameters.

http://forums.adobe.com/message/4590864

Here's what I can say without further information: you're likely having performance issues. One potential problem is your insanely high autowarmCount values. Your cache configuration tells Solr that every time you have a soft commit, or a hard commit with openSearcher=true, you're going to execute up to 1024 queries and up to 4096 filters from the old caches in order to warm the new caches. Even if you have an optimal setup, this takes a lot of time. I suspect that you don't have an optimal setup. Another potential problem is that you don't have enough memory for the size of your index. A number of potential performance problems are discussed on this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

A lot more details are required. Here's some things that will be helpful, and more is always better:

* Exact symptoms.
* Excerpts from the Solr logfile that include entire stacktraces.
* Operating system and version.
* Total server index size on disk.
* Total machine memory.
* Java heap size for your servlet container.
* Which servlet container you are using to run Solr.
* Solr version.
* Server hardware details.

Thanks,
Shawn
RE: Questions about commits and OOE
Hi Metin,

I think removing the softCommit=true parameter on the client side will definitely help, as NRT wasn't designed to re-open searchers after every document. Try every 1 second (or even every few seconds); I doubt your users will notice. To get an idea of what threads are running in your JVM process, you can use jstack.

Cheers,
Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com

From: OSMAN Metin metin.os...@canal-plus.com
Sent: Wednesday, December 04, 2013 7:36 AM
To: solr-user@lucene.apache.org
Subject: Questions about commits and OOE
Re: Programmatically upload configuration into ZooKeeper
Hello Greg,

so it's SolrZkClient indeed. I've tried it out, and it seems to do just the job I need. Thank you!

On a related note: is there a similar way to create/reload a core or collection, maybe using CloudSolrServer or something inside it? I didn't find any methods that could do the thing.

Regards,
Artem.

On 04.12.2013 17:15, Greg Walters wrote:
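A sketch of triggering a collection RELOAD through the Collections API from SolrJ, assuming a generic request routed to the admin handler is acceptable; the URL and collection name are illustrative:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ReloadCollection {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "RELOAD");
    params.set("name", "collection1");
    QueryRequest request = new QueryRequest(params);
    // Route the request to the Collections API handler instead of a search handler.
    request.setPath("/admin/collections");
    server.request(request);
  }
}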
Setting routerField/shardKey on specific collection?
Hi,

I'm using Solr 4.6 and trying to specify a router.field (shard key) on a specific collection so that all documents with the same value in the specified field end up in the same shard. However, I can't find an example of how to do this via solr.xml.

I see in this ticket https://issues.apache.org/jira/browse/SOLR-5017 there is a mention of a routeField property. Should the solr.xml contain the following?

<cores adminPath="/admin/cores" defaultCoreName="collection1">
  <core name="collection1" instanceDir="collection1" routerField="consolidationGroupId" />
</cores>

Any help would be greatly appreciated! I've been yak shaving all afternoon reading various Jira tickets and wikis trying to get this to work :-)

Best wishes,

Daniel

--
*Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk*
daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk
RE: json update moves doc to end
I changed my boost function log(clickrate)^8 to div(clicks,displays)^8 and it works now. I get the following output from debug:

0.0022668892 = (MATCH) FunctionQuery(div(const(2),const(5))), product of:
  0.4 = div(const(2),const(5))
  8.0 = boost
  7.0840283E-4 = queryNorm

Am I understanding this right, that 0.4 and 8.0 (times the queryNorm of 7.0840283E-4) result in the 0.0022668892? I'm having trouble understanding how much I boosted it. As I use NGramFilterFactory I get a lot of hits because of the tokens. Can I make the boost higher if the whole search term is found and not just part of it?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, 4 December 2013 15:07
To: solr-user@lucene.apache.org
Subject: Re: json update moves doc to end

Well, both have a score of -Infinity. So they're equal and the tiebreaker is the internal Lucene doc ID. Now this is not helpful, since the question now is where -Infinity comes from; this looks suspicious:

-Infinity = (MATCH) FunctionQuery(log(int(clicks))), product of:
  -Infinity = log(int(clicks)=0)

not much help I know, but

Erick

On Wed, Dec 4, 2013 at 7:24 AM, Andreas Owen a...@conx.ch wrote:

Hi Erick

Here are the last 2 results from a search and I am not understanding why the last one with the boost editorschoice^200 isn't at the top. By the way, can I also give a substantial boost to results that contain the whole search request and not just 3 or 4 letters (tokens)?

<str name="dms:1003">
-Infinity = (MATCH) sum of:
  0.013719446 = (MATCH) max of:
    0.013719446 = (MATCH) sum of:
      2.090396E-4 = (MATCH) weight(plain_text:ber in 841) [DefaultSimilarity], result of:
        2.090396E-4 = score(doc=841,freq=8.0 = termFreq=8.0), product of:
          0.009452709 = queryWeight, product of:
            1.3343692 = idf(docFreq=611, maxDocs=855)
            0.0070840283 = queryNorm
          0.022114253 = fieldWeight in 841, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            1.3343692 = idf(docFreq=611, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      0.0012402858 = (MATCH) weight(plain_text:eri in 841) [DefaultSimilarity], result of:
        0.0012402858 = score(doc=841,freq=9.0 = termFreq=9.0), product of:
          0.022357063 = queryWeight, product of:
            3.1559815 = idf(docFreq=98, maxDocs=855)
            0.0070840283 = queryNorm
          0.05547624 = fieldWeight in 841, product of:
            3.0 = tf(freq=9.0), with freq of:
              9.0 = termFreq=9.0
            3.1559815 = idf(docFreq=98, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      5.0511415E-4 = (MATCH) weight(plain_text:ric in 841) [DefaultSimilarity], result of:
        5.0511415E-4 = score(doc=841,freq=1.0 = termFreq=1.0), product of:
          0.024712078 = queryWeight, product of:
            3.4884217 = idf(docFreq=70, maxDocs=855)
            0.0070840283 = queryNorm
          0.020439971 = fieldWeight in 841, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            3.4884217 = idf(docFreq=70, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      8.721528E-4 = (MATCH) weight(plain_text:ich in 841) [DefaultSimilarity], result of:
        8.721528E-4 = score(doc=841,freq=12.0 = termFreq=12.0), product of:
          0.017446788 = queryWeight, product of:
            2.4628344 = idf(docFreq=197, maxDocs=855)
            0.0070840283 = queryNorm
          0.049989305 = fieldWeight in 841, product of:
            3.4641016 = tf(freq=12.0), with freq of:
              12.0 = termFreq=12.0
            2.4628344 = idf(docFreq=197, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      7.725705E-4 = (MATCH) weight(plain_text:cht in 841) [DefaultSimilarity], result of:
        7.725705E-4 = score(doc=841,freq=4.0 = termFreq=4.0), product of:
          0.021610687 = queryWeight, product of:
            3.050621 = idf(docFreq=109, maxDocs=855)
            0.0070840283 = queryNorm
          0.035749465 = fieldWeight in 841, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.050621 = idf(docFreq=109, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      0.0010287998 = (MATCH) weight(plain_text:beri in 841) [DefaultSimilarity], result of:
        0.0010287998 = score(doc=841,freq=1.0 = termFreq=1.0), product of:
          0.035267927 = queryWeight, product of:
            4.978513 = idf(docFreq=15, maxDocs=855)
            0.0070840283 = queryNorm
          0.029170973 = fieldWeight in 841, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            4.978513 = idf(docFreq=15, maxDocs=855)
            0.005859375 = fieldNorm(doc=841)
      0.0010556461 = (MATCH) weight(plain_text:eric in 841) [DefaultSimilarity],
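On the last question above (boosting whole-term matches over NGram fragments), one common approach, sketched here and not taken from the thread, is to index the text a second time without the NGram filter and add a phrase boost on that copy, so documents matching the whole search term outrank fragment-only hits. The field names plain_text and plain_text_exact are illustrative:

q=bericht&defType=edismax&qf=plain_text&pf=plain_text_exact^20&boost=div(clicks,displays)

Here pf applies a phrase boost against the non-NGrammed field, and the edismax boost parameter multiplies the score by the click-rate function instead of adding it.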
Re: Programmatically upload configuration into ZooKeeper
On 12/4/2013 9:23 AM, Artem Karpenko wrote:

so it's SolrZkClient indeed. I've tried it out and it seems to do just the job I need. Thank you! On a related note - is there a similar way to create/reload a core/collection, maybe using CloudSolrServer or something inside it? Didn't find any methods that could do the thing.

This should probably work for reloading collection1. I can't test it right now, as I'm about to start my morning commute.

CloudSolrServer srv = new CloudSolrServer("zoo1:2181,zoo2:2181,zoo3:2181/mysolr");
srv.setDefaultCollection("collection2");
SolrQuery q = new SolrQuery();
q.setRequestHandler("/admin/collections");
q.set("action", "RELOAD");
q.set("name", "collection1");
QueryResponse x = srv.query(q);

If you want to reload an individual core, you'd need to use HttpSolrServer, not CloudSolrServer. SOLR-4140 made it possible to use the collections API with CloudSolrServer, but as far as I can tell, it doesn't enable the CoreAdmin API. Note that reloads don't work right with SolrCloud unless the server version is at least 4.4, due to a bug.

Thanks,
Shawn
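For the individual-core case mentioned above, a sketch using SolrJ's CoreAdminRequest helper (the base URL and core name are hypothetical, and the call throws SolrServerException/IOException):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

// point HttpSolrServer at the Solr root URL, not at an individual core
HttpSolrServer srv = new HttpSolrServer("http://localhost:8983/solr");
// issues /admin/cores?action=RELOAD&core=... against that node
CoreAdminRequest.reloadCore("collection1_shard1_replica1", srv);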
Re: SolrCloud FunctionQuery inconsistency
: There is no default value for ptime. It is generated by users.

thank you, that rules out my previous wild guess.

: I was trying a query with a function query ({!boost b=dateDeboost(ptime)}
: channelid:0082 title:abc), which leads to different results from the same
: shard (using the param shards=shard3).
:
: The difference is maxScore, which is not consistent. And the maxScore is

Ok ... but you still haven't provided enough information for us to make a guess as to why you are seeing inconsistent scores coming back from your queries -- at a minimum we need to see the debugQuery=true output for each of the different replicas that are generating different scores.

It's possible that the discrepancy you are seeing is a minor one resulting from slightly different term stats (ie: segments being merged slightly differently on different replicas), or it could be a symptom of a larger problem.

-Hoss
http://www.lucidworks.com/
Re: json update moves doc to end
: Well, both have a score of -Infinity. So they're equal and
: the tiebreaker is the internal Lucene doc ID.
:
: Now this is not helpful since the question now is where
: -Infinity comes from, this looks suspicious:
: -Infinity = (MATCH) FunctionQuery(log(int(clicks))), product of:
: -Infinity = log(int(clicks)=0)

If the score of this doc was not -Infinity before your doc update, and it became -Infinity after your update, and your update did not intentionally change the value of the clicks field to 0, then I suspect what you are seeing is the result of not having all of your fields stored=true...

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

/!\ All original source fields must be stored for field modifiers to work correctly, which is the Solr default.

-Hoss
http://www.lucidworks.com/
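To illustrate the point about stored fields, a minimal schema sketch (field names are illustrative): every original source field is stored="true", so an atomic update can rebuild the whole document. If clicks were not stored, the update would drop it, the missing int would read back as 0, and log(clicks) would become -Infinity.

<field name="id"         type="string" indexed="true" stored="true" required="true"/>
<field name="clicks"     type="int"    indexed="true" stored="true"/>
<field name="displays"   type="int"    indexed="true" stored="true"/>
<field name="plain_text" type="text"   indexed="true" stored="true"/>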
Re: Questions about commits and OOE
I'd second the use of jstack to check your threads. Each request (be it a search or update) will generate a request handler thread on the Solr side unless you've set the limits in the HttpShardHandlerFactory (in solr.xml for Solr-wide defaults and/or under the requestHandler in solrconfig.xml). We set maxConnectionsPerHost, corePoolSize and maximumPoolSize, since we ran into a similar issue. Our system ironically didn't crash, we just had a JVM with about 256,000 threads, which was rather SSLLOOWW :)

On the softCommit front, we have had some success with small softCommit times, but then we use SSDs (and have lots of memory and lots of shards). Once we get concrete figures, we'll publish them, but we are a fair way below 1s now with no major impact on indexing throughput (yet). But I would agree that unless you are really really sure you need it (and most people don't), keep to the known limits.

On 4 December 2013 16:09, Tim Potter tim.pot...@lucidworks.com wrote:
<snip>
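A sketch of the kind of limits mentioned above, configured on the shard handler in solr.xml (the values are illustrative, not recommendations):

<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="maxConnectionsPerHost">20</int>
  <int name="corePoolSize">0</int>
  <int name="maximumPoolSize">128</int>
</shardHandlerFactory>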
RE: Setting routerField/shardKey on specific collection?
Hi Daniel,

I'm not sure how this would apply to an existing collection (in your case collection1). Try using the collections API to create a new collection and pass the router.field parameter. Grep'ing over the code, the parameter is named router.field (not routerField or routeField).

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com

From: Daniel Bryant daniel.bry...@tai-dev.co.uk
Sent: Wednesday, December 04, 2013 9:40 AM
To: solr-user@lucene.apache.org
Subject: Setting routerField/shardKey on specific collection?
<snip>
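For concreteness, a sketch of such a create call (the collection name, config name and counts are illustrative; router.field is the parameter added by SOLR-5017):

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&collection.configName=myconf&router.field=consolidationGroupId'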
Re: Setting routerField/shardKey on specific collection?
Many thanks Timothy,

I tried this today but ran into issues getting the new collection to persist (so that I could search for the parameter). It's good to have this confirmed as a viable approach though, and I'll persevere with this tomorrow. If I figure it out I'll reply with the details.

Thanks again,

Daniel

On 04/12/2013 17:41, Tim Potter wrote:
<snip>

--
*Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk*
daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk
Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent
Debug shows that all terms are lowercased properly.

Thanks

On Dec 4, 2013 3:18 AM, Erik Hatcher erik.hatc...@gmail.com wrote:

Chances are you're not getting those fuzzy terms analyzed as you'd like. See debug (debug=true) output to be sure. Most likely the fuzzy terms are not being lowercased. See http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this applies to fuzzy, not just wildcard, terms too).

Erik

On Dec 4, 2013, at 4:46 AM, Mhd Wrk mhd...@gmail.com wrote:

I'm using the following query to do a fuzzy search on Solr 4.5.1 and am getting an empty result:

qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2) +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO 2013-12-04T00:23:00Z] -endDate:[* TO 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id

If I change it to a non-fuzzy query by simply dropping the tildes from the terms (see below) then it returns the expected result! Is this a bug? Shouldn't the fuzzy version of a query always return a super set of its non-fuzzy equivalent?

qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming) +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO 2013-12-04T00:23:00Z] -endDate:[* TO 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
Solr Stalls on Bulk indexing, no logs or errors
I am finding, with a bulk index using Solr 4.3 on Tomcat, that when I reach 69,578 records the server stops adding anything more. I've tried reducing the data sent to the bare minimum of fields and using ASC and DESC data to see if it could be a field issue.

Is there anything I could look at for this? As I'm not finding anything similar noted before. Does Tomcat have issues with closing connections that look like DDoS attacks? Or could it be related to too many commits in too short a time?

Any help will be very greatly appreciated.
Re: a core for every user, lots of users... are there issues
Ok, one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we were *trying* to use the REST API "create" function to create cores without having to manually mess with files on the server. Is this what create was supposed to do? If so it was broken or we weren't using it right. In any case, in 4.6 is that the right way to programmatically add cores in discovery mode?

On Tue, Dec 3, 2013 at 7:37 PM, Erick Erickson erickerick...@gmail.com wrote:

bq: Do you have any sense of what a good upper limit might be, or how we might figure that out?

As always, it depends (tm). And the biggest thing it depends upon is the number of simultaneous users you have and the size of their indexes. And we've arrived at the black box of estimating size again. Siiigh... I'm afraid that the only way is to test and establish some rules of thumb.

The transient core constraint will limit the number of cores loaded at once. If you allow too many cores at once, you'll get OOM errors when all the users pile on at the same time. Let's say you've determined that 100 is the limit for transient cores. What I suspect you'll see is degrading response times if this is too low. Say 110 users are signed on, and say they submit queries perfectly in order, one after the other. Every request will require the core to be opened and it'll take a bit. So that'll be a flag. Or that's a fine limit but your users have added more and more documents and you're coming under memory pressure.

As you can tell, I don't have any good answers. I've seen between 10M and 300M documents on a single machine. BTW, on a _very_ casual test I found that about 1000 cores/second were found in discovery mode. While they aren't loaded if they're transient, it's still a consideration if you have 10s of thousands.

Best,
Erick

On Tue, Dec 3, 2013 at 3:33 PM, hank williams hank...@gmail.com wrote:

On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson erickerick...@gmail.com wrote:

You probably want to look at transient cores, see: http://wiki.apache.org/solr/LotsOfCores But millions will be interesting for a single node, you must have some kind of partitioning in mind?

Wow. Thanks for that great link. Yes, we are sharding, so it's not like there would be millions of cores on one machine or even cluster. And since the cores are one per user, this is a totally clean approach. But still we want to make sure that we are not overloading the machine. Do you have any sense of what a good upper limit might be, or how we might figure that out?

Best,
Erick

On Tue, Dec 3, 2013 at 2:38 PM, hank williams hank...@gmail.com wrote:

We are building a system where there is a core for every user. There will be many tens or perhaps ultimately hundreds of thousands or millions of users. We do not need each of those users to have "warm" data in memory. In fact doing so would consume lots of memory unnecessarily, for users that might not have logged in in a long time. So my question is, is the default behavior of Solr to try to keep all of our cores warm, and if so, can we stop it? Also, given the number of cores that we will likely have, is there anything else we should be keeping in mind to maximize performance and minimize memory usage?

--
blog: whydoeseverythingsuck.com
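For reference, a sketch of what a LotsOfCores setup looks like in discovery mode (the names and cache size are illustrative): each per-user core gets a core.properties marking it transient, and a transientCacheSize setting in solr.xml caps how many transient cores stay loaded at once.

# core.properties in each user core's instance directory
name=user_12345
loadOnStartup=false
transient=true

with something like <int name="transientCacheSize">100</int> in solr.xml.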
Re: a core for every user, lots of users... are there issues
On 12/4/2013 12:34 PM, hank williams wrote:

Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we were *trying* to use the REST API "create" function to create cores without having to manually mess with files on the server. Is this what create was supposed to do? If so it was broken or we weren't using it right. In any case in 4.6 is that the right way to programmatically add cores in discovery mode?

If you are NOT in SolrCloud mode, in order to create new cores, the config files need to already exist on the disk. This is the case with all versions of Solr.

If you're running in SolrCloud mode, the core is associated with a collection. Collections have a link to a config in zookeeper. The config is not stored with the core on the disk.

Thanks,
Shawn
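To make the non-SolrCloud case concrete, a sketch of a CoreAdmin create call (the paths and names are illustrative); the instanceDir must already hold a conf/ directory with solrconfig.xml and schema.xml before the call is made:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=user_12345&instanceDir=/var/solr/cores/user_12345'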
Re: Solr Stalls on Bulk indexing, no logs or errors
There's a known issue with SolrCloud with multiple shards, but you haven't told us whether you're using that. The test for whether you're running in to that is whether you can continue to _query_, just not update.

But you need to tell us more about your setup. In particular your commit settings (hard and soft), your solrconfig settings, particularly around autowarming, and how you're bulk indexing: SolrJ? DIH? A huge CSV file?

Best,
Erick

On Wed, Dec 4, 2013 at 2:30 PM, steven crichton stevencrich...@mac.com wrote:
<snip>
Re: Tika not extracting content from ODT / ODS (open document / libreoffice) in Solr 4.2.1
Hello everybody,

First of all, sorry about my bad English. Giving an update on this bug, I may have found a solution for it. I would like to have opinions on this solution.

I have found out that Tika, when reading .odt files, returns more than one document: the first one for content.xml, which has the actual content of the file, and the second one for styles.xml. To test this, try to modify an .odt file removing styles.xml and Solr should parse its contents normally.

Solr, when receiving the second document (styles.xml), erases anything it has read before. In general, styles.xml doesn't have any text in it, so Solr receives just some spaces. I modified a function inside SolrContentHandler.java that erases the content of the first document. I made this function just add a space and not erase any previous content, so it will always add up every document Tika returns to Solr. I guess this behavior is going to work for previous cases, but I need your opinion about this. Here is the only modification I made on SolrContentHandler.java:

@Override
public void startDocument() throws SAXException {
  document.clear();
  //catchAllBuilder.setLength(0);
  // Augusto Camarotti - 28-11-2013
  // As Tika may parse more than one document in one file, I have to append every document Tika parses for me,
  // so I will only append a whitespace and wait for new content every time. Otherwise, Solr would just get the
  // last document of the file.
  catchAllBuilder.append(' ');
  for (StringBuilder builder : fieldBuilders.values()) {
    builder.setLength(0);
  }
  bldrStack.clear();
  bldrStack.add(catchAllBuilder);
}

Regards,
Augusto Camarotti

Alexandre Rafalovitch arafa...@gmail.com wrote on 10/05/2013 21:13:

I would try DIH with the flags as in the jira issue I linked to. If possible. Just in case.

Regards,
Alex

On 10 May 2013 19:53, Sebastián Ramírez sebastian.rami...@senseta.com wrote:

OK Jack, I'll switch to MS Office ...hahaha. Many thanks for your interest and help... and the bug report in JIRA.

Best,
Sebastián Ramírez

On Fri, May 10, 2013 at 5:48 PM, Jack Krupansky j...@basetechnology.com wrote:

I filed SOLR-4809 - OpenOffice document body is not indexed by SolrCell, including some test files.

https://issues.apache.org/jira/browse/SOLR-4809

Yeah, at this stage, switching to Microsoft Office seems like the best bet!

-- Jack Krupansky

-----Original Message-----
From: Sebastián Ramírez
Sent: Friday, May 10, 2013 6:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Tika not extracting content from ODT / ODS (open document / libreoffice) in Solr 4.2.1

Many thanks Jack for your attention and effort on solving the problem.

Best,
Sebastián Ramírez

On Fri, May 10, 2013 at 5:23 PM, Jack Krupansky j...@basetechnology.com wrote:

I downloaded the latest Apache OpenOffice 3.4.1 and it does in fact fail to index the proper content, both for .ODP and .ODT files. If I do extractOnly=true&extractFormat=text, I see the extracted text clearly in addition to the metadata. I tested on 4.3, and then tested on Solr 3.6.1 and it also exhibited the problem. I just see spaces in both cases. But whether the problem is due to Solr or Tika is not apparent. In any case, a Jira is warranted.

-- Jack Krupansky

-----Original Message-----
From: Sebastián Ramírez
Sent: Friday, May 10, 2013 11:24 AM
To: solr-user@lucene.apache.org
Subject: Tika not extracting content from ODT / ODS (open document / libreoffice) in Solr 4.2.1

Hello everyone,

I'm having a problem indexing content from OpenDocument format files, i.e. the files created with OpenOffice and LibreOffice (odt, ods...). Tika is able to read the files but Solr is not indexing the content. It's not a problem of committing or something like that: after I post a file it is indexed and all the metadata is indexed/stored, but the content isn't there.

- I modified the solrconfig.xml file to catch everything:

<requestHandler name="/update/extract" ...>
  <!-- here is the interesting part -->
  <!-- <str name="uprefix">ignored_</str> -->
  <str name="defaultField">all_txt</str>

- Then I submitted the file to Solr:

curl 'http://localhost:8983/solr/update/extract?commit=true&literal.id=newods' -H 'Content-type: application/vnd.oasis.opendocument.spreadsheet' --data-binary @test_ods.ods

- Now when I do a search in Solr I get this result; there is something in the content, but that's not the actual content of the original file:

<result name="response" numFound="1" start="0"><doc><str
facet.method=fcs vs facet.method=fc on solr slaves
Is there any advantage on a Solr slave to receive queries using facet.method=fcs instead of the default of facet.method=fc? Most of the segment files are unchanged between replication events - but I wasn't sure if replication would cause the unchanged segment field caches to be lost anyway.

--
Patrick O'Lone
Director of Software Development
TownNews.com
E-mail: pol...@townnews.com
Phone: 309-743-0809
Fax: 309-743-0830
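For context, a sketch of the switch being asked about (the field name is illustrative): facet.method=fcs computes facets per segment, so unchanged segments can reuse their field caches after a reopen, while the default fc builds one cache over the whole index.

curl 'http://slave:8983/solr/core1/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=fcs'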
Re: a core for every user, lots of users... are there issues
Super helpful. Thanks.

On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey s...@elyograg.org wrote:
<snip>

--
blog: whydoeseverythingsuck.com
Re: Solr Stalls on Bulk indexing, no logs or errors
Yes, I can continue to query after this importer goes down and whilst it is running.

The bulk commit is done via a JSON handler in PHP. There are 121,000 records that need to go into the index, so this is done in 5,000-record chunked MySQL retrieve calls, parsing the data as required.

Workflow:
- get record
- create the {add doc… } JSON
- POST to CORE/update/json

I stopped doing a hard commit every 1,000 records, to see if that was an issue. The autocommit settings are:

<autoCommit>
  <maxDocs>${solr.autoCommit.MaxDocs:5000}</maxDocs>
  <maxTime>${solr.autoCommit.MaxTime:24000}</maxTime>
</autoCommit>

I've pretty much worked out of the Drupal schemas for Solr 4: https://drupal.org/project/apachesolr

At one point I thought it could be malformed data, but even reducing the records down to just the id and title, it crashes at the same point. As in, the query still works but the import handler does nothing at all. The Tomcat logs seem to indicate no major issues.

There's not a strange variable that is set to impose an upper index limit, is there?

Regards,
Steven

On 4 Dec 2013, at 20:02, Erick Erickson [via Lucene] ml-node+s472066n4104984...@n3.nabble.com wrote:
<snip>
Querying for results
Hello,

I am running Solr from Magento and using DIH to import/index data from one other (external) source. I am trying to query for results; a few questions:

1. The query I'm using runs against fulltext_1_en, which is a specific shard created by the Magento deployment in solrconfig.xml. Should I be using/querying another field/store (e.g. not fulltext_1*) to get results from both Magento and the other data source? How would I add the data from my DIH indexing to that specific shard so it was all in the same place?
2. OR do I need to add another shard to correspond to the DIH data elements?
3. OR is there something else I'm missing in trying to query for data from 2 sources?

Thanks!
starting up solr automatically
Hey all,

I'm pretty new to Solr. I'm installing it on an Amazon Linux (rpm based) EC2 instance and have it running. I even have Nutch feeding it pages from a crawl. I'm very happy about that.

I want Solr to start on a reboot and am following the instructions at http://wiki.apache.org/solr/SolrJetty#Starting

I'm using Solr 4.5.1 and when I check the jetty version I get this:

java -jar start.jar --version
Active Options: [default, *]
Version Information on 17 entries in the classpath.
Note: order presented here is how they would appear on the classpath.
      changes to the OPTIONS=[option,option,...] command line option will be reflected here.
 0:                    (dir) | ${jetty.home}/resources
 1:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
 2:      3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
 3:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
 4:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
 5:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
 6:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
 7:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
 8:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
 9:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
10:                    1.6.6 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
11:                    1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
12:                   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
13:                    1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
14:                    1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
15:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
16:         8.1.10.v20130312 | ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar

The instructions reference a jetty.sh script for version 6 and a different one for 7. Does the version 7 one work with jetty 8? If not, where can I get the one for version 8?

BTW - this is just the standard install of Solr from the gzip file.

Thanks in advance for your help.

--
Eric Palmer
U of Richmond
Re: starting up solr automatically
I found the instructions and scripts on that page to be unclear and/or not working. Here's the script I've been using for Solr 4.5.1: https://gist.github.com/gregwalters/7795791

Do note that you'll have to change a couple of paths to get things working correctly.

Thanks,
Greg

On Dec 4, 2013, at 3:15 PM, Eric Palmer e...@ericfpalmer.com wrote:
<snip>
Re: Querying for results
Follow-up: Would anyone very familiar with DIH be willing to jump on a side thread with me and my developer to help troubleshoot some issues we're having? Please little-r me at: robert [at] mavenbridge.com. Thanks!

On Wed, Dec 4, 2013 at 1:14 PM, Rob Veliz rob...@mavenbridge.com wrote:
<snip>

--
*Rob Veliz*, Founder | *Mavenbridge* | rob...@mavenbridge.com | M: +1 (206) 909 - 3490
Follow us at: http://twitter.com/mavenbridge
Re: starting up solr automatically
I almost forgot, you'll need a file to set up the environment a bit too:

**
JAVA_HOME=/usr/java/default
JAVA_OPTIONS="-Xmx15g \
 -Xms15g \
 -XX:+PrintGCApplicationStoppedTime \
 -XX:+PrintGCDateStamps \
 -XX:+PrintGCDetails \
 -XX:+UseConcMarkSweepGC \
 -XX:+UseParNewGC \
 -XX:+UseTLAB \
 -XX:+CMSParallelRemarkEnabled \
 -XX:+CMSScavengeBeforeRemark \
 -XX:+UseCMSInitiatingOccupancyOnly \
 -XX:CMSInitiatingOccupancyFraction=50 \
 -XX:CMSWaitDuration=30 \
 -XX:GCTimeRatio=40 \
 -Xloggc:/tmp/solr45_gc.log \
 -Dbootstrap_conf=true \
 -Dbootstrap_confdir=/var/lib/answers/atlascloud/solr45/solr/wa-en-collection_1/conf/ \
 -Dcollection.configName=wa-en-collection \
 -DzkHost=hosts \
 -DnumShards=shards \
 -Dsolr.solr.home=/var/lib/answers/atlascloud/solr45/solr/ \
 -Dlog4j.configuration=file:///var/lib/answers/atlascloud/solr45/resources/log4j.properties \
 -Djetty.port=9101 \
 $JAVA_OPTIONS"
JETTY_HOME=/var/lib/answers/atlascloud/solr45/
JETTY_USER=tomcat
JETTY_LOGS=/var/lib/answers/atlascloud/solr45/logs
**

On Dec 4, 2013, at 3:21 PM, Greg Walters greg.walt...@answers.com wrote:
<snip>
Re: starting up solr automatically
Thanks Greg, I got it starting but the collection file is not available. I will use the script that you gave the URL for and the env settings.

Thanks

On Wed, Dec 4, 2013 at 4:26 PM, Greg Walters greg.walt...@answers.com wrote:
<snip>

--
Eric Palmer
Re: Solr Stalls on Bulk indexing, no logs or errors
Wait, crashes? Or just stops accepting updates? At any rate, this should be fixed in 4.6. If you can dump a stack trace, we can identify whether this is the same issue quickly; jstack is popular.

If you're still having queries served, it's probably not your commit settings. Try searching the JIRA list for distributed deadlock. You should find two JIRAs, one relevant to SolrJ by Joel Bernstein (probably not the one you care about) and one by Mark Miller that addresses this.

Best,
Erick

On Wed, Dec 4, 2013 at 3:19 PM, steven crichton stevencrich...@mac.com wrote:
<snip>
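For anyone following along, a sketch of capturing that stack trace from the running Tomcat JVM (the pid lookup is illustrative; <pid> is a placeholder):

# list running JVMs, then dump all thread stacks to a file
jps -l
jstack -l <pid> > /tmp/solr-threads.txt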
Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent
Ah... although the lower-case filtering does get applied properly in a multiterm analysis scenario, stemming does not. What stemmer are you using? I suspect that swimming normally becomes swim. Compare the debug output of the two queries.

-- Jack Krupansky

-----Original Message-----
From: Mhd Wrk
Sent: Wednesday, December 04, 2013 2:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent
<snip>
Re: a core for every user, lots of users... are there issues
Hank:

I should add that lots of cores and SolrCloud aren't guaranteed to play nice together. I think some of the committers will be addressing this sometime soon. I'm not saying that this will certainly fail, OTOH I don't know anyone who's combined the two.

Erick

On Wed, Dec 4, 2013 at 3:18 PM, hank williams hank...@gmail.com wrote:
<snip>
Re: a core for every user, lots of users... are there issues
Oh my... when you say "I don't know anyone who's combined the two", do you mean that those that have tried have failed, or that no one has gotten around to trying? It sounds like you are saying you have some specific knowledge that right now these won't work, otherwise you wouldn't say committers will be addressing this sometime soon, right?

I'm worried, as we need to make a practical decision here, and it sounds like maybe we should stick with plain Solr (not SolrCloud) for now... is that what you are saying?

On Wed, Dec 4, 2013 at 5:01 PM, Erick Erickson erickerick...@gmail.com wrote:
<snip>

--
blog: whydoeseverythingsuck.com
Inconsistent numFound in SC when querying core directly
Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with a 3-node external Zookeeper and 1 collection (2 shards, 2 replicas).

Currently we are noticing inconsistent results from the SolrCloud when performing the same simple /select query many times against our collection. Almost every other query, the numFound count (and the returned data) jumps between two very different values.

Initially I suspected a replica in a shard of the collection was inconsistent (and that every other request hit that node) and started performing the same /select query directly against the individual cores of the SolrCloud collection on each instance, only to notice the same problem - the count jumps between two very different values!

I may be incorrect here, but I assumed that when querying a single core of a SolrCloud collection, the SolrCloud routing is bypassed and I am talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection fluctuates wildly, and it is only receiving updates and no deletes that would explain the jumps:

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
response:{numFound:123596839,start:0,maxScore:1.0,docs:[]
solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
response:{numFound:84739144,start:0,maxScore:1.0,docs:[]
solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
response:{numFound:123596839,start:0,maxScore:1.0,docs:[]
solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
response:{numFound:84771358,start:0,maxScore:1.0,docs:[]

Could anyone help me understand why the same /select query directly against a single core would return inconsistent, flapping results if there are no deletes issued in my app to cause such jumps? Am I incorrect in my assumption that I am querying the core directly?

An interesting observation: when I do an /admin/cores call to see the docCount of the core's index, it does not fluctuate; only the query result does.

That was hard to explain; hopefully someone has some insight! :)

Thanks!

Tim
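One caveat on that assumption, worth verifying against this version (it is not settled in the thread): in SolrCloud, a request sent to a core URL is still treated as a distributed search across the whole collection unless distrib=false is passed. A sketch of a count served purely from that core's local index:

curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&distrib=false' | grep numFound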
Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent
I'm using the snowball stemmer and, you are correct, swimming has been stored as swim. Should I wrap the snowball filter in a multiterm analyzer? Thanks On Dec 4, 2013 2:02 PM, Jack Krupansky j...@basetechnology.com wrote: Ah... although the lower case filtering does get applied properly in a multiterm analysis scenario, stemming does not. What stemmer are you using? I suspect that swimming normally becomes swim. Compare the debug output of the two queries. -- Jack Krupansky -Original Message- From: Mhd Wrk Sent: Wednesday, December 04, 2013 2:08 PM To: solr-user@lucene.apache.org Subject: Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent Debug shows that all terms are lowercased properly. Thanks On Dec 4, 2013 3:18 AM, Erik Hatcher erik.hatc...@gmail.com wrote: Chances are you're not getting those fuzzy terms analyzed as you'd like. See debug (debug=true) output to be sure. Most likely the fuzzy terms are not being lowercased. See http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this applies to fuzzy terms too, not just wildcard terms). Erik On Dec 4, 2013, at 4:46 AM, Mhd Wrk mhd...@gmail.com wrote: I'm using the following query to do a fuzzy search on Solr 4.5.1 and am getting an empty result. qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2) +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO 2013-12-04T00:23:00Z] -endDate:[* TO 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id If I change it to a non-fuzzy query by simply dropping the tildes from the terms (see below) then it returns the expected result! Is this a bug? Shouldn't the fuzzy version of a query always return a super set of its non-fuzzy equivalent? qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming) +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO 2013-12-04T00:23:00Z] -endDate:[* TO 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
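Following up on the question above: by default the multiterm analysis chain only applies "multiterm-aware" components, and stemmers are not among them, which is Jack's point. If you want to experiment with forcing the stemmer in anyway, one can declare an explicit multiterm analyzer on the field type. This is only a sketch (the field type name and filter chain are illustrative, not taken from the poster's schema), and whether stemming fuzzy terms yields sensible matches is exactly what's being debated here:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="multiterm">
    <!-- applied to wildcard/fuzzy/prefix terms; including the stemmer here is the experiment -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

If this works as hoped, Swimming~2 would be analyzed to swim~2 at query time and match the stemmed swim in the index.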
Re: Inconsistent numFound in SC when querying core directly
To add two more pieces of data: 1) This occurs with real, conditional queries as well (e.g. q=key:timvaillancourt), not just the q=*:* I provided in my email. 2) I've noticed that when I bring a node of the SolrCloud down, it remains state: active in my /clusterstate.json - something is really wrong with this cloud! Would a Zookeeper issue explain my varied results when querying a core directly? Thanks again! Tim On 04/12/13 02:17 PM, Tim Vaillancourt wrote: [...]
RE: Inconsistent numFound in SC when querying core directly
https://issues.apache.org/jira/browse/SOLR-4260 Join the club Tim! Can you upgrade to trunk or incorporate the latest patches of related issues? You can fix it by trashing the bad node's data, although without multiple clusters it may be difficult to decide which node is bad. We use the latest commits now (since tuesday) and are still waiting for it to happen again. -Original message- From: Tim Vaillancourt t...@elementspace.com Sent: Wednesday 4th December 2013 23:38 To: solr-user@lucene.apache.org Subject: Re: Inconsistent numFound in SC when querying core directly [...]
Re: Inconsistent numFound in SC when querying core directly
Thanks Markus, I'm not sure if I'm encountering the same issue. That JIRA mentions differences of tens of docs; I'm seeing differences in the multi-millions of docs, and even more strangely it very predictably flaps between a 123M value and an 87M value, a 30M+ doc difference. Secondly, I'm not comparing values from 2 instances (leader to replica); I'm performing the same curl call to the same core directly and seeing flapping results each time I perform the query. So this is currently happening within a single instance/core, unless I am misunderstanding how to directly query a core. Cheers, Tim On 04/12/13 02:46 PM, Markus Jelsma wrote: [...]
Re: Inconsistent numFound in SC when querying core directly
: : I may be incorrect here, but I assumed when querying a single core of a : SolrCloud collection, the SolrCloud routing is bypassed and I am talking : directly to a plain/non-SolrCloud core. No ... every query received from a client by Solr is handled by a single core -- if that core knows it's part of a SolrCloud collection then it will do a distributed search across a random replica from each shard in that collection. If you want to bypass the distributed search logic, you have to say so explicitly... To ask an arbitrary replica to only search itself, add distrib=false to the request. Alternatively: you can ask that only certain shard names (or certain explicit replicas) be included in a distributed request... https://cwiki.apache.org/confluence/display/solr/Distributed+Requests -Hoss http://www.lucidworks.com/
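To make Hoss's suggestion concrete against the core from earlier in this thread, a sketch (host and core name taken from the original post; adjust for your setup):

curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true&distrib=false' | grep numFound

With distrib=false the replica searches only its own index, so repeating this against each core of the collection should reveal which replica holds the divergent count.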
Re: Inconsistent numFound in SC when querying core directly
Chris, this is extremely helpful and it's silly I didn't think of this sooner! Thanks a lot, this makes the situation make much more sense. I will gather some proper data with your suggestion and get back to the thread shortly. Thanks!! Tim On 04/12/13 02:57 PM, Chris Hostetter wrote: [...]
Re: SOLR Master-Slave Repeater with Load balancer
Hi Erick, Thanks a lot for your explanation. We initially considered SolrCloud, but we have a limitation on the number of servers we can use due to budget concerns (the limit is 2), while SolrCloud requires a minimum of 3. I have tried out the solution you suggested; so far it's going well, and we are not using the self-polling approach. -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Master-Slave-Repeater-with-Load-balancer-tp4103363p4105017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Inconsistent numFound in SC when querying core directly
Hey all, Now that I am getting correct results with distrib=false, I've identified that 1 of my nodes has just 1/3rd of the total data set, which totally explains the flapping in results. The fix for this is obvious (rebuild the replica) but the cause is less obvious. There is definitely more than one issue going on with this SolrCloud (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that /clusterstate.json doesn't seem to get updated when nodes are brought down/up is the reason why this replica remained in the distributed request chain without recovering/re-replicating from the leader. I imagine my Zookeeper ensemble is having some problems unrelated to Solr that are the real root cause. Thanks! Tim On 04/12/13 03:00 PM, Tim Vaillancourt wrote: [...]
Re: Inconsistent numFound in SC when querying core directly
Keep in mind, there have been a *lot* of bug fixes since 4.3.1. - Mark On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com wrote: [...]
Prioritize search returns by URL path?
We have a Telligent based community with Solr as the search engine. We want to prioritize search returns from within the community by the type of content: Wiki articles as most relevant, then blog posts, then Verified answer and Suggested answer forum posts, then remaining forum posts. We have also implemented a Helpful voting capability and would like to boost items with more Helpful votes above those within their same category with fewer votes. Has anyone out there done something similar, or can someone suggest how to do this? We're new to search engine tuning, so assume very little knowledge on our part. Thanks for your help! JRG -- View this message in context: http://lucene.472066.n3.nabble.com/Prioritize-search-returns-by-URL-path-tp4105023.html Sent from the Solr - User mailing list archive at Nabble.com.
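One common way to approach this, sketched with hypothetical field names (contenttype and helpful_votes - nothing here is Telligent-specific, and your schema will differ), is edismax boost queries for the content-type ordering plus a boost function for the votes:

q=<user's search terms>&defType=edismax
&bq=contenttype:wiki^8 contenttype:blog^4 contenttype:verified_answer^3 contenttype:suggested_answer^2
&bf=log(sum(helpful_votes,1))

bq adds additive score boosts by content type, and bf adds a boost that grows with vote count; the log() damping keeps heavily voted items from drowning out text relevance entirely.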
Re: how to increase each index file size
Hi Erick, Thanks for your reply. Regards 2013/12/4 Erick Erickson erickerick...@gmail.com Why do you want to do this? Are you seeing performance problems? If not, I'd just ignore this problem, premature optimization and all that. If you _really_ want to do this: your segment files are closed every time you do a commit, openSearcher=true|false doesn't matter. BUT the longer you go between commits, the bigger your transaction log will be, which may lead to other issues, particularly on restart. See: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ The key is the section on truncating the tlog. And note the sizes of these segments will change as they're merged anyway. Best, Erick On Wed, Dec 4, 2013 at 4:42 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi I'm using SolrCloud integrated with HDFS, and I found there are lots of small files. So I'd like to increase the index file size while doing a DIH full-import. Any suggestion on how to achieve this goal? Regards.
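If fewer, larger files are really the goal, the knobs live in the <indexConfig> section of solrconfig.xml. A sketch for Solr 4.x (the values are illustrative, not recommendations):

<indexConfig>
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <double name="segmentsPerTier">4.0</double>
    <double name="maxMergedSegmentMB">10240.0</double>
  </mergePolicy>
</indexConfig>

A larger ramBufferSizeMB flushes bigger initial segments, and a lower segmentsPerTier merges them into fewer, larger ones - at the cost of extra merge I/O, which matters on HDFS. As Erick notes above, committing less often has a bigger effect than any of these settings.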
Re: a core for every user, lots of users... are there issues
I don't know of anyone who's tried and failed to combine transient cores and SolrCloud. I also don't know of anyone who's tried and succeeded. I'm saying that the transient core stuff has been thoroughly tested in non-cloud mode, and people have been working with it for a couple of releases now. I know of no a priori reason it wouldn't work in SolrCloud. But I haven't personally done it, nor do I know of anyone who has. It might just work, but the proof is in the pudding. I've heard some scuttlebutt that the combination of SolrCloud and transient cores is being, or will soon be, investigated - as in testing and writing test cases. Being a pessimist by nature on these things, I suspect (but don't know) that something will come up. For instance, SolrCloud tries to keep track of the states of all the nodes. I _think_ (but don't know for sure) that this is just keeping contact with the JVM, not particular cores. But what if there's something I don't know about that pings the individual cores? That would keep them constantly loading/unloading, which might crop up in unexpected ways. I've got to emphasize that this is an unknown (at least to me), but an example of something that could crop up. I'm sure there are other possibilities. Or distributed updates. For that, every core on every node for a shard in collectionX must process the update. So for updates, each and every core in each and every shard might have to be loaded for the update to succeed if the core is transient. Does this happen fast enough in all cases so a timeout doesn't cause the update to fail? Or the node to be marked as down? What about combining that with a heavy query load? I just don't know. It's uncharted territory is all. I'd love it for you to volunteer to be the first :). There's certainly committer interest in making this case work, so you wouldn't be left hanging all alone. If I were planning a product, though, I'd either treat the combination of transient cores and SolrCloud as an R&D project or go with non-cloud mode until I had some reassurance that transient cores and SolrCloud played nicely together. All that said, I don't want to paint too bleak a picture. All the transient core stuff is local to a particular node; SolrCloud and ZooKeeper shouldn't be interested in the details. It _should_ just work. It's just that I can't point to any examples where that's been tried. Best, Erick On Wed, Dec 4, 2013 at 5:08 PM, hank williams hank...@gmail.com wrote: [...]
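For anyone who wants to try the non-cloud transient core setup Erick describes, a sketch of the old-style solr.xml attributes involved (core names are placeholders; transientCacheSize caps how many transient cores stay loaded at once):

<solr persistent="true">
  <cores adminPath="/admin/cores" transientCacheSize="128">
    <core name="user0001" instanceDir="user0001" transient="true" loadOnStartup="false"/>
    <core name="user0002" instanceDir="user0002" transient="true" loadOnStartup="false"/>
  </cores>
</solr>

A core marked transient is loaded on first request and evicted LRU-style once more than transientCacheSize transient cores are open, which is what makes the one-core-per-user pattern feasible on a single node.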
Re: SOLR Master-Slave Repeater with Load balancer
bq: but we have limitation on the number of servers that we can use due to budget concerns (limit is 2) really, really, really push back to your project managers on this. So what if you need 3 machines for a ZooKeeper quorum? The needs of ZK are quite light; they don't need a powerful machine. Your managers are saying: for want of spending $1,000 on a machine, we can't go with SolrCloud, and we'll instead waste 10 times that paying engineers to set up an old-style system. You can run the ZooKeeper instances in a separate JVM on your two servers and have a cheap machine running ZK for the third instance if necessary. Another rant finished. Erick On Wed, Dec 4, 2013 at 6:07 PM, kondamudims kondamud...@gmail.com wrote: [...]
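To make the cheap-third-machine idea concrete, a sketch of a 3-node ZooKeeper ensemble with two members co-located on the existing Solr servers and one on a small utility box (hostnames are placeholders; every node gets the same server.N lines in its zoo.cfg):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=solr1.example.com:2888:3888
server.2=solr2.example.com:2888:3888
server.3=cheapbox.example.com:2888:3888

Each node also needs a myid file in dataDir containing its number. The remaining caveat: losing one Solr server still leaves a 2-of-3 quorum, but losing the cheap box plus one Solr server takes ZooKeeper down.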
Re: SOLR Master-Slave Repeater with Load balancer
Erick is right, you have been put in a terrible position. You need to get agreement, in writing, that it is OK for search to go down when one server is out of service. This might be for scheduled maintenance or even a config update. When one server is down, search is down, period. This requirement is like choosing a truck, but insisting that there is only budget for three tires. You must, must, must communicate the risks associated with a two-server SolrCloud cluster. wunder On Dec 4, 2013, at 7:10 PM, Erick Erickson erickerick...@gmail.com wrote: [...] -- Walter Underwood wun...@wunderwood.org
SOLR 4 not utilizing multi CPU cores
Hi, We recently upgraded to SOLR 4.6 from SOLR 1.4.1. Overall, performance went down for large phrase queries. On analysis we have seen that 1.4.1 utilized multiple CPU cores for such queries, but SOLR 4.6 is only utilizing a single CPU core. Any idea what the reason could be? Note: We are not using SOLR sharding. -- Regards, Salman Akram
Re: SOLR 4 not utilizing multi CPU cores
Hi, I did more or less the same but didn't get that behaviour... could you give us more details? Best, Gazza On 5 Dec 2013 06:54, Salman Akram salman.ak...@northbaysolutions.net wrote: [...]
Re: facet.method=fcs vs facet.method=fc on solr slaves
Hello Patrick, Replication flushes the UnInvertedField cache, which impacts fc, but doesn't harm Lucene's FieldCache, which is what fcs uses. You can check how much time in millis is spent on UnInvertedField cache regeneration in INFO logs like UnInverted multi-valued field ,time=### ... On Thu, Dec 5, 2013 at 12:15 AM, Patrick O'Lone pol...@townnews.com wrote: Is there any advantage on a Solr slave to receiving queries using facet.method=fcs instead of the default of facet.method=fc? Most of the segment files are unchanged between replication events - but I wasn't sure if replication would cause the unchanged segments' field caches to be lost anyway. -- Patrick O'Lone Director of Software Development TownNews.com E-mail ... pol...@townnews.com Phone 309-743-0809 Fax .. 309-743-0830 -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
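To see the effect Mikhail describes on a slave right after replication, one could time the same facet request both ways. A sketch, with host, core, and field names made up - and note that facet.method=fcs only applies to single-valued fields:

curl 'http://slave:8983/solr/core1/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=fc'
curl 'http://slave:8983/solr/core1/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=fcs'

The first fc request on a multi-valued field after replication pays the UnInvertedField rebuild (the time=### log line above); fcs populates per-segment FieldCache entries, so segments unchanged by replication keep theirs.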
Re: Questions about commits and OOE
On Wed, Dec 4, 2013 at 6:36 PM, OSMAN Metin metin.os...@canal-plus.com wrote: During this massive update, we sometimes have a peak of active threads exceeding the limit of 8192 processes authorized for the user running the tomcat and zookeeper processes. When this happens, every hardCommit fails with an OutOfMemory : unable to create native thread message. Hello, Can you check with jstack what these threads are? If they are web container threads, you need to limit the thread pool; if they are background merge threads, you might need to configure the merge policy, etc. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
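A sketch of the kind of thread census Mikhail suggests (the PID 12345 is a placeholder for your Tomcat process):

jstack 12345 > /tmp/threads.txt
grep -o '^"[^"]*"' /tmp/threads.txt | tr -d '0-9' | sort | uniq -c | sort -rn | head

Grouping thread names with the digits stripped usually makes it obvious whether the pile-up is container request threads (cap the connector's maxThreads in Tomcat) or index merge threads (tune the mergeScheduler in solrconfig.xml).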
Sorting on solr results
Hi All, Please share your ideas on the problem below. I need to sort products on a webshop by price and position. e.g. if we have three products (A, B, C), they need to be sorted by price asc and position asc:
ID Price Position
A 10 3
B 10 2
C 20 5
The result should be sorted first by price, then by position. Required order of results: B A C. Products A and B have the same price, but B's position ranks ahead of A's. My result set query as of now: @QueryTerm=*OnlineFlag=1@Sort.Price=0,position=0 Please suggest your views on the same. -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-on-solr-results-tp4105060.html Sent from the Solr - User mailing list archive at Nabble.com.
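If the index behind this webshop can be queried through Solr directly (the @QueryTerm/@Sort syntax above looks like a shop-platform wrapper rather than Solr itself), the requested ordering is just a two-key sort parameter. A sketch, with field names assumed from the example (spaces must be URL-encoded as + or %20 in a raw URL):

q=*:*&fq=OnlineFlag:1&sort=Price asc,Position asc

Solr compares on Price first and breaks ties with Position, which yields B, A, C for the sample data above.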