Understanding the Solr Admin page
I am expanding my Solr skills and would like to understand the Admin page better. I understand that learning Java memory management and the Java memory options will help me, and I am reading and experimenting on that front, but if there are any concise resources that are especially pertinent to Solr I would love to know about them. Everything that I've found is either a "do this" one-liner or expects Java experience which I don't have, and I don't know what I need to learn.

I notice that some of the Args presented are in black text, and others in grey. Why are they presented differently? Where would I have found this information in the fine manual?

When I start Solr with nohup, the resulting nohup.out file is _huge_. How might I start Solr such that INFO is not output, but only WARNINGs and SEVEREs are? In particular, I'd rather not log every query, even the invalid queries which also log as SEVERE. I thought that this would be easy to Google for, but it is not! If there is a concise document that examines this issue, I would love to know where on the wild wild web it exists.

Thank you.

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
solr 4.2.1 and docValues
Hi list,

I want to try docValues for my facets and sorting with Solr 4.2.1. I have already seen many papers, examples and source code about and around docValues, but there are still some questions.

The example schema.xml has these fields:

  <field name="popularity" type="int" indexed="true" stored="true" />
  <field name="manu_exact" type="string" indexed="true" stored="false" />

and it has a comment for docValues:

  <!-- Some fields such as popularity and manu_exact could be modified to leverage doc values:
  <field name="popularity" type="int" indexed="true" stored="true" docValues="true" default="0" />
  <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" default="" />
  ... -->

For popularity with docValues, indexed and stored are true and default is 0. For manu_exact with docValues, indexed and stored are false and default is an empty string.

Questions:
- if docValues is true, will this replace indexed=true, as for field manu_exact?
- what is the advantage of having indexed=true and docValues=true?
- what if default="" also for the popularity int field?

Regards
Bernd
Re: Out of memory on some faceting queries
On Wed, Apr 3, 2013 at 8:47 PM, Shawn Heisey s...@elyograg.org wrote:

> On 4/2/2013 3:09 AM, Dotan Cohen wrote:
>> I notice that this only occurs on queries that run facets. I start Solr with the following command:
>> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar /opt/solr-4.1.0/example/start.jar
>
> It looks like you've followed some advice that I gave previously on how to tune java. I have since learned that this advice is bad; it results in long GC pauses, even with heaps that aren't huge.

I see, thanks.

> As others have pointed out, you don't have a max heap setting, which would mean that you're using whatever Java chooses for its default, which might not be enough. If you can get Solr to successfully run for a while with queries and updates happening, the heap should eventually max out and the admin UI will show you what Java is choosing by default.
>
> Here is what I would now recommend as a starting point for your Solr startup command. You may need to increase the heap beyond 4GB, but be careful that you still have enough free memory to be able to do effective caching of your index.
>
> sudo nohup java -Xms4096M -Xmx4096M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar /opt/solr-4.1.0/example/start.jar

Thank you, I will experiment with that.

> If you are running a really old build of java (latest versions on Oracle's website are 1.6 build 43 and 1.7 build 17), you might want to leave AggressiveOpts out. Some people would argue that you should never use that option.

Great, thanks for the warning. This is what we're running; I'll see about updating it through my distro's package manager:

$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
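A note on verifying the default heap Shawn mentions: if the JDK monitoring tools are installed, heap and GC behavior can be watched live while Solr runs. A minimal sketch, assuming the JDK's jstat is on the PATH and <solr-pid> stands in for Solr's process id:

  jstat -gcutil <solr-pid> 5000

This prints the utilization of each heap generation plus GC counts and cumulative GC time every 5 seconds, which helps confirm whether the heap actually maxes out under load before committing to an -Xmx value.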
Re: maxWarmingSearchers in Solr 4.
On Thu, Apr 4, 2013 at 10:54 PM, Shawn Heisey s...@elyograg.org wrote:

> You'll want to ensure that your autowarmCount value on Solr's caches is low enough that each commit happens quickly. If it takes 5000 milliseconds to warm the caches when you commit, then you want to be sure that you are committing less often than that, or you'll quickly reach your maxWarmingSearchers config value. If the commits are happening VERY quickly, you may need to set autowarmCount to 0, and possibly disable caches entirely.

I see. This seems to be the opposite of the approach that I was taking.

> I went poking in the code, and it seems that maxWarmingSearchers defaults to Integer.MAX_VALUE. I'm not sure whether this is a bad default or not. It does mean that a pathological setup without maxWarmingSearchers in the config will probably blow up with an OutOfMemory exception, but is that better or worse than commits that don't make new documents searchable? I can see arguments either way.

This is interesting: what you found is that the value in the stock solrconfig.xml file differs from the Solr default value. I think that this is bad practice: a single default should be decided upon, Solr should use this value when nothing is specified in solrconfig.xml, and that _same_value_ should be specified in the stock solrconfig.xml. Is it not a reasonable assumption that this would be the case?

> That was directed more at the other committers. I would argue that either a low number or a relatively high number should be the default, but not MAX_VALUE. The example config should have a commented-out section for maxWarmingSearchers that mentions the default. I'm having the same discussion about maxBooleanClauses on SOLR-4586.

Right. It's possible that this has already been discussed, and that everyone prefers that a badly configured setup will eventually have a spectacular blow-up with OutOfMemory, rather than semi-silently ignoring commits.

> A searcher object contains caches and uses a lot of memory, so having lots of them around will eventually use up the entire heap.

Silently dropping data is by far the worse choice, I agree, especially as a default setting.

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
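For reference, both of the knobs discussed above live in the <query> section of solrconfig.xml. A minimal sketch (the values are illustrative, not recommendations):

  <query>
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>

With autowarmCount="0" a new searcher starts cold, so commits become cheap at the cost of slower first queries; maxWarmingSearchers then caps how many searchers may warm concurrently before Solr refuses to open new ones.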
Re: Filtered search term suggestions via Facet Prefixing or NGrams
Hi, can somebody help, please? Maybe you can at least answer parts of my question? I'd expect that somebody at least knows the limitations of faceting with UninvertedField?

Thank you,
Andreas

Andreas Hubold wrote on 04.04.2013 13:30:

> Hi,
>
> we've successfully implemented suggestion of search terms using facet prefixing with Solr 4.0. However, with lots of unique index terms we've encountered performance problems (long running queries) and even exceptions: "Too many values for UnInvertedField faceting on field textbody".
>
> We must provide suggestions based on a prefix entered by the user. The solution should use the terms from an indexed text field. Furthermore the suggestions must be filtered according to some specified filter queries.
>
> Do you have any performance tips for facet prefixing, or know how to avoid the above exception even in the case of many unique terms? What is causing the above exception:
> a) the total number of unique terms in the field, or
> b) the number of unique terms in the field of a single document?
> If b), is there a way to find such documents easily? Do you know how many unique terms can be handled without problems by facet prefixing?
>
> I've read the blog post http://www.searchworkings.org/blog/-/blogs/different-ways-to-make-auto-suggestions-with-solr which describes NGrams as another possible approach to implement suggestions with filtering. I would expect that this approach provides better query performance (at the cost of increased index size). However, I haven't found detailed information on how to implement it. I know how to configure a field for ngrams and how to perform a query using that field, but the results just give me the document, not the matched terms. Or am I expected to use a stored field and inspect its value?
>
> I also found this blog post where the Highlighter is used in combination with ngrams to provide suggestions: http://solr.pl/en/2013/02/25/autocomplete-on-multivalued-fields-using-highlighting/
> Can this be used to get the suggested terms from a document? What about performance? Will such an approach perform better than facet prefixing for large text fields with lots of unique terms?
>
> Any hints appreciated.
> Thank you,
> Andreas
help needed for applying patch to solr I am using
hi all

I am new to Solr and wanted to apply this patch to my Solr. How can I do this? I searched on the net but did not find anything useful. The patch is: https://issues.apache.org/jira/browse/SOLR-2585

I am using Solr 4.1.0 on Tomcat 6 on Red Hat/CentOS.

thanks
regards
rohan
edismax returns far fewer matches than the regular query
I have a simple system. I put the title of web pages into the name field and the content of the web pages into the description field. I want to search both fields and give the name a little more boost.

A search on the name field alone returns records close to hundreds:

http://localhost:8983/solr/select/?q=name:%28coldfusion^2%20cache^1%29&fq=author:[*%20TO%20*]%20AND%20-author:chinmoyp&start=0&rows=10&fl=author,score,%20id

But a search on both fields using edismax with boosts gives just 5 matches:

http://localhost:8983/solr/mindfire/?q=%28%20coldfusion^2%20cache^1%29&defType=edismax&qf=name^1.5%20description^1.0&fq=author:[*%20TO%20*]%20AND%20-author:chinmoyp&start=0&rows=10&fl=author,score,%20id

I am wondering what is wrong, because there are valid results returned by the 1st query which are ignored by edismax. I am on Solr 3.6.
Re: solr spell suggestions help
hi all

I have resolved all the issues except the 4th one. (They were related to the distance measure I was using, which was by default Levenshtein: very basic and not good. Now I am using the JaroWinkler distance measure, which is better and now gives exactly the results I was looking for.) The 4th one I think is a Solr issue, and they have also released a patch for it: https://issues.apache.org/jira/browse/SOLR-2585
I am applying this patch now and will let you know if it works correctly.

thanks
regards
Rohan

On Fri, Apr 5, 2013 at 4:44 PM, Rohan Thakur rohan.i...@gmail.com wrote:

> hi all
>
> I had some issues with Solr spell suggestions.
>
> 1) first of all, are index-based spell suggestions better in any way than the direct spell suggestions that Solr 4.1 provides?
>
> 2) is there a way I can get suggestions for words when providing only a short prefix of the word? Like when I query "sam" I should get samsung as one of the suggestions.
>
> 3) why am I not getting suggestions for words that have more than a 2-character difference? If I query for wirlpool (8 characters) I get the suggestion whirlpool (9 characters, the correct spelling), but when I query for wirlpol (7 characters) it says this is a false spelling yet does not show any suggestions. Likewise, a search for pansonic (8 chars) suggests panasonic (9 chars), but removing one more character, i.e. searching for panonic (7 chars), returns no suggestions. How can I correct this? Even a search for ipo does not return ipod as a suggestion.
>
> 4) one more thing I want to clear up: when I search for "microwave ovan" it does not flag any misspelling even though ovan is wrong; it returns the results for microwave, saying the query is correct. This is the case whenever one term in the query is correct while the others are incorrect: it does not point out the wrongly spelled term but returns results for the correct word. How can I correct this? Similarly, when I query "microvave oven" it shows the results for oven, saying the query is correct.
>
> 5) one more case: when I query plntronies (correct word: plantronics) it does not return any suggestion, but when I query plantronies it returns plantronics as a suggestion. Why is that happening?
my schema.xml is:

<fieldType name="tSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\\\[\]\(\)\-\,\/\+" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="spell" type="tSpell" indexed="true" stored="true" />
<copyField source="title" dest="spell" />

my solrconfig.xml is:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <!-- Optional, it is required when more than one spellchecker is configured. Select non-default name with spellcheck.dictionary in request handler. -->
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <!-- Load tokens from the following field for spell checking, analyzer for the field's type as defined in schema.xml are used -->
    <str name="field">spell</str>
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.3</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">1</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of
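For anyone following the fix Rohan describes above: switching the distance measure away from the internal Levenshtein is a one-line change on the spellchecker. A sketch, using the distance class that ships with Lucene 4.x:

  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>

replacing the <str name="distanceMeasure">internal</str> line in the config quoted above.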
solr 4.2.1 still has problems with index version and index generation
I know there was some effort to fix this, but I must report that Solr 4.2.1 still has problems with index version and index generation numbering in master/slave mode with replication.

The test was:
1. installed Solr 4.2.1 on master and built the index from scratch
2. installed Solr 4.2.1 on slave with an empty index
3. replicated master to slave; everything was fine, both in sync
4. deleted the index on master with *:* and built the index from scratch
5. replicated the index from master to slave

RESULT: the slave has a different (higher) version number and is one generation ahead :-(

Regards
Bernd
Re: help needed for applying patch to solr I am using
hi all

I just checked: this issue was already incorporated in Solr 4.0-ALPHA, and I am using Solr 4.1.0, so it must be in this as well. But still, why am I not getting suggestions for a phrase like microvave oven? It states the query to be correct and returns results based on the word oven. Why is this happening? Yet when I query it as "microvave oven" (in quotes) it provides a corrected suggestion. How do I handle this? Anyone, please help...

thanks
regards
Rohan

On Mon, Apr 8, 2013 at 1:18 PM, Rohan Thakur rohan.i...@gmail.com wrote:

> hi all
>
> I am new to Solr and wanted to apply this patch to my Solr. How can I do this? I searched on the net but did not find anything useful. The patch is: https://issues.apache.org/jira/browse/SOLR-2585
>
> I am using Solr 4.1.0 on Tomcat 6 on Red Hat/CentOS.
>
> thanks
> regards
> rohan
SOLR-4581
Hello,

I created https://issues.apache.org/jira/browse/SOLR-4581 on 14.03.2013. Can anyone help me out with this?

Thank You.

Alexander Buhr
Software Engineer

ePages GmbH
Pilatuspool 2
20355 Hamburg
Germany
+49-40-350 188-266 phone
+49-40-350 188-222 fax
a.b...@epages.com
www.epages.com
www.epages.com/blog
www.epages.com/twitter
www.epages.com/facebook

e-commerce. now plug & play.

Managing Director: Wilfried Beeck
Commercial register: Amtsgericht Hamburg HRB 120861
Registered office: Pilatuspool 2, 20355 Hamburg
Tax number: 48/718/02195
VAT ID: DE 282 947 700
Re: Understanding the Solr Admin page
Dotan,

On Monday, April 8, 2013 at 8:21 AM, Dotan Cohen wrote:

> I notice that some of the Args presented are in black text, and others in grey. Why are they presented differently? Where would I have found this information in the fine manual?

iirc there is one ticket open which is related to this. Initially that was not meant to highlight specific values .. just a simple even/odd style to make it easier to read the different lines - at least that is what i thought it would be. Looks like you're the second one being confused by them, so we'll take it out, i'd say?

> When I start Solr with nohup, the resulting nohup.out file is _huge_. How might I start Solr such that INFO is not output, but only WARNINGs and SEVEREs are?

Since you're not telling us how you get it started .. it's just a guess :)

For starters: http://wiki.apache.org/solr/LoggingInDefaultJettySetup
Otherwise, the more advanced one: http://wiki.apache.org/solr/SolrLogging

HTH
Stefan
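As a concrete example of the first wiki page Stefan links: in the stock Solr 4.x example, Jetty routes Solr's logging through java.util.logging, so a properties file along these lines (a sketch; the file name and location are up to you) cuts the output down to warnings and worse:

  handlers = java.util.logging.ConsoleHandler
  .level = WARNING
  java.util.logging.ConsoleHandler.level = WARNING

Solr is then started pointing at it:

  java -Djava.util.logging.config.file=etc/logging.properties -jar start.jar

With that in place, INFO-level entries, including the per-query log lines, no longer land in nohup.out.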
Prediction About Index Sizes of Solr
This may not be a well-detailed question but I will try to make it clear.

I am crawling web pages and will index them in SolrCloud 4.2. What I want to predict is the index size. I will have approximately 2 billion web pages, and I estimate each of them will be about 100 KB. I know that it depends on stored fields, stop words, etc. If you want to ask about details of my question I can give more explanation, but there should be some analysis to help me predict what the index size will be.

On the other hand, my other important question is how SolrCloud makes replicas of indexes: can I choose how many replicas there will be? Because I should multiply the total index size by the number of replicas.

Here I found an article related to my analysis: http://juanggrande.wordpress.com/2010/12/20/solr-index-size-analysis/

I know this question may not be detailed, but any ideas about it are welcome.
Re: Prediction About Index Sizes of Solr
Hello!

Let me answer the first part of your question. Please have a look at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls

It should help you make an estimation of your index size.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> I am crawling web pages and will index them in SolrCloud 4.2. What I want to predict is the index size. I will have approximately 2 billion web pages, and I estimate each of them will be about 100 KB. [...]
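A rough cross-check to run alongside that spreadsheet: 2,000,000,000 pages at ~100 KB each is about 200 TB of raw content. Index-to-raw ratios vary hugely with what is stored and indexed, but even if the index (excluding stored content) came out at only 10-25% of the raw text - a common rule of thumb, not a guarantee - that is still on the order of 20-50 TB per copy. Whatever figure the estimator gives should then be multiplied by the number of copies: the replicationFactor parameter on the Collections API CREATE call controls how many replicas of each shard SolrCloud keeps.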
Re: help needed for applying patch to solr I am using
hi all

I think I have to pass the query in inverted commas; then it returns the correct results, as I needed.

thanks
regards
Rohan

On Mon, Apr 8, 2013 at 1:50 PM, Rohan Thakur rohan.i...@gmail.com wrote:

> hi all
>
> I just checked: this issue was already incorporated in Solr 4.0-ALPHA, and I am using Solr 4.1.0, so it must be in this as well. But still, why am I not getting suggestions for a phrase like microvave oven? [...]
Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings
Hi. I have recently made the move from Solr 3.6 to Solr 4.0 and so far everything seems super, apart from the fact that I had a lot of warnings in my logging section on the Solr admin panel. I have tried to work through as many as possible, but there are a few that I am not able to correct. This is the hot list:

===
11:56:41 WARNING IndexSchema uniqueKey is not stored - distributed search and MoreLikeThis will not work
11:56:41 WARNING SolrCore [collection1] Solr index directory '/opt/solr/example/solr/collection1/data/index' doesn't exist. Creating new index...
11:56:42 WARNING UpdateRequestHandler Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler
11:56:42 WARNING UpdateRequestHandler Using deprecated class: BinaryUpdateRequestHandler -- replace with UpdateRequestHandler
===

Would anyone please mind explaining to me where I am going wrong to get these warnings? It looks like content in my configs is causing them, but for the most part my configurations are pretty standard. Any help or advice would be much appreciated.

James
Re: Prediction About Index Sizes of Solr
Interesting bit, thanks Rafał!

On Mon, Apr 8, 2013 at 12:54 PM, Rafał Kuć r@solr.pl wrote:

> Hello!
>
> Let me answer the first part of your question. Please have a look at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
>
> It should help you make an estimation of your index size. [...]
Re: solr 4.2.1 still has problems with index version and index generation
I'm on 4.1 and I have a similar problem. Except for the version number everything else seems to be fine. Is that what other people are seeing?
Numeric fields and payload
Hi,

is it possible to store (text) payload on numeric fields (class solr.TrieDoubleField)? My goal is to store measurement units for numeric features - e.g. '1.5 cm' - and to use faceted search with these fields. But the field type doesn't allow analyzers to add the payload data. I want to avoid database access to load the units.

I'm using Solr 4.2.

Regards,
Holger
Re: Numeric fields and payload
Sounds like another new field-mutating update processor is needed - add payload.

-- Jack Krupansky

-----Original Message-----
From: Holger Rieß
Sent: Monday, April 08, 2013 8:27 AM
To: solr-user@lucene.apache.org
Subject: Numeric fields and payload

> Hi,
>
> is it possible to store (text) payload on numeric fields (class solr.TrieDoubleField)? My goal is to store measurement units for numeric features - e.g. '1.5 cm' - and to use faceted search with these fields. But the field type doesn't allow analyzers to add the payload data. I want to avoid database access to load the units.
>
> I'm using Solr 4.2.
>
> Regards,
> Holger
FileBasedSpellchecker with Frequency comparator
Hi,

I want to configure a file-based spellchecker for my application. I am taking the words from a spellcheck.txt file and building the spellcheckerFile directory index. It works fine, but it does not take the frequency of the words into consideration when giving spelling suggestions. I have duplicated the terms that are important in the spellcheck.txt file, repeating them as many times as needed, since FileBasedSpellChecker cannot take a numeric frequency. But it still does not reflect in the scoring. Is that the way to go about it? Can someone please explain clearly how Solr supports building a file-based spellcheck index from a file along with frequency? Is it doable by configuring solrconfig.xml, or do we need to write a spellcheck client explicitly?
Number of segments
I'm running Solr 4.0. I'm noticing my segment count is staying in the 30+ range, even though I have these settings:

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnce">10</int>
    <int name="maxMergeAtOnceExplicit">10</int>
  </mergePolicy>
  <useCompoundFile>false</useCompoundFile>
</indexConfig>

Can anyone give me some advice on what I should change or check?
Indexed data not searchable
Hello,

I'm very new to Solr and I have come to a point I cannot explain by myself, so I need your help.

I have indexed a huge amount of XML files with a shell script:

function solringest_rec {
  for SRCFILE in $(find $1 -type f); do
    #DESTFILE=$URL${SRCFILE/$1/}
    echo "ingest $SRCFILE"
    curl $URL -H "Content-type: text/xml" --data-binary "@$SRCFILE"
  done
}

The response I get every time is:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">116</int></lst>
</response>

Because of this I think that everything should be fine, but the queries don't work. For all operations other than the post operation I use the stuff from the example folder. Maybe I have to configure something in schema.xml or solrconfig.xml?

Hope you can help me!

Kind regards,
Max
Re: Number of segments
On Mon, Apr 8, 2013, at 02:35 PM, Michael Long wrote:

> I'm running Solr 4.0. I'm noticing my segment count is staying in the 30+ range, even though I have these settings: [...]
>
> Can anyone give me some advice on what I should change or check?

How many documents do you have? How big are the files on disk?

Note it says segments per tier; you may have multiple tiers at play, meaning you can have more than ten segments.

There are also, I believe, properties that define the maximum size on disk for a segment and the like that can prevent merges from happening.

Upayavira
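The size cap Upayavira refers to is TieredMergePolicy's maxMergedSegmentMB, which defaults to 5 GB; segments at or near the cap stop being candidates for further merging, so a multi-gigabyte index can quite legitimately sit at 30+ segments. A sketch of raising it inside the same indexConfig block (the 20 GB value is illustrative only):

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnce">10</int>
    <double name="maxMergedSegmentMB">20480</double>
  </mergePolicy>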
Re: Indexed data not searchable
What is the structure of your content? Is it formatted in the same XML structure as the example data? What URL are you posting to?

Upayavira

On Mon, Apr 8, 2013, at 02:08 PM, Max Bo wrote:

> Hello,
>
> I'm very new to Solr and I have come to a point I cannot explain by myself, so I need your help.
>
> I have indexed a huge amount of XML files with a shell script. [...]
Re: Indexed data not searchable
On 8 April 2013 18:38, Max Bo maximilian.brod...@gmail.com wrote:

> Hello,
>
> I'm very new to Solr and I have come to a point I cannot explain by myself, so I need your help.
>
> I have indexed a huge amount of XML files with a shell script.
[...]

For posting XML files to Solr directly with curl, the XML files need to be in a particular format, and you need to commit at least once at the end of the indexing. Please see http://wiki.apache.org/solr/UpdateXmlMessages

If you are following the exact commands there, and using XML files from the example/ directory, things should just work.

Regards,
Gora
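For reference, the shape that the UpdateXmlMessages page describes is roughly the following (the field names here are only an example and must match fields declared in schema.xml):

  <add>
    <doc>
      <field name="id">doc-1</field>
      <field name="author">Some Author</field>
    </doc>
  </add>

followed by a commit, either as a separate <commit/> message or via commit=true on the update URL, which Max is already passing.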
Re: Number of segments
On 04/08/2013 09:41 AM, Upayavira wrote:

> How many documents do you have? How big are the files on disk?

2,795,601, and the index dir is 50G.

> Note it says segments per tier; you may have multiple tiers at play, meaning you can have more than ten segments.

How do I determine how many tiers it has?

> There are also, I believe, properties that define the maximum size on disk for a segment and the like that can prevent merges from happening.

I just have the defaults... nothing explicitly set.
Re: Indexed data not searchable
Thanks for your help.

The URL I'm posting to is: http://localhost:8983/solr/update?commit=true

The XML files I've added contain fields like "author", so I thought they should be searchable, since that field is declared as indexed in the example schema.
Sub field indexing
Hello All,

I'd like to be able to index documents containing criteria and values. For example, I have a product A, and this product is compatible with a product B versions 1, 5, 6.

My actual schema is like this:

Name
Price
reference
features
text

How can I index values like:

compatible_engine : [productB, ProductZ]
version_compatible : [1,5,6], [45,85,96]

And after indexing, how do I search into them?

Best regards
Eric
Re: Indexed data not searchable
Are you sure your XML is formatted according to the SolrXML rules? See:
http://wiki.apache.org/solr/UpdateXmlMessages

I have to ask, because sometimes people send raw XML to Solr, not realizing that Solr accepts a particular format of XML.

-- Jack Krupansky

-----Original Message-----
From: Max Bo
Sent: Monday, April 08, 2013 9:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexed data not searchable

> Thanks for your help.
>
> The URL I'm posting to is: http://localhost:8983/solr/update?commit=true
>
> The XML files I've added contain fields like "author", so I thought they should be searchable, since that field is declared as indexed in the example schema.
Re: Indexed data not searchable
On 8 April 2013 19:26, Max Bo maximilian.brod...@gmail.com wrote:

> Thanks for your help.
>
> The URL I'm posting to is: http://localhost:8983/solr/update?commit=true
>
> The XML files I've added contain fields like "author", so I thought they should be searchable, since that field is declared as indexed in the example schema.

Please include an example of your .xml file and of Solr's schema.xml. It is difficult to keep guessing in the dark.

Regards,
Gora
RE: FileBasedSpellchecker with Frequency comparator
I do not believe that FileBasedSpellChecker takes frequency into account at all. That would be a nice enhancement, though.

To get what you want, you could index one or more documents containing the words in your file, then create a spellchecker using IndexBasedSpellChecker or DirectSolrSpellChecker. I don't remember off-hand how the spellcheckers count document frequency, i.e. whether or not multiple occurrences in the same document count (I think they do). If so, you could accomplish this with one dummy spellcheck-building document and one big indexed field. You could even create an IndexBasedSpellChecker dictionary and then delete the dummy document(s). (But be sure to lock down spellcheck.build, possibly by putting it in the invariants section of all your request handlers, so that you don't accidentally overlay it.)

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: ilay raja [mailto:ilay@gmail.com]
Sent: Monday, April 08, 2013 8:02 AM
To: solr-user@lucene.apache.org; solr-...@lucene.apache.org
Subject: FileBasedSpellchecker with Frequency comparator

> Hi,
>
> I want to configure a file-based spellchecker for my application. I am taking the words from a spellcheck.txt file and building the spellcheckerFile directory index. [...]
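A sketch of the index-based route James describes, with illustrative names (spell_src is a hypothetical indexed field holding the dictionary words from the file):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">spell_src</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>
  </searchComponent>

The sidecar dictionary is then built once with spellcheck.build=true after the dummy documents are indexed, which matches James's warning to lock that parameter down afterwards.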
Re: Indexed data not searchable
hi,

I use the data importer. The actual entity contains this:

<field column="id_product" name="id" />
<field column="quantity" name="inStock" />
<field column="reference" name="ref" />
<field column="supplier" name="brand" />
<field column="manufacturer" name="brand" />
<field column="name" name="brand" />
<field column="comptabible_model" regex="Piéce détachée pour ([\w 0-9éèêîûô]+) Modèle" sourceColName="description_short" />
<field column="version_model" regex="Modèle:([0-9a-zA-Zéèêîûô-]+),?" sourceColName="description_short" />

Data sample:

Piéce détachée pour Skimmer COFIES Modèle:Premium-Design-Omega, Zipper5
Piéce détachée pour Régulateur de niveau modèle 3150 Modèle:3150 depuis 2003

Ideal result:

name = Couvercle SK siglé - HAYWARD
manufacturer = HAYWARD
compatibility =
  [Skimmer COFIES] - [Premium-Design-Omega, Zipper5]
  [Régulateur de niveau modèle 3150] - [3150 depuis 2003]

Then I wish to be able to get all results for all products with HAYWARD as manufacturer, then retrieve the list of all compatible products, and in the end the list of available models.

schema.xml contains:

<field name="ref" type="string" indexed="true" stored="true" omitNorms="true" multiValued="false"/>
<field name="name" type="text_fr" indexed="true" stored="true" />
<field name="cat" type="text_fr" indexed="true" stored="true" multiValued="true" />
<field name="brand" type="text_fr" indexed="true" stored="true" multiValued="true" />
<field name="features" type="text_fr" indexed="true" stored="true" multiValued="true" />

where

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- removes l', etc -->
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StandardFilterFactory" />
    <filter class="solr.FrenchLightStemFilterFactory" />
    <filter class="solr.FrenchMinimalStemFilterFactory" />
    <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="French"/> -->
  </analyzer>
</fieldType>

<fieldType name="text_html_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- removes l', etc -->
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" enablePositionIncrements="true" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.FrenchLightStemFilterFactory" />
    <filter class="solr.FrenchMinimalStemFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I do not see how to organize this specification correctly with Solr.

regards
eric

On 08/04/2013 16:36, Gora Mohanty wrote:

> Please include an example of your .xml file and of Solr's schema.xml. It is difficult to keep guessing in the dark. [...]
Re: solr 4.2.1 still has problems with index version and index generation
This is the jira issue that addresses this: https://issues.apache.org/jira/browse/SOLR-4661

I'll try to find some time today to test out the patch and see how things look.

On Mon, Apr 8, 2013 at 7:18 AM, Tom Gullo springmeth...@gmail.com wrote:

> I'm on 4.1 and I have a similar problem. Except for the version number everything else seems to be fine. Is that what other people are seeing?

--
Joel Bernstein
Professional Services
LucidWorks
Re: Indexed data not searchable
On 8 April 2013 21:35, It-forum it-fo...@meseo.fr wrote: hi I use dataimporter [...] Please do not hijack threads. Instead, start a new one for your questions, or follow up in a thread that you had started. Here is why this is bad practice: http://people.apache.org/~hossman/#threadhijack Regards, Gora
Re: Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings
Here's my guess.

On 4/8/13 4:04 AM, Spadez wrote:

> 11:56:41 WARNING IndexSchema uniqueKey is not stored - distributed search and MoreLikeThis will not work

I think the warning is saying the field you specify in uniqueKey does not specify stored="true". Check your schema.xml.

> 11:56:41 WARNING SolrCore [collection1] Solr index directory '/opt/solr/example/solr/collection1/data/index' doesn't exist. Creating new index...

This is normal.

> 11:56:42 WARNING UpdateRequestHandler Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler
> 11:56:42 WARNING UpdateRequestHandler Using deprecated class: BinaryUpdateRequestHandler -- replace with UpdateRequestHandler

UpdateRequestHandler now takes care of XML, JSON, CSV and binary document formats by itself. No specialized subclasses like XmlUpdateRequestHandler should be used. All these deprecated subclasses add is output of this deprecation warning. But you can ignore these warnings for this version of Solr.

--
Kuro Kurosaka
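Concretely, the first warning goes away once the uniqueKey field carries stored="true" in schema.xml; with the stock example's id field that looks like:

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  ...
  <uniqueKey>id</uniqueKey>

(The exact field name and type are whatever the schema already declares; only the stored attribute needs to change.)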
Re: Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings
> replace with UpdateRequestHandler

Just compare your solrconfig to the new one and consider updating yours and using the newer Solr update API that automatically uses the content type to internally dispatch to the proper update handler. But it's just a warning, for now.

-- Jack Krupansky

-----Original Message-----
From: Kuro Kurosaka
Sent: Monday, April 08, 2013 12:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings

> Here's my guess. [...]
Re: Empty Solr 4.2.1 can not create Collection
The steps that I use to set up the collection are slightly different:

1) Start zk and upconfig the config set. Your approach is the same.

2) Start appservers with Solr zkHost set to the zk started in step 1.

3) Use a core admin command to spin up a new core and collection:

http://app01/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&numShards=1&collection.configName=storage-conf&shard=shard1

This will spin up the new collection and initial core. I'm not using a replication factor because the following commands manually bind the replicas.

4) Spin up a replica with a core admin command:

http://app02/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&shard=shard1

5) Same command as above on the 3rd server to spin up another replica.

This will spin up a new core and bind it to shard1 of the storage collection.

On Mon, Apr 8, 2013 at 9:34 AM, A.Eibner a_eib...@yahoo.de wrote:

> Hi,
>
> I have a problem with setting up my solr cloud environment (on three machines). If I want to create my collections from scratch I do the following:
>
> *) Start ZooKeeper on all machines.
>
> *) Upload the configuration (on app02) for the collection via the following command:
> zkcli.sh -cmd upconfig --zkhost app01:4181,app02:4181,app03:4181 --confdir config/solr/storage/conf/ --confname storage-conf
>
> *) Link the configuration (on app02) via the following command:
> zkcli.sh -cmd linkconfig --collection storage --confname storage-conf --zkhost app01:4181,app02:4181,app03:4181
>
> *) Start Tomcats (containing Solr) on app02, app03.
>
> *) Create the collection via:
> http://app03/solr/admin/collections?action=CREATE&name=storage&numShards=1&replicationFactor=2&collection.configName=storage-conf
>
> This creates the replication of the shard on app02 and app03, but neither of them is marked as leader; both are marked as DOWN. And afterwards I can not access the collection.
> In the browser I get:
> SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
>
> In the log files the following error is present:
>
> SEVERE: Error from shard: app02:9985/solr
> org.apache.solr.common.SolrException: Error CREATEing SolrCore 'storage_shard1_replica1':
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:404)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>         at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:172)
>         at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.solr.common.cloud.ZooKeeperException:
>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:922)
>         at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
>         at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
>         at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:479)
>         ... 19 more
> Caused by: org.apache.solr.common.SolrException: Error getting leader from zk for shard shard1
>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:864)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
>         ... 22 more
> Caused by: java.lang.InterruptedException: sleep interrupted
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:905)
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:875)
>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:839)
>         ... 25 more
>
> I have
Re: Sub field indexing
Solr does not support querying nested data structures. If at query time you know the product you want to check compatibility for, you can use dynamic fields. Thus, if you want to find products compatible with productB, you could index:

id: productA
compatible_productB: 1, 5, 6
compatible_productZ: 45, 85, 96

Then you can search for:

q=compatible_productB:5

This will find you all documents that are compatible with productB version 5.

Upayavira

On Mon, Apr 8, 2013, at 03:04 PM, It-forum wrote:

> Hello All,
>
> I'd like to be able to index documents containing criteria and values. For example, I have a product A, and this product is compatible with a product B versions 1, 5, 6. [...]
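The pattern Upayavira shows relies on a dynamic field declaration in schema.xml; a sketch, assuming integer version numbers (the * part of the name becomes the product, e.g. compatible_productB):

  <dynamicField name="compatible_*" type="int" indexed="true" stored="true" multiValued="true"/>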
Re: Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings
On Apr 8, 2013, at 19:09, Jack Krupansky j...@basetechnology.com wrote:

> Just compare your solrconfig to the new one and consider updating yours and using the newer Solr update API that automatically uses the content type to internally dispatch to the proper update handler. But it's just a warning, for now.

It would probably be a good idea to split solrconfig, so that settings specific to a Solr version would be separate from settings that mainly (only?) depend on the schema... I was quite surprised when I had to re-do my modifications to solrconfig.xml when moving from 4.0 to 4.1.
stupid collection tricks
Hi,

I've been shuffling things around in our Solr Cloud for the past few weeks, and now some cruft has accrued, probably from not having done things in the proper way.

Now I have two (obsolete) collections I want to delete, but none of the hosts that are marked as having hosted them are around anymore. Running a DELETE from the Collections API doesn't seem to do anything, probably because no hosts exist that can be considered leaders for each of the shards of those two collections. Additionally, I have a few shards that have the names of former hosts registered, and I'd like to remove them.

Is there an easy way to do this? Do I need to manually edit clusterstate.json somehow? I should mention that everything's running 4.2.1 now.

Thanks,

Michael Della Bitta

Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn't a Game
RE: Sub field indexing
It-forum [it-fo...@meseo.fr] wrote:

> In exemple I have a product A this product is compatible with a Product B version 1, 5, 6.
> How can I index values like :
> compatible_engine : [productB,ProductZ]
> version_compatible : [1,5,6],[45,85,96]

Index them as

compatible_engine: productB/1
compatible_engine: productB/5
compatible_engine: productB/6
compatible_engine: productZ/45
compatible_engine: productZ/85
compatible_engine: productZ/96

in a StrField (so that it is not tokenized).

> After indexing how to search into ?

compatible_engine:productZ/85
to get all products compatible with productZ, version 85.

compatible_engine:productZ*
to get all products compatible with any version of productZ.

- Toke Eskildsen
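The field Toke describes could be declared along these lines (in the stock schema, the string type maps to solr.StrField, so the productB/1 tokens are kept intact rather than split):

  <field name="compatible_engine" type="string" indexed="true" stored="true" multiValued="true"/>

The trailing-wildcard query compatible_engine:productZ* works precisely because the values are untokenized.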
Re: Empty Solr 4.2.1 can not create Collection
The scenario above needs to have collection1 removed from solr.xml to work. This, I believe, is the "empty Solr" scenario that you are talking about. If you don't remove collection1 from solr.xml on all the Solr instances, they will get tripped up on collection1 during these steps.

If you start up with collection1 in solr.xml, it's best to start the initial Solr instance with the bootstrap_conf parameter so Solr can properly create this collection.

On Mon, Apr 8, 2013 at 1:12 PM, Joel Bernstein joels...@gmail.com wrote:

> The steps that I use to set up the collection are slightly different:
>
> 1) Start zk and upconfig the config set. Your approach is the same.
>
> 2) Start appservers with Solr zkHost set to the zk started in step 1.
>
> 3) Use a core admin command to spin up a new core and collection:
>
> http://app01/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&numShards=1&collection.configName=storage-conf&shard=shard1
>
> This will spin up the new collection and initial core. I'm not using a replication factor because the following commands manually bind the replicas.
>
> 4) Spin up a replica with a core admin command:
>
> http://app02/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&shard=shard1
>
> 5) Same command as above on the 3rd server to spin up another replica.
>
> [...]
Re: solr 4.2.1 still has problems with index version and index generation
: I know there was some effort to fix this, but I must report
: that Solr 4.2.1 still has problems with index version and
: index generation numbering in master/slave mode with replication.
...
: RESULT: the slave has a different (higher) version number and is one generation ahead :-(

Can you please provide more details...

* are you using autocommit? with what settings?
* are you using openSearcher=false in any of your commits?
* where exactly are you looking when you see the master/slave out of sync?
* are you observing any actual problems, or just seeing that the gen/version are reported as different?

As Joel mentioned, there is an open Jira related purely to the *display* of information about gen/version between master and slave, because in many cases the searcher in use on the master may refer to an older commit point. But it doesn't mean there is any actual problem in replication -- the slave is still fetching/searching the latest commit from the master as intended.

https://issues.apache.org/jira/browse/SOLR-4661

-Hoss
Re: Slow qTime for distributed search
After taking a look at what I wrote earlier, I will try to rephrase more clearly. It seems that sharding my collection into many shards slowed things down unreasonably, and I'm trying to investigate why. First, I created collection1 - a 4 shards * replicationFactor=1 collection on 2 servers. Second, I created collection2 - a 48 shards * replicationFactor=2 collection on 24 servers, keeping the same config and the same number of documents per shard. Observations showed the following: 1. Total qTime for the same query set is 5 times higher in collection2 (150ms - 700ms) 2. Adding the *shards.info=true* param to queries against collection2 shows that each shard is much slower than each shard was in collection1 (about 4 times slower) 3. Querying only specific shards in collection2 (by adding the shards=shard1,shard2...shard12 param) gave me a much better qTime per shard (only 2 times higher than in collection1) 4. I have a low qps rate, thus I don't suspect the replication factor of being the major cause of this. 5. The avg. CPU load on the servers during querying was much higher in collection1 than in collection2, and I didn't catch any other bottleneck. Q: 1. Why does the number of shards affect the qTime of each shard? 2. How can I bring the qTime of each shard back down? Thanks, Manu
Re: Slow qTime for distributed search
On 4/8/2013 12:19 PM, Manuel Le Normand wrote: It seems that sharding my collection into many shards slowed things down unreasonably, and I'm trying to investigate why. First, I created collection1 - a 4 shards * replicationFactor=1 collection on 2 servers. Second, I created collection2 - a 48 shards * replicationFactor=2 collection on 24 servers, keeping the same config and the same number of documents per shard. The primary reason to use shards is index size - when your index is so big that a single index cannot give you reasonable performance. There are also sometimes performance gains when you break a smaller index into shards, but there is a limit. Going from 2 shards to 3 shards will have more of an impact than going from 8 shards to 9 shards. At some point, adding shards makes things slower, not faster, because of the extra work required for combining multiple queries into one result response. There is no reasonable way to predict when that will happen. Observations showed the following: 1. Total qTime for the same query set is 5 times higher in collection2 (150ms - 700ms) 2. Adding the *shards.info=true* param to queries against collection2 shows that each shard is much slower than each shard was in collection1 (about 4 times slower) 3. Querying only specific shards in collection2 (by adding the shards=shard1,shard2...shard12 param) gave me a much better qTime per shard (only 2 times higher than in collection1) 4. I have a low qps rate, thus I don't suspect the replication factor of being the major cause of this. 5. The avg. CPU load on the servers during querying was much higher in collection1 than in collection2, and I didn't catch any other bottleneck. A distributed query actually consists of up to two queries per shard. The first query just requests the uniqueKey field, not the entire document. If you are sorting the results, then the sort field(s) are also requested; otherwise the only additional information requested is the relevance score. The results are compiled into a set of unique keys, then a second query is sent to the proper shards requesting specific documents. Q: 1. Why does the number of shards affect the qTime of each shard? 2. How can I bring the qTime of each shard back down? With more shards, it takes longer for the first phase to compile the results, so the second phase (document retrieval) gets delayed, and the QTime goes up. One way to reduce the total time is to reduce the number of shards. You haven't said anything about how complex your queries are, your index size(s), or how much RAM you have on each server and how it is allocated. Can you provide this information? Getting good performance out of Solr requires plenty of RAM for your OS disk cache. Query times of 150 to 700 milliseconds seem very high, which could be due to query complexity or a lack of server resources (especially RAM), or possibly both. Thanks, Shawn
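One experiment that separates fan-out/merge cost from raw per-core cost (core names and host below are illustrative; adjust to your deployment): query a single core directly with distrib=false, which skips the distributed phases entirely, and compare against the full distributed query with shards.info=true:

http://app01:8983/solr/collection2_shard1_replica1/select?q=test&distrib=false

http://app01:8983/solr/collection2/select?q=test&shards.info=true

If the distrib=false time is low but the per-shard times reported by shards.info are high, the overhead is in the distributed request handling rather than in the cores themselves.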
Solr language-dependent sort
Hi, I found that in Solr we need to define a special fieldType for each language (http://wiki.apache.org/solr/UnicodeCollation), then point a field to this type. But in our application one field (like 'title') can be used by various users for their own languages (user1 uses it for English, user2 uses it for Japanese, ...), so it is difficult for us even to use dynamic fields. We would prefer to pass in a parameter like language='en' at run time, so that the Solr API could use this parameter when calling the Lucene API to sort the field. This approach would be much more flexible (we programmed this way when using Lucene directly). Thanks very much for your help, Lisheng
Re: Score after boost before
: I am using edismax and boosting certain fields using bq during query time. : : I would like to compare the effect of the boost side by side with the original score : without the boost. Is there any way I can get the original score without boosting? Using functions and DocTransformers, it's possible to get the numeric score of any arbitrary query as a pseudo-field. If you are using e/dismax and you would like to see the score a document would have gotten w/o the bq or boost boosting, you need to specify the query twice -- once for the main query with the boost, and once as part of the field list w/o the boost. Using LocalParams in which you override specific params (like bq) can make this a bit easier to express... http://localhost:8983/solr/select?q=video&qf=text+name^2&defType=edismax&bq=name:ati^200&fl=id,name,score,no_boost_score:query%28$alt%29&alt={!edismax%20bq=%27%27%20v=$q}&debug=true&debug.explain.structured=true -Hoss
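For readability, here are the same parameters with the URL escaping decoded (%28/%29 are parentheses, %27 is a single quote, %20 is a space):

q=video
qf=text name^2
defType=edismax
bq=name:ati^200
fl=id,name,score,no_boost_score:query($alt)
alt={!edismax bq='' v=$q}
debug=true
debug.explain.structured=true

The no_boost_score pseudo-field evaluates the $alt query -- the same edismax query with bq overridden to empty -- so each document comes back with both its boosted and unboosted score.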
Re: Solr metrics in Codahale metrics and Graphite?
That approach sounds great. --wunder On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote: I've been thinking about how to improve this reporting, especially now that metrics-3 (which removes all of the funky thread issues we ran into last time I tried to add it to Solr) is close to release. I think we could go about it as follows: * refactor the existing JMX reporting to use metrics-3. This would mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and adding a JmxReporter, keeping the existing config logic to determine which JMX server to use. PluginInfoHandler and SolrMBeanInfoHandler translate the metrics-3 data back into SolrMBean format to keep the reporting backwards-compatible. This seems like a lot of work for no visible benefit, but… * we can then add the ability to define other metrics reporters in solrconfig.xml. There are already reporters for Ganglia and Graphite - you just add them to the Solr lib/ directory, configure them in solrconfig, and voila - Solr can be monitored using the same devops tools you use to monitor everything else. Does this sound sane? Alan Woodward www.flax.co.uk On 6 Apr 2013, at 20:49, Walter Underwood wrote: Wow, that really doesn't help at all, since these seem to only be reported in the stats page. I don't need another non-standard app-specific set of metrics, especially one that needs polling. I need metrics delivered to the common system that we use for all our servers. This is also why SPM is not useful for us, sorry Otis. Also, there is no time period on these stats. How do you graph the 95th percentile? I know there was a lot of work on these, but they seem really useless to me. I'm picky about metrics, working at Netflix does that to you. wunder On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote: In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote: It's there! :) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wun...@wunderwood.org wrote: That sounds great. I'll check out the bug, I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote: On 3/29/2013 12:07 PM, Walter Underwood wrote: What are folks using for this? I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of codahale metrics internally for request handler statistics - see SOLR-1972. First we tried including the jar and using the API, but that created thread leak problems, so the source code was added. Thanks, Shawn -- Walter Underwood wun...@wunderwood.org
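To make the solrconfig.xml idea concrete, a reporter definition might look something like the sketch below. To be clear, this is hypothetical syntax for the proposal being discussed, not something any released Solr parses; the class and parameter names are invented for illustration:

<!-- hypothetical: a pluggable metrics reporter, per the proposal above -->
<metricsReporter class="com.example.metrics.GraphiteReporterFactory">
  <str name="host">graphite.example.com</str>
  <int name="port">2003</int>
  <int name="periodSeconds">60</int>
</metricsReporter>

The appeal is that pushing metrics on a fixed period is exactly what tools like Graphite expect, which addresses the "needs polling" complaint below.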
Re: Solr language-dependent sort
Hi Lisheng, We did something similar in Solr using a custom handler (but I think you could just build a custom QueryParser to do this). You could also do this in your application, i.e., get the language and then rewrite your query to use the language-specific fields. Come to think of it, the QueryParser would probably be sufficiently general to qualify as a patch for custom functionality. -sujit On Apr 8, 2013, at 12:28 PM, Zhang, Lisheng wrote: Hi, I found that in Solr we need to define a special fieldType for each language (http://wiki.apache.org/solr/UnicodeCollation), then point a field to this type. But in our application one field (like 'title') can be used by various users for their own languages (user1 uses it for English, user2 uses it for Japanese, ...), so it is difficult for us even to use dynamic fields. We would prefer to pass in a parameter like language='en' at run time, so that the Solr API could use this parameter when calling the Lucene API to sort the field. This approach would be much more flexible (we programmed this way when using Lucene directly). Thanks very much for your help, Lisheng
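For the archives, a sketch of the per-language field approach using the solr.CollationField type described on the UnicodeCollation wiki page linked above (the type and field names here are made up for illustration):

<fieldType name="collated_en" class="solr.CollationField" language="en" strength="primary"/>
<fieldType name="collated_ja" class="solr.CollationField" language="ja" strength="primary"/>
<dynamicField name="*_sort_en" type="collated_en" indexed="true" stored="false"/>
<dynamicField name="*_sort_ja" type="collated_ja" indexed="true" stored="false"/>

At index time the application copies title into title_sort_en or title_sort_ja depending on the document's language; at query time it maps language=en to sort=title_sort_en asc. That keeps the language decision at run time without needing a custom handler.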
RE: Solr language-dependent sort
Hi, Thanks very much for the quick help! In our case we mainly need to sort a field based on a language defined at run time, but I understand that the principle is the same. Thanks and best regards, Lisheng -Original Message- From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJIT PAL Sent: Monday, April 08, 2013 1:27 PM To: solr-user@lucene.apache.org Subject: Re: Solr language-dependent sort Hi Lisheng, We did something similar in Solr using a custom handler (but I think you could just build a custom QueryParser to do this). You could also do this in your application, i.e., get the language and then rewrite your query to use the language-specific fields. Come to think of it, the QueryParser would probably be sufficiently general to qualify as a patch for custom functionality. -sujit On Apr 8, 2013, at 12:28 PM, Zhang, Lisheng wrote: Hi, I found that in Solr we need to define a special fieldType for each language (http://wiki.apache.org/solr/UnicodeCollation), then point a field to this type. But in our application one field (like 'title') can be used by various users for their own languages (user1 uses it for English, user2 uses it for Japanese, ...), so it is difficult for us even to use dynamic fields. We would prefer to pass in a parameter like language='en' at run time, so that the Solr API could use this parameter when calling the Lucene API to sort the field. This approach would be much more flexible (we programmed this way when using Lucene directly). Thanks very much for your help, Lisheng
Best practice for rebuild index in SolrCloud
We are using SolrCloud for replication and dynamic scaling but not for distribution, so we are only using a single shard. From time to time we make changes to the index schema that require rebuilding the index. Should I treat the rebuilding as just any other index operation? It seems to me it would be better if I could somehow take a node offline and rebuild the index there, then put it back online and let the new index be replicated from there. But I am not sure how to do the latter. Bill
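One pattern that may fit here, assuming your Solr version supports collection aliases (added around Solr 4.2) and you can afford a parallel collection during the rebuild: index into a fresh collection, then atomically repoint clients with CREATEALIAS. The host, alias, and collection names below are made up:

http://host:8983/solr/admin/collections?action=CREATEALIAS&name=search&collections=storage_v2

Clients that query the alias 'search' then switch to the rebuilt collection without any node juggling, and the old collection can be dropped once you're satisfied.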
Solr Admin Page Master Size
When I check my Solr Admin Page: Replication (Master) Version Gen Size Master: 1365458125729 5 18.24 MB It is one shard on one computer. What is that 18.24 MB? Does it contain just the indexes, or indexes plus highlights etc.? My Solr home folder was 512.7 KB and it has become 22860 KB, which is why I ask this question.
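You can check what that number corresponds to from the shell; the path below is illustrative, so adjust it for your core layout:

du -sh /path/to/solr/home/collection1/data/index

The replication size should correspond to the Lucene index files in that directory; it does not cover everything else under the Solr home (configs, logs, other data directories), which is why the directory total can grow faster than the reported index size.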
Re: Number of segments
On Mon, Apr 8, 2013, at 02:51 PM, Michael Long wrote: On 04/08/2013 09:41 AM, Upayavira wrote: How many documents do you have? How big are the files on disk? 2,795,601 and the index dir is 50G Note it says segments per tier, you may have multiple tiers at play meaning you can have more than ten segments. How do I determine how many tiers it has? There's also, I believe, properties that define the maximum size on disk for a segment and the like that can prevent merges from happening. I just have the defaults...nothing explicitly set What issue are you trying to solve here? Generally, the tiered merge policy works well, and if searches perform well, then having a reasonable number of segments needn't cause you any issues. Indeed, with larger indexes, having too few segments can cause issues as merging can require copying large segments, which can be time-consuming. Upayavira
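If you want to count segments yourself, one rough check -- assuming a Lucene 4.x index, where each segment writes its own .si (segment info) file that lives outside any compound file -- is to count those files (path illustrative):

ls /path/to/index/*.si | wc -l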
Re: Sub field indexing
: Subject: Sub field indexing : References: 1365426517091-4054473.p...@n3.nabble.com : In-Reply-To: 1365426517091-4054473.p...@n3.nabble.com https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: Number of segments
: How do I determine how many tiers it has? You may find this blog post from mccandless helpful... http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html (don't ignore the videos! watching them is really helpful for understanding what he is talking about) Once you've absorbed that, then please revisit your question, specifically Upayavira's key point: what is the problem you are trying to solve? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Solr 4.2.1 Branch
There is also this path for the SVN guys out there: https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1 Cheers, Tim On 05/04/13 05:53 PM, Jagdish Nomula wrote: That works out. Thanks for shooting the link. On Fri, Apr 5, 2013 at 5:51 PM, Jack Krupansky j...@basetechnology.com wrote: You want the tagged branch: https://github.com/apache/lucene-solr/tree/lucene_solr_4_2_1 -- Jack Krupansky -Original Message- From: Jagdish Nomula Sent: Friday, April 05, 2013 8:36 PM To: solr-user@lucene.apache.org Subject: Solr 4.2.1 Branch Hello, I was trying to get hold of the solr 4.2.1 branch on github. I see https://github.com/apache/lucene-solr/tree/lucene_solr_4_2, but I don't see any branch for 4.2.1. Am I missing anything? Thanks in advance for your help. -- Jagdish Nomula Sr. Manager Search Simply Hired, Inc. 370 San Aleso Ave., Ste 200 Sunnyvale, CA 94085 office - 408.400.4700 cell - 408.431.2916 email - jagd...@simplyhired.com www.simplyhired.com
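Concretely, either of these should get you the 4.2.1 source, using the URLs above:

svn checkout https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1

git clone https://github.com/apache/lucene-solr.git
cd lucene-solr
git checkout lucene_solr_4_2_1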
conditional queries?
Hi, Is it possible to do a conditional query if another query has no results? For example, say I want to search against a given field for: - Search for car. If there are results, return them. - Else, search for car* . If there are results, return them. - Else, search for car~ . If there are results, return them. Is this possible in one query? Or would I need to make 3 separate queries by implementing this logic within my client? Thanks! Mark
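If you go the client-side route, here is a minimal SolrJ sketch (Solr 4.x API; the server URL and the bare query strings are illustrative -- in practice you would escape user input and set a default search field):

// Try progressively looser variants of the term until one returns results.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FallbackSearch {
  public static QueryResponse search(HttpSolrServer server, String term)
      throws SolrServerException {
    String[] variants = { term, term + "*", term + "~" };  // exact, prefix, fuzzy
    QueryResponse rsp = null;
    for (String v : variants) {
      rsp = server.query(new SolrQuery(v));
      if (rsp.getResults().getNumFound() > 0) {
        return rsp;  // first variant with hits wins
      }
    }
    return rsp;  // every variant came back empty; return the last response
  }

  public static void main(String[] args) throws SolrServerException {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    System.out.println(search(server, "car").getResults().getNumFound());
  }
}

The cost is up to three round trips on a miss; as far as I know there is no built-in single-request equivalent, so the logic has to live in the client either way.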
Field exist in schema.xml but returns
hi all, I am using SolrCloud and running some simple test queries... though I am getting an undefined field error for a field that I have in my schema.xml. The query is myField:* and the response is:
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">3</int>
<lst name="params">
<str name="wt">xml</str>
<str name="q">myField:*</str>
</lst>
</lst>
<lst name="error">
<str name="msg">undefined field myField</str>
<int name="code">400</int>
</lst>
</response>
and this is how my schema.xml looks:
..
<field name="field1" type="tint" indexed="true" stored="true"/>
<fiald name="myField" type="long" indexed="true" stored="true"/>
<field name="field3" type="tint" indexed="true" stored="true"/>
..
Any ideas what the reason could be? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Field-exist-in-schema-xml-but-returns-tp4054634.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field exist in schema.xml but returns
You have misspelt the tag name in the field definition... you have fiald instead of field. On Tue, Apr 9, 2013 at 7:43 AM, deniz denizdurmu...@gmail.com wrote: hi all, I am using SolrCloud and running some simple test queries... though I am getting an undefined field error for a field that I have in my schema.xml. The query is myField:* and the response is:
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">3</int>
<lst name="params">
<str name="wt">xml</str>
<str name="q">myField:*</str>
</lst>
</lst>
<lst name="error">
<str name="msg">undefined field myField</str>
<int name="code">400</int>
</lst>
</response>
and this is how my schema.xml looks:
..
<field name="field1" type="tint" indexed="true" stored="true"/>
<fiald name="myField" type="long" indexed="true" stored="true"/>
<field name="field3" type="tint" indexed="true" stored="true"/>
..
Any ideas what the reason could be?
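i.e., once corrected, the line should read:

<field name="myField" type="long" indexed="true" stored="true"/>

After fixing it -- and, since this is SolrCloud, re-uploading the config to ZooKeeper -- reload the collection and re-run the query.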