copyField based on value of another field
Hi folks,

Is it possible to copyField only if another field has a certain value? E.g. copyField 'dc.subject' to 'image_suggestions' only if the RDF property http://www.nsdl.org/ontologies/relationships#isInImageBank is true.

thanks,
Alistair
--
mov eax,1
mov ebx,0
int 80h
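As far as I know, copyField in schema.xml has no conditional form, so one option is to make the decision in the client before the document is sent to Solr. A minimal Python sketch of that idea follows; the key name isInImageBank and the shape of the document dict are assumptions for illustration, not something taken from the poster's setup. An update request processor on the Solr side would be another route.

```python
def add_image_suggestions(doc, flag_field="isInImageBank"):
    """Return a copy of doc with image_suggestions set when the flag is true.

    flag_field is a hypothetical key holding the value of the
    http://www.nsdl.org/ontologies/relationships#isInImageBank property.
    """
    out = dict(doc)
    # Accept both a real boolean and the string "true" as a positive flag.
    flag = str(doc.get(flag_field, "")).strip().lower()
    if flag == "true" and "dc.subject" in doc:
        # Mimic what an unconditional copyField would have done.
        out["image_suggestions"] = doc["dc.subject"]
    return out


in_bank = add_image_suggestions({"dc.subject": ["maps"], "isInImageBank": "true"})
not_in_bank = add_image_suggestions({"dc.subject": ["maps"], "isInImageBank": "false"})
print(in_bank)
print(not_in_bank)
```

The function leaves the input untouched and only adds the extra field, so it can sit in front of whatever indexing call is already in use.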
Re: suggester returning stems instead of whole words
ah looks like I need to use copyField to get a non-stemmed version of the suggester field

Alistair

On 17/06/2015 11:15, Alistair Young alistair.yo...@uhi.ac.uk wrote:
> I was wondering if there's a way to get the suggester to return whole words. [...]
suggester returning stems instead of whole words
I was wondering if there's a way to get the suggester to return whole words. Instead of returning 'technology', 'temperature' and 'tutorial', it's returning 'technolog', 'temperatur' and 'tutori' using this config:

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
      <str name="field">dc.subject</str>
      <float name="threshold">0.005</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

  <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

thanks,
Alistair
Re: suggester returning stems instead of whole words
copyField doesn't seem to fix the suggestion stemming. Copying the field to another field of this type:

  <field name="subject_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false" />
  <copyField source="dc.subject" dest="subject_autocomplete" />

  <fieldType class="solr.TextField" name="text_auto" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

but I'm still getting stemmed suggestions after rebuilding the index.

Alistair

On 17/06/2015 11:28, Alistair Young alistair.yo...@uhi.ac.uk wrote:
> ah looks like I need to use copyField to get a non stemmed version of the suggester field
Re: suggester returning stems instead of whole words
yep did both of those things. Getting the same results as using dc.subject

On 17/06/2015 14:44, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> Did you change the SpellCheckComponent's configuration to use subject_autocomplete instead of dc.subject? After you made that change, did you invoke spellcheck.build=true to re-build the spellcheck index?
>
> On Wed, Jun 17, 2015 at 7:06 PM, Alistair Young alistair.yo...@uhi.ac.uk wrote:
>> copyField doesn't seem to fix the suggestion stemming. [...]
>
> --
> Regards,
> Shalin Shekhar Mangar.
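The rebuild step Shalin asks about is just an extra request parameter on the suggest handler. A minimal Python sketch of how such a request URL could be put together is below; the host, port and core name (localhost:8983, collection1) and the prefix "te" are assumptions, while the /suggest handler name and spellcheck.build=true come from the thread.

```python
from urllib.parse import urlencode

# Assumed location of the Solr core; adjust to the real deployment.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def suggest_url(prefix, rebuild=False):
    """Build a URL for the /suggest handler, optionally forcing a dictionary rebuild."""
    params = [("q", prefix), ("wt", "json")]
    if rebuild:
        # Ask the SpellCheckComponent to rebuild its dictionary on this request.
        params.append(("spellcheck.build", "true"))
    return f"{SOLR_BASE}/suggest?{urlencode(params)}"


print(suggest_url("te", rebuild=True))
```

Sending the rebuild variant once after changing the field the suggester reads from, then dropping the parameter for normal traffic, is the usual pattern.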
Re: suggester returning stems instead of whole words
looking at the schema browser, subject_autocomplete has a type of text_en rather than text_auto and all the terms are stemmed. Its contents are the same as the one it was copied from, dc.subject, which is text_en and stemmed.

On 17/06/2015 14:58, Erick Erickson erickerick...@gmail.com wrote:
> Hmmm, shouldn't be happening that way. Spellcheck is supposed to be looking at indexed terms. If you go into the admin/schema browser page and look at the new field, what are the terms in the index? They shouldn't be stemmed.
>
> And I always get confused where this
>   <str name="spellcheck.dictionary">suggest</str>
> is supposed to point. Do you have any other component named "suggest" that you might be picking up?
>
> Best,
> Erick
>
> On Wed, Jun 17, 2015 at 6:50 AM, Alistair Young alistair.yo...@uhi.ac.uk wrote:
>> yep did both of those things. Getting the same results as using dc.subject
Re: suggester returning stems instead of whole words
working in a tiny tmux window does have some disadvantages, such as losing one's place in the file! The subject_autocomplete definition wasn't inside <fields>. Now that it is, everything is working.

thanks for listening

Alistair

On 17/06/2015 15:17, Alistair Young alistair.yo...@uhi.ac.uk wrote:
> looking at the schema browser, subject_autocomplete has a type of text_en rather than text_auto and all the terms are stemmed. [...]
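On a pre-4.8 schema, a field definition that has drifted outside the fields block is easy to miss by eye. A small Python sketch that lists such definitions is below. It uses only the standard library; the sample schema is made up for illustration and only loosely modelled on the names in this thread.

```python
import xml.etree.ElementTree as ET


def misplaced_fields(schema_xml):
    """Return names of <field> elements that are not direct children of a <fields> block."""
    root = ET.fromstring(schema_xml)
    # Identities of every <field> that sits directly under a <fields> block.
    inside = {id(f) for block in root.iter("fields") for f in block.findall("field")}
    # Any other <field> element is in the wrong place for an old-style schema.
    return [f.get("name") for f in root.iter("field") if id(f) not in inside]


# Hypothetical schema: the second field sits outside <fields>.
SAMPLE = """
<schema name="example" version="1.5">
  <fields>
    <field name="dc.subject" type="text_en" indexed="true" stored="true"/>
  </fields>
  <field name="subject_autocomplete" type="text_auto" indexed="true" stored="true"/>
</schema>
"""

print(misplaced_fields(SAMPLE))
```

Running it against the real schema.xml contents instead of SAMPLE would flag any definition in the position described above.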
Re: suggester returning stems instead of whole words
yep, 4.3.1. The API changed after that so it's finding the time to rewrite the entire backend that uses it

On 17/06/2015 16:55, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> You must be using an old version of Solr. Since Solr 4.8 and beyond, the <fields> and <types> tags have been deprecated and you can place the field and field type definitions anywhere in the schema.xml.
>
> See http://issues.apache.org/jira/browse/SOLR-5228
>
> On Wed, Jun 17, 2015 at 9:09 PM, Alistair Young alistair.yo...@uhi.ac.uk wrote:
>> working in a tiny tmux window does have some disadvantages, such as losing one's place in the file! [...]
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: phrase matches returning near matches
yep seems that's the answer. The highlighting is done separately by the rails app, so I'll look into proper solr highlighting.

thanks a lot for the use of your ears, much improved understanding!

cheers,
Alistair

On 16/06/2015 16:33, Erick Erickson erickerick...@gmail.com wrote:
> Hmmm. First, highlighting should work here, if you have it configured to work on the dc.description field.
>
> As to whether the phrase "management changes" is near enough, I pretty much guarantee it is. This is where the admin/analysis page can answer this type of question authoritatively, since it's based exactly on your particular analysis chain.
>
> Best,
> Erick
>
> On Tue, Jun 16, 2015 at 8:25 AM, Alistair Young alistair.yo...@uhi.ac.uk wrote:
>> yes prolly not a bug. The highlighting is on but nothing is highlighted. [...]
Re: phrase matches returning near matches
yes prolly not a bug. The highlighting is on but nothing is highlighted. Perhaps this text is triggering it?

'consider the impacts of land management changes'

that would seem reasonable. It's not a direct match so no highlighting (the highlighting does work on a direct match) but 'management changes' must be near enough 'manage change' to trigger a result.

Alistair

On 16/06/2015 16:18, Erick Erickson erickerick...@gmail.com wrote:
> I agree with Alessandro, the behavior you're describing is _not_ correct at all given your description. So either
>
> 1> There's something "interesting" about your configuration that doesn't seem important that you haven't told us, although what it could be is a mystery to me too ;)
>
> 2> it's matching on something else. Note that the phrase has been stemmed, so something in there besides "management" might stem to "manag" and/or something other than "changes" might stem to "chang" and the two of _them_ happen to be next to each other. "are managers changing?" for instance. Or even something less likely. Perhaps turn on highlighting and see if it pops out?
>
> 3> you've uncovered a bug. Although I suspect others would have reported it and the unit tests would have barfed all over the place.
>
> One other thing you can do. Go to the admin/analysis page and turn on the "verbose" check box. Put "management is undergoing many changes" in both the query and index boxes. The result (it's kind of hard to read I'll admit) will include the position of each token after all the analysis is done. Phrase queries (without slop) should only be matching adjacent positions. So the question is whether the position info looks correct.
>
> Best,
> Erick
>
> On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:
>> According to your debug you are using a default Lucene Query Parser. This surprises me, as I would expect with that query a match with distance 0 between the 2 terms. Are you sure nothing else in that field matches the phrase query?
>>
>> From the documentation:
>>
>> "Lucene supports finding words are a within a specific distance away. To do a proximity search use the tilde, "~", symbol at the end of a Phrase. For example to search for a "apache" and "jakarta" within 10 words of each other in a document use the search: "jakarta apache"~10"
>>
>> Cheers
>>
>> 2015-06-16 11:33 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:
>>> it's a useful behaviour. I'd just like to understand where it's deciding the document is relevant. debug output is: [...]
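The point about stemming and adjacent positions can be illustrated with a toy model. The Python sketch below is only an illustration: toy_stem is a crude, made-up suffix stripper (not the stemmer behind text_en), and phrase_match mimics a slop-free phrase query by requiring the stemmed query terms to appear at consecutive positions in the stemmed text.

```python
# Crude suffixes, tried in order; chosen only so the thread's examples work.
SUFFIXES = ("ement", "es", "e", "s")


def toy_stem(word):
    """Strip one suffix from a lowercased word, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def phrase_match(phrase, text):
    """True if the stemmed phrase terms occur at adjacent positions in the stemmed text."""
    query = [toy_stem(w) for w in phrase.split()]
    tokens = [toy_stem(w) for w in text.split()]
    # Slide a window of the query's length over the token positions.
    return any(tokens[i:i + len(query)] == query
               for i in range(len(tokens) - len(query) + 1))


adjacent = phrase_match("manage change", "consider the impacts of land management changes")
far_apart = phrase_match("manage change", "management of lots more words changes")
print(adjacent, far_apart)
```

Under these assumptions the first text matches because "management changes" reduces to the same two adjacent stems as the query, while the second does not because the two stems are several positions apart.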
phrase matches returning near matches
Hiya,

I've been looking for documentation that would point to where I could modify or explain why 'near neighbours' are returned from a phrase search. If I search for:

"manage change"

I get back a document that contains "this will help in your management of lots more words... changes". It's relevant but I'd like to understand why solr is returning it. Is it a combination of fuzzy/slop? The distance between the two variations of the two words in the document is quite large.

thanks,
Alistair
Re: phrase matches returning near matches
it's a useful behaviour. I'd just like to understand where it's deciding the document is relevant. debug output is:

  <lst name="debug">
    <str name="rawquerystring">dc.description:"manage change"</str>
    <str name="querystring">dc.description:"manage change"</str>
    <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
    <str name="parsedquery_toString">dc.description:"manag chang"</str>
    <lst name="explain">
      <str name="tst:test">
  1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221) [DefaultSimilarity], result of:
    1.2008798 = fieldWeight in 221, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = phraseFreq=1.0
      9.6070385 = idf(), sum of:
        4.0365543 = idf(docFreq=101, maxDocs=2125)
        5.5704846 = idf(docFreq=21, maxDocs=2125)
      0.125 = fieldNorm(doc=221)
      </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">41.0</double>
      <lst name="prepare">
        <double name="time">3.0</double>
        <lst name="query"><double name="time">0.0</double></lst>
        <lst name="facet"><double name="time">0.0</double></lst>
        <lst name="mlt"><double name="time">0.0</double></lst>
        <lst name="highlight"><double name="time">0.0</double></lst>
        <lst name="stats"><double name="time">0.0</double></lst>
        <lst name="debug"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">35.0</double>
        <lst name="query"><double name="time">0.0</double></lst>
        <lst name="facet"><double name="time">0.0</double></lst>
        <lst name="mlt"><double name="time">0.0</double></lst>
        <lst name="highlight"><double name="time">0.0</double></lst>
        <lst name="stats"><double name="time">0.0</double></lst>
        <lst name="debug"><double name="time">35.0</double></lst>
      </lst>
    </lst>
  </lst>

thanks,
Alistair

On 16/06/2015 11:26, Alessandro Benedetti benedetti.ale...@gmail.com wrote:
> Can you show us how the query is parsed? You didn't tell us anything about the query parser you are using. Enabling debugQuery=true will show you how the query is parsed, and this will be quite useful for us.
>
> Cheers
>
> 2015-06-16 11:22 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:
>> Hiya, I've been looking for documentation that would point to where I could modify or explain why 'near neighbours' are returned from a phrase search. [...]
>
> --
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience - 1794 England
Solr spellcheck - onlyMorePopular threshold?
Hello all,

What does the onlyMorePopular option for spellchecking use as its threshold? Will it always pick the suggestion that returns the most results, or does it base its choice on some threshold that can be configured?

Thanks!
Ali.
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-spellcheck-onlyMorePopular-threshold-tp4140727.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Having trouble with German compound words in Solr 4.7
I've managed to solve this (in a quite hacky sort of way) by using filter queries and the edismax query parser. I added the following parameters to my solrconfig.xml:

  <str name="defType">edismax</str>
  <str name="mm">75%</str>

Then when searching for multiple keywords (for example "schwarzkleid wenz", where wenz is a German brand name), I use the first keyword as the query and add anything after that as a filter query. So my final query looks something like this:

  fl=id&sort=popular+desc&indent=on&q=keywords:'schwarzkleide'+&wt=json&fq={!edismax}+keywords:'wenz'&fq=deleted:0

My compound splitter filter splits schwarzkleide correctly and it is parsed as edismax with mm=75%. Then the filter queries are added; for keywords they are also parsed as edismax. The returned result is all the black dresses from 'Wenz'.

If anybody has a better solution than what I've posted I would be more than happy to read up on it, as I'm quite new to Solr and I think my way is a bit convoluted to be honest.

Thanks.
--
View this message in context: http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964p4132478.html
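The query string described above can be assembled programmatically so the separators and escaping are handled for you. The Python sketch below is one way to do that, not the poster's actual code (which uses PHP): build_query is a hypothetical helper, the parameter values are taken from the message, and the trailing space after the q value is left out.

```python
from urllib.parse import parse_qs, urlencode


def build_query(main_keyword, extra_keywords):
    """Build a Solr query string: first keyword as q, the rest as edismax filter queries."""
    params = [
        ("fl", "id"),
        ("sort", "popular desc"),
        ("indent", "on"),
        ("q", f"keywords:'{main_keyword}'"),
        ("wt", "json"),
    ]
    for word in extra_keywords:
        # Each additional keyword becomes its own fq, parsed with edismax via local params.
        params.append(("fq", f"{{!edismax}} keywords:'{word}'"))
    # Always exclude deleted items, as in the original query.
    params.append(("fq", "deleted:0"))
    return urlencode(params)


query_string = build_query("schwarzkleide", ["wenz"])
print(query_string)
# Round-trip the string to show the decoded filter queries.
print(parse_qs(query_string)["fq"])
```

Passing a list of pairs rather than a dict is what allows the repeated fq parameter.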
Re: Having trouble with German compound words in Solr 4.7
Hi Siegfried, the debug output shows that the separated keywords get OR'd together, so a match on either keyword appears in the results. So if I search for *keywords:schwarzkleid*, this gets transformed to *keywords:schwarz keywords:kleid*, which is equivalent to *keywords:schwarz OR keywords:kleid*. I need this query to default to *keywords:schwarz AND keywords:kleid* so that only items matching both keywords appear in my results (in this case black dresses). I am pretty confused as to why replacing the default boolean operator is this difficult :( Any other suggestions? Ali -- View this message in context: http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964p4132338.html
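To make the difference concrete, here is a small sketch in Python of building the desired query by hand. It assumes the compound has already been split on the client side; the function name require_all is hypothetical and this is not something Solr does by itself.

```python
def require_all(field, parts):
    """Join already-split compound parts with an explicit AND.

    With an explicit operator, the default operator no longer matters:
    every part must match for a document to be returned.
    """
    return " AND ".join(f"{field}:{part}" for part in parts)

print(require_all("keywords", ["schwarz", "kleid"]))
# prints: keywords:schwarz AND keywords:kleid
```

This is the same query that was entered manually elsewhere in the thread to get the smaller, correct result set.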
Having trouble with German compound words in Solr 4.7
Hello all, I'm a fairly new Solr user and I need my search function to handle compound words in German. I've searched through the archives and found that Solr already has a filter factory for such words, DictionaryCompoundWordTokenFilterFactory. I've built a list of words that I want split, and the filter seems to work correctly in most cases. The majority of our searches are for clothing items, so for example /schwarzkleid/ (black dress) becomes /schwarz/ /kleid/, which is what I want. However, it seems the keyword search is done using an *OR* operator, so I'm seeing items that are either black or dresses, when I only want items that are both. I've read that changing the default operator in schema.xml, or setting q.op to *AND* in solrconfig.xml, should rectify this, but nothing has changed in my query results: it still uses the *OR* operator. I've tried using Extended DisMax in my queries, but I am using the Solr PHP library and I don't think it supports adding DisMax filters to the queries themselves (if I'm wrong, please correct me). By the way, I am using Zend Framework 2.0 in the backend and am communicating with Solr through the Solr PHP library: Solr PHP http://www.php.net/manual/tr/book.solr.php . Any suggestions on how to change the operator after my compound word queries have been split? Thanks! Ali -- View this message in context: http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964.html
Re: Having trouble with German compound words in Solr 4.7
Hey Jack, thanks for the reply. I added autoGeneratePhraseQueries=true to the fieldType and now it's giving me even more results! I'm not sure if the debug of my query will be helpful but I'll paste it just in case someone might have an idea. This produces 113524 results, whereas if I manually enter the query as keyword:schwarz AND keyword:kleid I only get 20283 results (which is the correct one). -- View this message in context: http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964p4131973.html
Strange behaviour with single word and phrase
I wonder if anyone could point me in the right direction please? If I search on the phrase "the toolkit" I get hits containing that phrase, but also hits that have the word 'the' before the word 'toolkit', no matter how far apart they are. Also, if I search on the word 'the' there are no hits at all. Thanks, Alistair - mov eax,1 mov ebx,0 int 80
Re: Strange behaviour with single word and phrase
Yep, ignoring stop words. Thanks for the pointer. Alistair - mov eax,1 mov ebx,0 int 80 On 04/09/2013 13:43, Jack Krupansky j...@basetechnology.com wrote: Do you have stop word filtering enabled? What does your field type look like? If stop words are ignored, you will get exactly the behavior you described. -- Jack Krupansky -Original Message- From: Alistair Young Sent: Wednesday, September 04, 2013 6:57 AM To: solr-user@lucene.apache.org Subject: Strange behaviour with single word and phrase I wonder if anyone could point me in the right direction please? If I search on the phrase "the toolkit" I get hits containing that phrase, but also hits that have the word 'the' before the word 'toolkit', no matter how far apart they are. Also, if I search on the word 'the' there are no hits at all. Thanks, Alistair - mov eax,1 mov ebx,0 int 80
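Both symptoms follow from stop word removal, which a deliberately simplified sketch in Python can illustrate. The stop word list and the analyze function here are assumptions for illustration only; a real Solr analysis chain is configured on the field type and does more than this.

```python
# Assumed sample stop word list; the real one comes from the field type's config.
STOP_WORDS = {"the", "a", "an", "of"}

def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

print(analyze("the toolkit"))  # only 'toolkit' is left to match on
print(analyze("the"))          # nothing is left, so nothing can match
```

With 'the' stripped at both index and query time, the phrase search is effectively a search for 'toolkit' alone, and a search for 'the' has no terms left to look up.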
Re: Collection not current after insert
thanks Michael, adding autoCommit sorted it. cheers, Alistair -- mov eax,1 mov ebx,0 int 80h On 23/07/2013 18:34, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi Alistair, You probably need a commit, and not an optimize. Which version of Solr are you running against? The 4.0 releases have more complications, but generally sending a commit will do. Not sure if GSearch sends one, only partly because I never was able to make it work. :) Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. "The Science of Influence Marketing" 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Tue, Jul 23, 2013 at 9:57 AM, Alistair Young alistair.yo...@uhi.ac.uk wrote: Hi there, My Solr is being fed by Fedora GSearch and when uploading a new resource, the Collection is optimized but not current so the new resource can't be found. I have to go to the Core Admin page and Optimize it from there, in order to make the collection current. Is there anything I should look for to see what the problem is? This is the comms to solr when inserting: DEBUG 2013-07-23 13:27:37,023 (OperationsImpl) resultXml = <solrUpdateIndex indexName="FgsIndex"> <inserted>ltk:13000116</inserted> <counts insertTotal="1" updateTotal="0" deleteTotal="0" emptyTotal="0" docCount="854" warnCount="0"/> </solrUpdateIndex> DEBUG 2013-07-23 13:27:37,023 (GTransformer) xsltName=fgsconfigFinal/index/FgsIndex/updateIndexToResultPage DEBUG 2013-07-23 13:27:37,027 (GTransformer) getTransformer transformer=org.apache.xalan.transformer.TransformerImpl@6561b973 uriResolver=null DEBUG 2013-07-23 13:27:37,028 (GenericOperationsImpl) resultXml=<?xml version="1.0" encoding="UTF-8"?> <resultPage operation="updateIndex" action="fromPid" value="ltk:13000116" repositoryName="FgsRepos" indexNames="" resultPageXslt="" dateTime="Tue Jul 23 13:27:36 UTC 2013"> <updateIndex xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/" warnCount="0" docCount="854" deleteTotal="0" updateTotal="0" insertTotal="1" indexName="FgsIndex"/> </resultPage> INFO 2013-07-23 13:27:37,028 (UpdateListener) Index updated by notification message, returning: <?xml version="1.0" encoding="UTF-8"?> <resultPage operation="updateIndex" action="fromPid" value="ltk:13000116" repositoryName="FgsRepos" indexNames="" resultPageXslt="" dateTime="Tue Jul 23 13:27:36 UTC 2013"> <updateIndex xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/" warnCount="0" docCount="854" deleteTotal="0" updateTotal="0" insertTotal="1" indexName="FgsIndex"/> </resultPage> thanks, Alistair -- mov eax,1 mov ebx,0 int 80h
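For anyone who wants to send the commit by hand rather than rely on autoCommit, a minimal sketch in Python follows. The base URL and core name are assumptions (they do not come from this thread), and the helper name build_commit_request is hypothetical; the request is only built here, not sent.

```python
from urllib.request import Request

def build_commit_request(base_url, core):
    """Build a POST to the core's update handler with an XML <commit/> body."""
    url = f"{base_url.rstrip('/')}/{core}/update"
    return Request(
        url,
        data=b"<commit/>",                     # explicit commit message
        headers={"Content-Type": "text/xml"},  # body is XML
    )

# Assumed location and core name; adjust to the actual installation.
req = build_commit_request("http://localhost:8983/solr", "collection1")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it. A commit makes newly added documents visible to searches, which is the piece that was missing here; an optimize also rewrites index segments and is far more work than the situation needs.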
Collection not current after insert
Hi there, My Solr is being fed by Fedora GSearch and when uploading a new resource, the Collection is optimized but not current so the new resource can't be found. I have to go to the Core Admin page and Optimize it from there, in order to make the collection current. Is there anything I should look for to see what the problem is? This is the comms to solr when inserting: DEBUG 2013-07-23 13:27:37,023 (OperationsImpl) resultXml = <solrUpdateIndex indexName="FgsIndex"> <inserted>ltk:13000116</inserted> <counts insertTotal="1" updateTotal="0" deleteTotal="0" emptyTotal="0" docCount="854" warnCount="0"/> </solrUpdateIndex> DEBUG 2013-07-23 13:27:37,023 (GTransformer) xsltName=fgsconfigFinal/index/FgsIndex/updateIndexToResultPage DEBUG 2013-07-23 13:27:37,027 (GTransformer) getTransformer transformer=org.apache.xalan.transformer.TransformerImpl@6561b973 uriResolver=null DEBUG 2013-07-23 13:27:37,028 (GenericOperationsImpl) resultXml=<?xml version="1.0" encoding="UTF-8"?> <resultPage operation="updateIndex" action="fromPid" value="ltk:13000116" repositoryName="FgsRepos" indexNames="" resultPageXslt="" dateTime="Tue Jul 23 13:27:36 UTC 2013"> <updateIndex xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/" warnCount="0" docCount="854" deleteTotal="0" updateTotal="0" insertTotal="1" indexName="FgsIndex"/> </resultPage> INFO 2013-07-23 13:27:37,028 (UpdateListener) Index updated by notification message, returning: <?xml version="1.0" encoding="UTF-8"?> <resultPage operation="updateIndex" action="fromPid" value="ltk:13000116" repositoryName="FgsRepos" indexNames="" resultPageXslt="" dateTime="Tue Jul 23 13:27:36 UTC 2013"> <updateIndex xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/" warnCount="0" docCount="854" deleteTotal="0" updateTotal="0" insertTotal="1" indexName="FgsIndex"/> </resultPage> thanks, Alistair -- mov eax,1 mov ebx,0 int 80h