RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards
I have gotten nearly everything to work. There are two queries where I don't get back what I want:

"avaloq frage 1" - only returns results if I set minGramSize=1 while indexing
"yh_cug" - the query parser doesn't remove the "_" but the indexer does (WDF), so there is no match

Is there a way to also query the whole term "avaloq frage 1" without tokenizing it?

Fieldtype:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/> <!-- remove noun/adjective inflections like plural endings -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>

-----Original Message-----
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Wednesday, 12 March 2014 18:39
To: solr-user@lucene.apache.org
Subject: RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards

Hi Jack,

Do you know how I can use local parameters in my solrconfig? The params are visible in the debugQuery output but Solr doesn't parse them.
Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards
Yes, that is exactly what happened in the analyzer: the term I searched for was listed on both sides (index and query). Here's the rest:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- in this example, we will only use synonyms at query time
  <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
  -->
  <!-- Case insensitive stop word removal.
       enablePositionIncrements=true ensures that a 'gap' is left to
       allow for accurate phrase queries.
  -->
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

-----Original Message-----
From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org
Date: 12/03/2014 13:25
Subject: Re: NOT SOLVED searches for single char tokens instead of from 3 uppwards

You didn't show the new index analyzer - it's tricky to ensure that index and query are compatible, but the Admin UI Analysis page can help.

Generally, using pure defaults for WDF is not what you want, especially for query time. Usually there needs to be a slight asymmetry between index and query for WDF - index generates more terms than query.
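As a rough illustration of the index/query asymmetry described above, the sketch below follows the pattern used in Solr's stock example schemas: the index analyzer adds catenated forms while the query analyzer only splits. The field type name text_wdf_sketch is hypothetical and the attribute values are assumptions, not settings taken from this thread. It also assumes no custom types file, so that "_" is treated as a delimiter on both sides.

```xml
<!-- Sketch only: hypothetical field type, illustrative values -->
<fieldType name="text_wdf_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- index side: split on delimiters and also add the catenated form -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- query side: same splitting, but no catenated forms -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Under these assumptions, yh_cug would be expected to be indexed as yh, cug and yhcug, while a query for yh_cug would produce only yh and cug, which are a subset of the indexed terms. The Admin UI Analysis page remains the place to confirm what each side actually emits for a given field.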
-- Jack Krupansky

-----Original Message-----
From: Andreas Owen
Sent: Wednesday, March 12, 2014 6:20 AM
To: solr-user@lucene.apache.org
Subject: RE: NOT SOLVED searches for single char tokens instead of from 3 uppwards

I now have the following:

<analyzer type="query">
  <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
  <filter class="solr.GermanNormalizationFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="German"/>
</analyzer>

The GUI analysis shows me that WDF doesn't cut the underscore anymore, but it still returns 0 results?

Output:

<lst name="debug">
  <str name="rawquerystring">yh_cug</str>
  <str name="querystring">yh_cug</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((tags:yh_cug^10.0 | links:yh_cug^5.0 | thema:yh_cug^15.0 | plain_text:yh_cug^10.0 | url:yh_cug^5.0 | h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 | breadcrumb:yh_cug^6.0 | contentmanager:yh_cug^5.0 | title:yh_cug^20.0 | editorschoice:yh_cug^200.0 | doctype:yh_cug^10.0)) ((expiration:[1394619501862 TO *] (+MatchAllDocsQuery(*:*) -expiration:*))^6.0) FunctionQuery((div(int(clicks),max(int(displays),const(1^8.0))/no_coord</str>
  <str name="parsedquery_toString">+(tags:yh_cug^10.0 | links:yh_cug^5.0 | thema:yh_cug^15.0 | plain_text:yh_cug^10.0 | url:yh_cug^5.0 | h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 | breadcrumb:yh_cug^6.0 | contentmanager:yh_cug^5.0 | title:yh_cug^20.0 | editorschoice:yh_cug^200.0 | doctype:yh_cug^10.0) ((expiration:[1394619501862 TO *] (+*:* -expiration:*))^6.0) (div(int(clicks),max(int(displays),const(1^8.0</str>
  <lst name="explain"/>
  <arr name="expandedSynonyms">
    <str>yh_cug</str>
  </arr>
  <lst name="reasonForNotExpandingSynonyms">
    <str name="name">DidntFindAnySynonyms</str>
    <str name="explanation">No synonyms found for this query. Check your synonyms file.</str>
  </lst>
  <lst name="mainQueryParser">
    <str name="QParser">ExtendedDismaxQParser</str>
    <null name="altquerystring"/>
    <arr name="boost_queries">
      <str>(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str>
    </arr>
    <arr name="parsed_boost_queries">
      <str>(expiration:[1394619501862 TO *] (+MatchAllDocsQuery(*:*) -expiration:*))^6.0</str>
    </arr>
    <arr name="boostfuncs">
      <str>div(clicks,max(displays,1))^8</str>
    </arr>
  </lst>
  <lst name="synonymQueryParser">
    <str name="QParser">ExtendedDismaxQParser</str>
    <null name="altquerystring"/>
    <arr name="boostfuncs">
      <str>div(clicks,max(displays,1))^8</str>
    </arr>
  </lst>
  <lst name="timing">

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Tuesday, 11 March 2014 14:25
To: solr-user@lucene.apache.org
Subject: Re: NOT SOLVED searches for single char tokens instead of from 3 uppwards

The usual use of an ngram filter is at index time and not at query time. What exactly are you trying to achieve by using ngram filtering at query time as well as index time?

Generally, it is inappropriate to combine the word delimiter filter with the
RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards
Hi Jack,

Do you know how I can use local parameters in my solrconfig? The params are visible in the debugQuery output but Solr doesn't parse them.

<lst name="invariants">
  <str name="fq">{!q.op=OR} (*:* -organisations:[ TO *] -roles:[ TO *]) (+organisations:($org) +roles:($r)) (-organisations:[ TO *] +roles:($r)) (+organisations:($org) -roles:[ TO *])</str>
</lst>

-----Original Message-----
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Wednesday, 12 March 2014 14:44
To: solr-user@lucene.apache.org
Subject: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards

Yes, that is exactly what happened in the analyzer: the term I searched for was listed on both sides (index and query). Here's the rest:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- in this example, we will only use synonyms at query time
  <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
  -->
  <!-- Case insensitive stop word removal.
       enablePositionIncrements=true ensures that a 'gap' is left to
       allow for accurate phrase queries.
  -->
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

-----Original Message-----
From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org
Date: 12/03/2014 13:25
Subject: Re: NOT SOLVED searches for single char tokens instead of from 3 uppwards

You didn't show the new index analyzer - it's tricky to ensure that index and query are compatible, but the Admin UI Analysis page can help.

Generally, using pure defaults for WDF is not what you want, especially for query time.