Re: wrong results with wdf ngtf

2014-03-23 Thread Jack Krupansky
What indexed text are you expecting the avaloq frage 1 query to match 
against?


I just noticed that you have two distinct calls to WDF in your index 
analyzer.


I think you're going to need to go back and clearly state all of the term 
requirements for both indexing and query. Show all the use cases, both index 
and query. You have too many balls in the air right now for anybody to be 
confident about what you're really trying to do.


-- Jack Krupansky

-Original Message- 
From: Andreas Owen

Sent: Thursday, March 20, 2014 5:48 AM
To: solr-user@lucene.apache.org
Subject: wrong results with wdf  ngtf

Is there a way to tell ngramfilterfactory while indexing that number shall
never be tokenized? then the query should be able to find numbers.



Or do i have to change the ngram-min for numbers (not alpha) to 1, if that
is possible? So to speak put the hole number as token and not all possible
tokens.



Solr analysis shows onnly WDF has no underscore in its tokens, the rest have
it. can i tell the query to search numbers differently with NGTF, WT, LCF or
whatever?



I also tried filter class=solr.WordDelimiterFilterFactory
types=at-under-alpha.txt/

   @ = ALPHA

   _ = ALPHA



I have gotten nearly everything to work. There are to queries where i dont
get back what i want.



   avaloq frage 1   - only returns if i set
minGramSize=1 while indexing

   yh_cug- query parser doesn't
remove _ but the indexer does (WDF) so there is no match



Is there a way to also query the hole term avaloq frage 1 without
tokenizing it?



Fieldtype:



fieldType name=text_de class=solr.TextField positionIncrementGap=100

 analyzer type=index

  tokenizer
class=solr.StandardTokenizerFactory/

   filter
class=solr.LowerCaseFilterFactory/

  filter
class=solr.WordDelimiterFilterFactory types=at-under-alpha.txt/

  filter class=solr.StopFilterFactory
ignoreCase=true words=lang/stopwords_de.txt format=snowball
enablePositionIncrements=true/ !-- remove common words --

   filter
class=solr.GermanNormalizationFilterFactory/

  filter
class=solr.SnowballPorterFilterFactory language=German/ !-- remove
noun/adjective inflections like plural endings --


  filter class=solr.NGramFilterFactory
minGramSize=3 maxGramSize=15/

  filter
class=solr.WordDelimiterFilterFactory generateWordParts=1
generateNumberParts=1 catenateWords=1 catenateNumbers=1
catenateAll=0 splitOnCaseChange=1/

  /analyzer

  analyzer type=query

  tokenizer
class=solr.WhiteSpaceTokenizerFactory/

  filter
class=solr.LowerCaseFilterFactory/

  filter
class=solr.WordDelimiterFilterFactory types=at-under-alpha.txt/

  filter
class=solr.StopFilterFactory ignoreCase=true
words=lang/stopwords_de.txt format=snowball
enablePositionIncrements=true/ !-- remove common words --

  filter
class=solr.GermanNormalizationFilterFactory/

  filter
class=solr.SnowballPorterFilterFactory language=German/

 /analyzer

/fieldType





Solrconfig:




queryParser name=synonym_edismax



class=solr.SynonymExpandingExtendedDismaxQParserPlugin



  lst name=synonymAnalyzers



lst name=myCoolAnalyzer



  lst name=tokenizer



str name=classstandard/str



  /lst



  lst name=filter



str name=classshingle/str



str name=outputUnigramsIfNoShinglestrue/str



str name=outputUnigramstrue/str



str name=minShingleSize2/str



str name=maxShingleSize4/str



  /lst



  lst name=filter



str name=classsynonym/str



str name=tokenizerFactorysolr.KeywordTokenizerFactory/str



str name=synonymssynonyms.txt/str



str name=expandtrue/str



str name=ignoreCasetrue/str



  /lst



/lst



  /lst



/queryParser







requestHandler name=/select2 class=solr.SearchHandler



 lst name=defaults



   str name=echoParamsexplicit/str



   int name=rows10/int



   str name=defTypesynonym_edismax/str



   str name=synonymstrue/str



   str name=qfplain_text^10 editorschoice^200



title^20 h_*^14



tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10



contentmanager^5 links^5



last_modified^5 url^5



   /str



   str name=bq(expiration:[NOW TO *] OR (*:*



-expiration:*))^6/str



   str name=bfdiv(clicks,max(displays,1))^8/str !-- tested --







   str name=dftext/str



   str name=fl*,path,score/str



   str name=wtjson/str



   str name=q.opAND/str







   !-- Highlighting defaults --



   str name=hlon/str



   str name

wrong results with wdf ngtf

2014-03-20 Thread Andreas Owen
Is there a way to tell ngramfilterfactory while indexing that number shall
never be tokenized? then the query should be able to find numbers.

 

Or do i have to change the ngram-min for numbers (not alpha) to 1, if that
is possible? So to speak put the hole number as token and not all possible
tokens.

 

Solr analysis shows onnly WDF has no underscore in its tokens, the rest have
it. can i tell the query to search numbers differently with NGTF, WT, LCF or
whatever?

 

I also tried filter class=solr.WordDelimiterFilterFactory
types=at-under-alpha.txt/

@ = ALPHA

_ = ALPHA

 

I have gotten nearly everything to work. There are to queries where i dont
get back what i want.

 

avaloq frage 1   - only returns if i set
minGramSize=1 while indexing

yh_cug- query parser doesn't
remove _ but the indexer does (WDF) so there is no match

 

Is there a way to also query the hole term avaloq frage 1 without
tokenizing it?

 

Fieldtype:

 

fieldType name=text_de class=solr.TextField positionIncrementGap=100

  analyzer type=index 

   tokenizer
class=solr.StandardTokenizerFactory/

filter
class=solr.LowerCaseFilterFactory/

   filter
class=solr.WordDelimiterFilterFactory types=at-under-alpha.txt/ 

   filter class=solr.StopFilterFactory
ignoreCase=true words=lang/stopwords_de.txt format=snowball
enablePositionIncrements=true/ !-- remove common words --

filter
class=solr.GermanNormalizationFilterFactory/

   filter
class=solr.SnowballPorterFilterFactory language=German/ !-- remove
noun/adjective inflections like plural endings --


   filter class=solr.NGramFilterFactory
minGramSize=3 maxGramSize=15/

   filter
class=solr.WordDelimiterFilterFactory generateWordParts=1
generateNumberParts=1 catenateWords=1 catenateNumbers=1
catenateAll=0 splitOnCaseChange=1/

   /analyzer

   analyzer type=query

   tokenizer
class=solr.WhiteSpaceTokenizerFactory/

   filter
class=solr.LowerCaseFilterFactory/

   filter
class=solr.WordDelimiterFilterFactory types=at-under-alpha.txt/ 

   filter
class=solr.StopFilterFactory ignoreCase=true
words=lang/stopwords_de.txt format=snowball
enablePositionIncrements=true/ !-- remove common words --

   filter
class=solr.GermanNormalizationFilterFactory/

   filter
class=solr.SnowballPorterFilterFactory language=German/

  /analyzer

/fieldType

 

 

Solrconfig:

 

 queryParser name=synonym_edismax

 class=solr.SynonymExpandingExtendedDismaxQParserPlugin

   lst name=synonymAnalyzers

 lst name=myCoolAnalyzer

   lst name=tokenizer

 str name=classstandard/str

   /lst

   lst name=filter

 str name=classshingle/str

 str name=outputUnigramsIfNoShinglestrue/str

 str name=outputUnigramstrue/str

 str name=minShingleSize2/str

 str name=maxShingleSize4/str

   /lst

   lst name=filter

 str name=classsynonym/str

 str name=tokenizerFactorysolr.KeywordTokenizerFactory/str

 str name=synonymssynonyms.txt/str

 str name=expandtrue/str

 str name=ignoreCasetrue/str

   /lst

 /lst

   /lst

 /queryParser

 

 requestHandler name=/select2 class=solr.SearchHandler

  lst name=defaults

str name=echoParamsexplicit/str

int name=rows10/int

str name=defTypesynonym_edismax/str

str name=synonymstrue/str

str name=qfplain_text^10 editorschoice^200

 title^20 h_*^14

 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10

 contentmanager^5 links^5

 last_modified^5 url^5

/str

str name=bq(expiration:[NOW TO *] OR (*:* 

 -expiration:*))^6/str

str name=bfdiv(clicks,max(displays,1))^8/str !-- tested --

 

str name=dftext/str

str name=fl*,path,score/str

str name=wtjson/str

str name=q.opAND/str

 

!-- Highlighting defaults --

str name=hlon/str

str name=hl.flplain_text,title/str

str name=hl.fragSize200/str

str name=hl.simple.prelt;bgt;/str

str name=hl.simple.postlt;/bgt;/str

 

 !-- lst name=invariants --

 str name=faceton/str

 str name=facet.mincount1/str

 str name=facet.field{!ex=inhaltstyp_s}inhaltstyp_s/str

 str name=f.inhaltstyp_s.facet.sortindex/str

 str name=facet.field{!ex=doctype}doctype/str

 str name=f.doctype.facet.sortindex/str

 str name=facet.field{!ex=thema_f}thema_f/str

 str name=f.thema_f.facet.sortindex/str

 str name=facet.field{!ex=author_s}author_s/str

 str name=f.author_s.facet.sortindex/str

 str

 

wrong results with wdf ngtf

2014-03-20 Thread aowen
Is there a way to tell ngramfilterfactory while indexing that number shall 
never be tokenized? then the query should be able to find numbers.

Or do i have to change the ngram-min for numbers (not alpha) to 1, if that is 
possible? So to speak put the hole number as token and not all possible tokens.

Solr analysis shows onnly WDF has no underscore in its tokens, the rest have 
it. can i tell the query to search numbers differently with NGTF, WT, LCF or 
whatever?

I also tried filter class=solr.WordDelimiterFilterFactory 
types=at-under-alpha.txt/
@ = ALPHA
_ = ALPHA

I have gotten nearly everything to work. There are to queries where i dont get 
back what i want.

avaloq frage 1   - only returns if i set 
minGramSize=1 while indexing
yh_cug- query parser doesn't 
remove _ but the indexer does (WDF) so there is no match

Is there a way to also query the hole term avaloq frage 1 without tokenizing 
it?

Fieldtype:

fieldType name=text_de class=solr.TextField positionIncrementGap=100
  analyzer type=index 
   tokenizer 
class=solr.StandardTokenizerFactory/
filter class=solr.LowerCaseFilterFactory/
   filter class=solr.WordDelimiterFilterFactory 
types=at-under-alpha.txt/ 
   filter class=solr.StopFilterFactory 
ignoreCase=true words=lang/stopwords_de.txt format=snowball 
enablePositionIncrements=true/ !-- remove common words --
filter 
class=solr.GermanNormalizationFilterFactory/
   filter class=solr.SnowballPorterFilterFactory 
language=German/ !-- remove noun/adjective inflections like plural endings 
-- 
   filter class=solr.NGramFilterFactory 
minGramSize=3 maxGramSize=15/
   filter class=solr.WordDelimiterFilterFactory 
generateWordParts=1 generateNumberParts=1 catenateWords=1 
catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/
   /analyzer
   analyzer type=query
   tokenizer 
class=solr.WhiteSpaceTokenizerFactory/
   filter 
class=solr.LowerCaseFilterFactory/
   filter 
class=solr.WordDelimiterFilterFactory types=at-under-alpha.txt/ 
   filter 
class=solr.StopFilterFactory ignoreCase=true words=lang/stopwords_de.txt 
format=snowball enablePositionIncrements=true/ !-- remove common words --
   filter 
class=solr.GermanNormalizationFilterFactory/
   filter 
class=solr.SnowballPorterFilterFactory language=German/
  /analyzer
/fieldType


Solrconfig:

 queryParser name=synonym_edismax
 class=solr.SynonymExpandingExtendedDismaxQParserPlugin
   lst name=synonymAnalyzers
 lst name=myCoolAnalyzer
   lst name=tokenizer
 str name=classstandard/str
   /lst
   lst name=filter
 str name=classshingle/str
 str name=outputUnigramsIfNoShinglestrue/str
 str name=outputUnigramstrue/str
 str name=minShingleSize2/str
 str name=maxShingleSize4/str
   /lst
   lst name=filter
 str name=classsynonym/str
 str name=tokenizerFactorysolr.KeywordTokenizerFactory/str
 str name=synonymssynonyms.txt/str
 str name=expandtrue/str
 str name=ignoreCasetrue/str
   /lst
 /lst
   /lst
 /queryParser
 
 requestHandler name=/select2 class=solr.SearchHandler
  lst name=defaults
str name=echoParamsexplicit/str
int name=rows10/int
str name=defTypesynonym_edismax/str
str name=synonymstrue/str
str name=qfplain_text^10 editorschoice^200
 title^20 h_*^14
 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
 contentmanager^5 links^5
 last_modified^5 url^5
/str
str name=bq(expiration:[NOW TO *] OR (*:* 
 -expiration:*))^6/str
str name=bfdiv(clicks,max(displays,1))^8/str !-- tested --
 
str name=dftext/str
str name=fl*,path,score/str
str name=wtjson/str
str name=q.opAND/str
 
!-- Highlighting defaults --
str name=hlon/str
str name=hl.flplain_text,title/str
str name=hl.fragSize200/str
str name=hl.simple.prelt;bgt;/str
str name=hl.simple.postlt;/bgt;/str
 
 !-- lst name=invariants --
 str name=faceton/str
 str name=facet.mincount1/str
 str name=facet.field{!ex=inhaltstyp_s}inhaltstyp_s/str
 str name=f.inhaltstyp_s.facet.sortindex/str
 str name=facet.field{!ex=doctype}doctype/str
 str name=f.doctype.facet.sortindex/str
 str name=facet.field{!ex=thema_f}thema_f/str
 str name=f.thema_f.facet.sortindex/str
 str name=facet.field{!ex=author_s}author_s/str
 str name=f.author_s.facet.sortindex/str
 str
 name=facet.field{!ex=sachverstaendiger_s}sachverstaendiger_s/str
 str