RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards

2014-03-13 Thread Andreas Owen
I have gotten nearly everything to work. There are two queries where I don't 
get back what I want:

"avaloq frage 1" -> only returns results if I set minGramSize=1 while 
indexing
"yh_cug" -> the query parser doesn't remove "_" but the indexer does (WDF), 
so there is no match

Is there a way to also query the whole term "avaloq frage 1" without 
tokenizing it?
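
A minimal sketch of why the first query might need minGramSize=1. It assumes, without confirmation from the thread, that the field is whitespace-split and then passed through an n-gram filter that emits nothing for tokens shorter than the minimum gram size. The names `ngrams` and `index_terms` and the maximum gram size of 15 are hypothetical; this is not Solr code.

```python
def ngrams(token, min_gram, max_gram):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def index_terms(text, min_gram, max_gram):
    """Lowercase and whitespace-split text, then collect the n-grams of each token."""
    terms = set()
    for token in text.lower().split():
        terms.update(ngrams(token, min_gram, max_gram))
    return terms


# Hypothetical settings: only the minimum gram size differs.
with_min_3 = index_terms("avaloq frage 1", 3, 15)
with_min_1 = index_terms("avaloq frage 1", 1, 15)

print("1" in with_min_3)  # the one-character token is too short to yield a gram
print("1" in with_min_1)  # with a minimum of 1 the token is indexed
```

Under these assumptions the token "1" never reaches the index when the minimum is 3, so a query that requires all three terms cannot match.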

Fieldtype:

[field type XML not preserved in the archive]

RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards

2014-03-12 Thread Andreas Owen
Hi Jack,

Do you know how I can use local parameters in my solrconfig? The params are 
visible in the debugQuery output, but Solr doesn't parse them.


{!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO 
*]) (+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *])



-Original Message-
From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Wednesday, 12 March 2014 14:44
To: solr-user@lucene.apache.org
Subject: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 
uppwards

Yes, that is exactly what happened in the analyzer. The term I searched for 
was listed on both sides (index & query).

Here's the rest:

[configuration XML not preserved in the archive]

-Original Message-
> From: "Jack Krupansky" 
> To: solr-user@lucene.apache.org
> Date: 12/03/2014 13:25
> Subject: Re: NOT SOLVED searches for single char tokens instead of 
> from 3 uppwards
> 
> You didn't show the new index analyzer. It's tricky to ensure that 
> index and query are compatible, but the Admin UI Analysis page can help.
> 
> Generally, using pure defaults for WDF is not what you want, 
> especially at query time. Usually there needs to be a slight 
> asymmetry between index and query for WDF: the index side generates more 
> terms than the query side.
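
The compatibility point above can be sketched in a few lines. This is only a rough model, not the real word delimiter filter: `split_on_delimiters`, `split_and_keep_original`, `keep_whole` and `matches` are hypothetical helpers, and the "keep the original token too" variant stands in for the kind of index-side setting (such as WDF's preserveOriginal option) that makes the index emit more terms than the query.

```python
import re


def split_on_delimiters(token):
    """Split on anything that is not a lowercase letter or digit."""
    return [part for part in re.split(r"[^0-9a-z]+", token.lower()) if part]


def split_and_keep_original(token):
    """Emit the split parts plus the unsplit token (index side produces more terms)."""
    return split_on_delimiters(token) + [token.lower()]


def keep_whole(token):
    """Leave the token untouched apart from lowercasing."""
    return [token.lower()]


def matches(index_analyzer, query_analyzer, indexed_text, query_text):
    """A query matches only if every query term exists among the indexed terms."""
    indexed = set(index_analyzer(indexed_text))
    return all(term in indexed for term in query_analyzer(query_text))


# Index splits on "_", query does not: no shared term.
print(matches(split_on_delimiters, keep_whole, "yh_cug", "yh_cug"))      # False
# Index emits the parts and the original: the unsplit query term is found.
print(matches(split_and_keep_original, keep_whole, "yh_cug", "yh_cug"))  # True
```

The first call mirrors the "yh_cug" case reported in this thread, where the indexer removes the underscore and the query side does not.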
> 
> -- Jack Krupansky
> 
> -Original Message-
> From: Andreas Owen
> Sent: Wednesday, March 12, 2014 6:20 AM
> To: solr-user@lucene.apache.org
> Subject: RE: NOT SOLVED searches for single char tokens instead of 
> from 3 uppwards
> 
> I now have the following:
> 
> 
> 
> [...] types="at-under-alpha.txt"/>
> [...] class="solr.LowerCaseFilterFactory"/>
> [...] words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/>
> [...] class="solr.GermanNormalizationFilterFactory"/>
> 
>   
> 
> The Admin UI analysis shows me that WDF no longer splits on the 
> underscore, but the query still returns 0 results.
> 
> Output:
> 
> 
>   yh_cug
>   yh_cug
>   (+DisjunctionMaxQuery((tags:yh_cug^10.0 |
> links:yh_cug^5.0 | thema:yh_cug^15.0 | plain_text:yh_cug^10.0 |
> url:yh_cug^5.0 | h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 |
> breadcrumb:yh_cug^6.0 | contentmanager:yh_cug^5.0 | title:yh_cug^20.0 |
> editorschoice:yh_cug^200.0 | doctype:yh_cug^10.0))
> ((expiration:[1394619501862 TO *]
> (+MatchAllDocsQuery(*:*) -expiration:*))^6.0) 
> FunctionQuery((div(int(clicks),max(int(displays),const(1^8.0))/no_coord
>   +(tags:yh_cug^10.0 | 
> links:yh_cug^5.0 |
> thema:yh_cug^15.0 | plain_text:yh_cug^10.0 | url:yh_cug^5.0 |
> h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 | breadcrumb:yh_cug^6.0 |
> contentmanager:yh_cug^5.0 | title:yh_cug^20.0 | 
> editorschoice:yh_cug^200.0 |
> doctype:yh_cug^10.0) ((expiration:[1394619501862 TO *]
> (+*:* -expiration:*))^6.0)
> (div(int(clicks),max(int(displays),const(1^8.0
>   
>   
> yh_cug
>   
>   
> DidntFindAnySynonyms
> No synonyms found for this query.  Check 
> your synonyms file.
>   
>   
> ExtendedDismaxQParser
> 
> 
>   (expiration:[NOW TO *] OR (*:* -expiration:*))^6
> 
> 
>   (expiration:[1394619501862 TO *]
> (+MatchAllDocsQuery(*:*) -expiration:*))^6.0
> 
> 
>   div(clicks,max(displays,1))^8
> 
>   
>   
> ExtendedDismaxQParser
> 
> 
>   div(clicks,max(displays,1))^8
> 
>   
>   
> 
> 
> 
> 
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Tuesday, 11 March 2014 14:25
> To: solr-user@lucene.apache.org
> Subject: Re: NOT SOLVED searches for single char tokens instead of 
> from 3 uppwards
> 
> The usual use of an ngram filter is at index time and not at query time.
> What exactly are you trying to achieve by using ngram filtering at 
> query time as well as index time?
> 
> Generally, it is inappropriate to combine the word delimiter filter 
> with the standard tokenizer: the latter removes the punctuation that 
> normally influences how WDF treats the parts of a token. Use the 
> whitespace tokenizer if you intend to use WDF.
> 
> Which query parser are you using? What fields are being queried?
> 
> Please post the parsed query string from the debug output - it will 
> show the precise generated query.
> 
> I think what you are seeing is that the ngram filter is generating 
> tokens like "h_cugtest", then the WDF removes the underscore, and 
> "h" gets generated as a separate token.
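
The effect described above can be reproduced with a small sketch. It assumes the indexed token was "yh_cugtest" (inferred from the quoted gram "h_cugtest", not stated in the thread), a gram size range of 3 to 15, and a delimiter step that simply splits on "_". The helper names are hypothetical and this is not the actual filter code.

```python
def ngrams(token, min_gram, max_gram):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def split_on_underscore(token):
    """Crude stand-in for the word delimiter step: split on '_' and drop empties."""
    return [part for part in token.split("_") if part]


# Step 1: the n-gram filter runs first on the assumed token.
grams = ngrams("yh_cugtest", 3, 15)

# Step 2: each gram is then split on the underscore.
single_chars = sorted(
    {part for gram in grams for part in split_on_underscore(gram) if len(part) == 1}
)

print("h_cugtest" in grams)  # True
print(single_chars)          # ['c', 'h']
```

Every gram is at least three characters long, yet splitting grams such as "h_c" afterwards still yields one-character terms, which would explain searches on single characters.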
> 
> -- Jack Krupansky
> 
> -Original Message-
> From: Andreas Owen
> Sent: Tuesday, March 11, 2014 5:09 AM
> To: solr-user@lucene.apache.org
> Subject: RE: NOT SOLVED searches for single char tokens instead of 
> from 3 uppwards
> 
> I got it right the first time, and here is my request handler. The field 
> "plain_text" is searched correctly and has the same field type as 
> "title" -> "text_de".
> 
>  class="solr.SynonymExpandingExtendedDismaxQParserPlugin">
>   
> 
>   
> standard
>   
>   
> shingle
> true
> true
> 2
> 4
>