facet.sort-Problem in 4.0

2013-03-06 Thread Oliver Schihin

Hello

Our application (VuFind, a library discovery tool) talks to a Solr 3.5 index by 
default. To get the facet values back, the parameter 'facet.sort' is sent in the request 
with an empty value. Solr then delivers the requested facets sorted by count, as it 
should. A request looks (simplified) like this:

http://[host:port]/solr/[core]/select/?q=*:*&rows=10&fl=*,score&start=0&facet=true&facet.mincount=1&facet.limit=30&facet.sort=&facet.field=authorStr&wt=xml

We switched from Solr 3.5 to Solr 4.1 and discovered that the request built by our 
application yields an undesired result: facets are sorted by index. This is the behaviour:

* facet.sort=  //with the empty value, facets are sorted by index
* facet.sort=count|index //obviously, facets are sorted as told
* no facet.sort parameter //facets are sorted by count, as the default should be

A quick test on a solr 4.0 instance shows that the behaviour is still backward compatible 
with 3.5. An empty value delivers standard sorting by count, a non-existing parameter as well.


In short: 4.1 is not backward compatible with 3.5 and 4.0 with regard to facet 
sorting.

We suspect this has to do with the change from boolean to string in the 
setFacetSort Method:
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/client/solrj/SolrQuery.html#setFacetSort%28java.lang.String%29

Questions are:
* Is this known? I guess so
* Is there a way to fix this on the solr side?
* What are the reasons behind the change?

If not, we obviously should do this in the client app.
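
A minimal sketch of such a client-side workaround, assuming the client can filter its 
parameter list before building the URL. The function name build_select_query is 
hypothetical and not part of VuFind or Solr; the parameters are the ones from the request 
above. Dropping the empty 'facet.sort' means the parameter is simply absent, which is the 
case reported above to sort by count.

```python
from urllib.parse import urlencode


def build_select_query(params):
    """Return a URL query string, skipping parameters whose value is empty.

    Hypothetical helper: an empty facet.sort is dropped so that the
    parameter is absent from the request instead of being sent as
    'facet.sort='.
    """
    cleaned = [(name, value) for name, value in params if value not in ("", None)]
    return urlencode(cleaned)


params = [
    ("q", "*:*"),
    ("rows", 10),
    ("start", 0),
    ("facet", "true"),
    ("facet.mincount", 1),
    ("facet.limit", 30),
    ("facet.sort", ""),  # empty value, as in the request above
    ("facet.field", "authorStr"),
    ("wt", "xml"),
]

print(build_select_query(params))
```

Filtering in one place keeps the rest of the request building untouched, so an explicit 
facet.sort=count or facet.sort=index still passes through unchanged.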

Regards
Oliver


Re: removing whitespaces in query

2013-03-07 Thread Oliver Schihin

Hello Jochen

What are your tokenizers? I guess it should be 'KeywordTokenizerFactory'. To fully 
understand, you might send the whole analyzer chain.


But there might be a simple mistake in your pattern: character classes are enclosed in 
square brackets. We replace all non-alphanumeric characters like this:

**

**
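
As an illustration of the character-class point only: the sketch below uses Python's re 
module rather than Solr's Java regular expressions, and it is not the filter definition 
from this message. The function names are hypothetical. The first variant drops every 
non-alphanumeric character; the second removes whitespace only and keeps punctuation such 
as a colon.

```python
import re


def strip_non_alnum(value):
    # [^...] is a negated character class: remove everything that is
    # not an ASCII letter or digit.
    return re.sub(r"[^A-Za-z0-9]", "", value)


def strip_whitespace(value):
    # \s matches any whitespace character; punctuation is left alone.
    return re.sub(r"\s+", "", value)


print(strip_non_alnum("50 A 91"))
print(strip_whitespace("Frei 91 : 9984"))
```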

If that helps.
Regards from Basel
Oliver

-------- Original Message --------
Subject: removing whitespaces in query
From: Jochen Lienhard 
To: solr-user@lucene.apache.org
Date: 07.03.2013 10:33


Hello,

we have indexed a field, where we have removed the whitespaces before the 
indexing.

For example:

50A91
Frei91\:9984

Now we want to allow the users to search for:

50 A 91
Frei 91 \: 9984

Our idea was to add a PatternReplaceFilterFactory in the query analyzer to remove the 
whitespaces:


But it does not work.

For normal queries - we are using VuFind as frontend - we can remove the whitespace in 
the YAML part, but if the user searches with wildcards ... the YAML does not work ... so 
we hope to find a solution in Solr.

We are using solr 3.6.

Thanks for ideas and hints.

Greetings from Germany

Jochen





Re: removing whitespaces in query

2013-03-07 Thread Oliver Schihin

Hi Jochen

You could try this:


   
   
   
   
   
   
   



Remarks:
* I am not sure whether your sequence of filters is correct. I think you should use 
charFilter at the beginning of the chain only, and PatternReplaceFilter after the tokenizer.
* If you use ICUFoldingFilter, you won't need LowerCaseFilter; it would be redundant. 
LowerCaseFilter alone might do the job.
* TrimFilter is redundant in that setting, I guess.
* A LengthFilterFactory can be helpful against odd terms of only one character.
* You have a type="query" attribute on your analyzer element. Do the two chains 
correspond, or could you do with one analyzer for both index and query?


Regards
Oliver


-------- Original Message --------
Subject: Re: removing whitespaces in query
From: Jochen Lienhard 
To: solr-user@lucene.apache.org
Date: 07.03.2013 11:04


Hello Jilal and Oliver,

hmmm ... I don't know, how two fields can help.

The problem seems to be, that solr does not recognize the whitespace.

We are using following analyser:









In the query Frei 91 \: 9984 it replaces Frei with blubb ... so it seems to work 
perfectly.
But when we try to replace whitespace using \s, nothing happens.

@Oliver: we don't want to replace the : in the query ... it is part of our 
call numbers.

Greetings

Jochen


Field type change / copy field

2011-08-24 Thread Oliver Schihin

Hello list

My documents come with a field holding a date, always a year:
2008
In the schema, this content is indexed as an integer field, and it is searchable.


Through a copyField instruction I copy this field to a date field, you guessed it, to 
use it for faceting and to make range queries possible. Its field type is of the class 
'solr.TrieDateField', which requires the canonical date representation. Is there a way in 
Solr to extend the simple year to 2008-01-01T00:00:00Z? Or do I have to solve the problem 
in preprocessing, before posting?
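
For the preprocessing route, a minimal sketch, assuming the year arrives as a plain 
number or numeric string and that January 1st at midnight UTC is the wanted date. The 
function name year_to_solr_date is hypothetical.

```python
def year_to_solr_date(year):
    """Expand a bare year to the canonical date form, e.g. 2008-01-01T00:00:00Z."""
    value = int(str(year).strip())
    if not 0 <= value <= 9999:
        raise ValueError(f"year out of range: {year!r}")
    # Zero-pad to four digits and pin the date to January 1st, midnight UTC.
    return f"{value:04d}-01-01T00:00:00Z"


print(year_to_solr_date("2008"))
```

The converted value would be posted in the date field directly, in place of the copy 
from the integer field.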


Thanks
Oliver


spellcheck-index is rebuilt on commit

2012-01-02 Thread Oliver Schihin
Hello

We are working with solr 4.0, the spellchecker used is still the classic
IndexBasedSpellChecker. Now every time I do a commit, it rebuilds the
spellchecker index, even though I clearly state a build on optimize. The
configuration in solrconfig looks like this:


For testing, I issue commits through curl:


This is from the log:


Where am I going wrong? Any suggestions? Thanks for your help
Oliver



ICUCollation throws exception

2012-07-16 Thread Oliver Schihin

Hello

According to release notes from 4.0.0-ALPHA, SOLR-2396, I replaced 
ICUCollationKeyFilterFactory with ICUCollationField in our schema. But this throws an 
exception, see the following excerpt from the log:


Jul 16, 2012 5:27:48 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "alphaOnlySort": Plugin init failure for [schema.xml] analyzer/filter: class org.apache.solr.schema.ICUCollationField

at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:359)

The deprecated filter of ICUCollationKeyFilterFactory is working without any problem. This 
is how I did the schema (with the deprecated filter):


   
   omitNorms="true">

  


  



Do I have to replace jars in /contrib/analysis-extras/, or any other hints of what might 
be wrong in my install and configuration?


Thanks a lot
Oliver




Re: Is there any special meaning for # symbol in solr.

2012-09-04 Thread Oliver Schihin

You are not using a string type, but a TextField. And in your analysis chain, 
StandardTokenizer strips the number sign (#). You can check this in the "analysis" part 
of the Solr admin interface.

You can either use a string type for searches like C#, C++ and the like, or map the 
characters to something textual *before* tokenizing. My solution goes something like this:


while mapping-chars.txt is:
*
# 
# Specials
# 

# C+ => Cplus
# C++ => Cplusplus
"\u0043\u002B" => "Cplus"
"\u0063\u002B" => "Cplus"
"\u0043\u002B\u002B" => "Cplusplus"
"\u0063\u002B\u002B" => "Cplusplus"

# C#, C♯ => Csharp
"\u0043\u0023" => "Csharp"
"\u0063\u0023" => "Csharp"
"\u0043\u266f" => "Csharp"
"\u0063\u266f" => "Csharp"

# F#, F♯ => Fsharp
"\u0046\u0023" => "Fsharp"
"\u0066\u0023" => "Fsharp"
"\u0046\u266f" => "Fsharp"
"\u0066\u266f" => "Fsharp"

# J#, J♯ => Jsharp
"\u004A\u0023" => "Jsharp"
"\u006A\u0023" => "Jsharp"
"\u004A\u266f" => "Jsharp"
"\u006A\u266f" => "Jsharp"

# ♭ => b
"\u266d" => "b"

# @ => at
"\u0040" => "at"
***

Then use any tokenizer
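
To see what such a mapping does to the text before it reaches the tokenizer, here is a 
rough Python emulation. It is only a sketch: it copies a few rules from the file above, 
the function name map_chars is hypothetical, and it assumes the longest rule wins at a 
given position, so that "C++" is not consumed by the shorter "C+" rule.

```python
import re

# A few rules copied from the mapping file above (not the full list).
MAPPING = {
    "C+": "Cplus",
    "c+": "Cplus",
    "C++": "Cplusplus",
    "c++": "Cplusplus",
    "C#": "Csharp",
    "c#": "Csharp",
    "C\u266f": "Csharp",
    "c\u266f": "Csharp",
    "@": "at",
}

# Longest keys first, so "C++" is tried before "C+" at the same position.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(MAPPING, key=len, reverse=True))
)


def map_chars(text):
    """Replace every mapped character sequence with its textual form."""
    return _PATTERN.sub(lambda match: MAPPING[match.group(0)], text)


print(map_chars("c# or C++"))
```

Once the symbols are plain letters, a tokenizer that drops '#' or '+' no longer loses 
the distinction between C, C++ and C#.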



-------- Original Message --------
Subject: Re: Is there any special meaning for # symbol in solr.
From: veena rani 
To: solr-user@lucene.apache.org
CC: te 
Date: 04.09.2012 09:49


this is the field type i m using for techskill,

 


  




  
  




  



On Tue, Sep 4, 2012 at 1:16 PM, veena rani  wrote:


No, # is not a stop word.


On Tue, Sep 4, 2012 at 12:59 PM, 李赟  wrote:


Is "#" in your stop words list ?


2012-09-04



Li Yun
Software Engineer @ Netease
Mail: liyun2...@corp.netease.com
MSN: rockiee...@gmail.com




From: veena rani
Sent: 2012-09-04  12:57:26
To: solr-user; te
CC:
Subject: Re: Is there any special meaning for # symbol in solr.

If I use this link,
http://localhost:8080/solr/select?&q=(techskill%3Ac%23)
Solr displays the results for techskill:c.
But I want to display only the results for techskill:c#.

On Mon, Sep 3, 2012 at 7:23 PM, Toke Eskildsen wrote:

On Mon, 2012-09-03 at 13:39 +0200, veena rani wrote:

 I have an issue with the # symbol in Solr.
 I am trying to search for a string that ends with #, e.g. c#, and it is throwing
 an error like: org.apache.lucene.queryparser.classic.ParseException: Cannot
 parse '(techskill:c': Encountered "" at line 1, column 12.

Solr only received '(techskill:c', which has unbalanced parentheses.
My guess is that you do not perform a URL-encode of '#' and that you
were sending something like
http://localhost:8080/solr/select?&q=(techskill:c#)
when you should have been sending
http://localhost:8080/solr/select?&q=(techskill%3Ac%23)
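
The effect can be reproduced with Python's standard urllib.parse, shown here as a small 
sketch using the two URLs above. An unencoded '#' starts the URL fragment, so everything 
after it is cut off from the query string. The helper name encode_query_value is 
hypothetical.

```python
from urllib.parse import quote, urlsplit

raw_url = "http://localhost:8080/solr/select?&q=(techskill:c#)"
parts = urlsplit(raw_url)

# The '#' ends the query string; the closing parenthesis becomes the fragment.
print(parts.query)
print(parts.fragment)


def encode_query_value(value):
    """Percent-encode a query value, leaving only the parentheses as they are."""
    return quote(value, safe="()")


print(encode_query_value("(techskill:c#)"))
```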



--
Regards,
Veena.
Banglore.
