Got parseException when search keyword AND on a text field

2008-04-18 Thread Xuesong Luo
Hi,
I got the following error when searching for the keyword AND on a text
field. I checked stopwords.txt; it has an entry for the word "and" (case
insensitive), but that does not seem to work for the word AND. Does anyone
know how to fix this problem?

Thanks
Xuesong

http://localhost/search/select/?q=firstName%3AAND&version=2.2&start=0&rows=10&indent=on

INFO: [triHealthPerf] /select/ rows=10&start=0&indent=on&q=firstName:AND&version=2.2 0 0
2008-04-18 16:12:10,877 ERROR [STDERR] Apr 18, 2008 4:12:10 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.queryParser.ParseException: Cannot parse
'firstName:AND': Encountered "AND" at line 1, column 10.
Was expecting one of:
"(" ...
"*" ...
 ...
 ...
 ...
 ...
"[" ...
"{" ...
 ...
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:150)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79)
at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:119)
at org.apache.solr.search.QParser.getQuery(QParser.java:80)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:66)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:143)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:902)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:280)

Here is the text field definition.

[field type definition XML not preserved in the archive]

Re: Got parseException when search keyword AND on a text field

2008-04-23 Thread Xuesong Luo
Otis, Thanks for the reply. Is there a list of words that have special
meaning? 

Thanks
Xuesong  
 


Re: Got parseException when search keyword AND on a text field
Otis Gospodnetic
Fri, 18 Apr 2008 18:39:45 -0700

Xuesong,

AND has a special meaning - it is a boolean AND when capitalized.  That is
why you are getting an error - the query parser doesn't know what to do with
just "AND" for a query.

Otis 
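
A small client-side guard can illustrate the point above. This is only a sketch in plain Java with no Solr or Lucene dependency: the class and method names (ReservedWordGuard, guardTerm, fieldQuery) are made up for illustration, and the operator list (AND, OR, NOT) is an assumption based on the standard Lucene query syntax rather than something stated in this thread. It wraps a bare operator word in double quotes so the parser reads it as a phrase instead of a boolean operator.

```java
import java.util.Set;

public class ReservedWordGuard {
    // Assumed operator keywords of the Lucene query syntax (case sensitive).
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    // Quote the term if it would otherwise be read as a boolean operator.
    public static String guardTerm(String term) {
        return OPERATORS.contains(term) ? "\"" + term + "\"" : term;
    }

    // Build a field:term clause, e.g. firstName:"AND".
    public static String fieldQuery(String field, String term) {
        return field + ":" + guardTerm(term);
    }

    public static void main(String[] args) {
        System.out.println(fieldQuery("firstName", "AND"));
        System.out.println(fieldQuery("firstName", "and"));
    }
}
```

Quoting only avoids the ParseException. Whether the quoted term then matches anything still depends on the field's analyzer; a stop word filter may well drop it.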


question about highlight field

2007-06-01 Thread Xuesong Luo
Hi, there,

I have a question about how to use the highlight field (hl.fl). Below is
my test result. As you can see, if I don't use hl.fl in the query, the
highlighting element in the result only shows the id information. I have
to add the field name (hl.fl=TITLE) to the query to see the field
information. Is that the correct behavior? If there are multiple fields
that could contain the search string, do I have to add all of them to
hl.fl?

 

Thanks

Xuesong

 

http://localhost:8080/search/select/?q=Consultant&version=2.2&start=0&rows=10&indent=on&hl=true

 

[highlighting response XML not preserved in the archive]

 

 

http://localhost:8080/search/select/?q=Consultant&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=TITLE
 

 

 

[highlighting response XML not preserved in the archive; the surviving TITLE snippet is:]

   Senior Event Manager



 



RE: question about highlight field

2007-06-04 Thread Xuesong Luo
Mike,

Thanks for the information. You are right, my problem is that my default
search field (searchall) is not stored. The searchall field is a
multi-valued field (a combination of TITLE and a few other fields). I have
a separate field TITLE, which is stored, so I thought that field would be
highlighted automatically if hl.fl is absent. I didn't know until now that
the default search field is used. After I stored the searchall field, the
TITLE information in the searchall field is displayed in the highlighting
element when hl.fl is absent. (See example below)

[highlighting response XML not preserved in the archive; the surviving snippet is:]

Senior Event Manager

  



  

  

So if I need to search for a string in fields f1, f2, f3 and highlight them
in the response, I have to append hl.fl=f1,f2,f3 to my query. Is this
the only solution?  I thought of using the searchall field, but the problem
is that the highlight element doesn't tell which value belongs to which
field. As you can see in the example above, I can't tell whether Senior
Event Manager is from TITLE or another field.

 

Thanks for all the help

Xuesong

 

 

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 01, 2007 11:43 AM
To: solr-user@lucene.apache.org
Subject: Re: question about highlight field

On 1-Jun-07, at 9:37 AM, Xuesong Luo wrote:

> Hi, there,
>
> I have a question about how to use the highlight field(hl.fl), below is
> my test result. As you can see, if I don't use hl.fl in the query, the
> highlighting element in the result only shows the id information. I have
> to add the field name (hl.fl=TITLE) to the query to see the field
> information. Is that the correct behavior? If there are multiple fields
> that could contain the search string, I have to add all of them to
> hl.fl?

Highlighting uses the following fields:

1. hl.fl, if present, will define all fields to be highlighted.  You
   can highlight fields that were not part of the query (as you
   demonstrate below)
2. if hl.fl is absent and qt=standard, the default search field is
   highlighted (set in schema.xml or the df= parameter)
3. if hl.fl is absent and qt=dismax, the query fields are used (qf=)

Note that every field to be highlighted must be stored.  If not, it
will not be present in the output (perhaps that is what you are
seeing in your example).

Finally, all terms are highlighted in all highlight fields.  If your
query searches for different terms in different fields and you want
this exactitude to carry forth in your highlighting, specify
hl.requireFieldMatch=true.

 

-Mike
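
The parameters described in this message can be assembled into a request URL along these lines. This is a sketch, not part of Solr: the class name HighlightQuery and its build method are hypothetical, and the base URL is simply the one from the examples earlier in the thread. Only the parameter names (hl, hl.fl, hl.requireFieldMatch) come from the discussion.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class HighlightQuery {
    // Build a select URL that highlights the given fields.
    // An empty field list leaves hl.fl out, so Solr falls back to its defaults.
    public static String build(String selectUrl, String q,
                               List<String> hlFields, boolean requireFieldMatch) {
        StringBuilder url = new StringBuilder(selectUrl);
        url.append("?q=").append(URLEncoder.encode(q, StandardCharsets.UTF_8));
        url.append("&hl=true");
        if (!hlFields.isEmpty()) {
            // hl.fl is a comma-separated list of stored fields.
            StringBuilder fl = new StringBuilder();
            for (String field : hlFields) {
                if (fl.length() > 0) {
                    fl.append(',');
                }
                fl.append(URLEncoder.encode(field, StandardCharsets.UTF_8));
            }
            url.append("&hl.fl=").append(fl);
        }
        if (requireFieldMatch) {
            url.append("&hl.requireFieldMatch=true");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/search/select/",
                "Consultant", List.of("TITLE"), false));
    }
}
```

Passing an empty list reproduces the first test request in this thread (no hl.fl), while passing several field names produces the hl.fl=f1,f2,f3 form.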

 



RE: question about highlight field

2007-06-04 Thread Xuesong Luo
Chris,
Thanks for the reply. I'm curious why we would want to search one field but
highlight different fields. Doesn't it make more sense to only highlight
the query fields? In my example, if I search f1, f2, f3, most likely I
only want the search words in those fields to be highlighted. Of
course I can use hl.fl, but I think it makes more sense for solr to
automatically highlight those fields (rather than the default search
field) for us.

Thanks
Xuesong


-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 04, 2007 11:33 AM
To: solr-user@lucene.apache.org
Subject: RE: question about highlight field

:
: So if I need to search a string in field f1, f2, f3 and highlight them
: in the response, I have to append hl.fl=f1,f2,f3 to my query. Is this
: the only solution?  I thought of using searchall field, but the problem
: is the highlight element doesn't tell which value belongs to which

that's because you are highlighting your "searchall" field ... you can
search one field and highlight different fields -- but yes, you have to
list the fields you want to highlight (Solr can only do so much to
"guess" which fields to highlight, and in your case it's not guessing what
you want it to)




-Hoss




RE: question about highlight field

2007-06-04 Thread Xuesong Luo
Thanks Mike, I tried using dismax and it seems to work. The only problem
is that I cannot use a wildcard in the query string if I specify qt=dismax.

I have a default search field called TITLE (TextField).
This one returns all engineers whose TITLE starts with engin:  /?q=engin*
This one does not return anything:   /?q=engin*&qt=dismax

Do you know what the problem is?

Thanks
Xuesong

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 04, 2007 11:46 AM
To: solr-user@lucene.apache.org
Subject: Re: question about highlight field

On 4-Jun-07, at 9:56 AM, Xuesong Luo wrote:

>
> So if I need to search a string in field f1, f2, f3 and highlight them
> in the response, I have to append hl.fl=f1,f2,f3 to my query. Is this
> the only solution?  I thought of using searchall field, but the  
> problem
> is the highlight element doesn't tell which value belongs to which
> field, as you can see in the example above, I can't tell the Senior
> Event Manager is from TITLE or other fields.

As Chris mentioned, I'm not sure how Solr could "know" that you want  
to highlight those fields, given that you aren't even searching them.

One option is to search those fields directly, using dismax.  In that  
case, the highlight fields will be picked up automatically.

-Mike




RE: question about highlight field

2007-06-05 Thread Xuesong Luo
Chris,
Thanks for your reply, I got what you said. In my case, I have the
following requirements:
1. search on different fields
2. highlight the query string in the searched fields, not the default
search field
3. use wildcards

I think the only option I have is to use the standard request handler,
specify which field I want to search, and add the same field to hl.fl,
something similar to ?q=TITLE:consult*&hl=on&hl.fl=TITLE, right?

Thanks
Xuesong

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 04, 2007 11:08 PM
To: solr-user@lucene.apache.org
Subject: RE: question about highlight field


: Thanks for the reply. I'm curious why we want to search one field but
: highlight different fields? Doesn't it make more sense to only highlight

consider a typical use case: you have an index of articles with
fields for the title, description, and body of the article.  you search
all of them, but on the search results page you only highlight matches in
the title and description (maybe you have a "cached" view of each article
where you display the stored contents of the article body with
highlighting)

: the query fields? In my example, if I search f1, f2, f3, most likely I
: course I can use hl.fl, but I think it make more sense for solr to
: automatically highlight those fields(rather than the default search
: field) for us.

that wasn't actually your example ... you weren't searching across fields
f1, f2 and f3; you were searching for words in the default field
("searchall") that happened to be made by combining the text from f1, f2,
and f3 using copyField.  As Mike pointed out, if you use dismax to
*really* search for your input in f1, f2, and f3 (using the qf param) then
Solr will highlight those fields for you.

you may be wondering about what happens when you do a search like...
   http://localhost:8983/solr/select?q=features%3Asolr&hl=on
...ie: your query string explicitly looks for solr in the field features.

Solr doesn't "guess" that you want to highlight the features field in this
case, it could -- but it would be a bad idea.  The hl.fl default
"guessing" logic is independent of the fields that appear in the query
string ... if you were relying on Solr to highlight your default search
field for you so you could display it in your application, you wouldn't be
very happy if your application broke because someone happened to do a
field-specific query.

-Hoss




RE: question about highlight field

2007-06-05 Thread Xuesong Luo
Good point, I hadn't thought about it. It makes sense to use
requireFieldMatch in my case.

One more question about using wildcards. I found that if a wildcard is used
in the query, the highlight elements only show the unique id; they don't
display the field information (see below, the arr section in blue is
returned). Is this the designed behavior?

 

 

?q=TITLE:consult*&hl=on&hl.fl=TITLE

 



 

  

Consultant 

  

 



 

 

Thanks

Xuesong

 

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 05, 2007 12:02 PM
To: solr-user@lucene.apache.org
Subject: RE: question about highlight field

: 1. search on different fields
: 2. highlight the query string in searching fields, not default search

: I think the only option I have is to use standard request handler and
: specify which field I want to search and add the same field to hl.fl,
: something similar to ?q=TITLE:consult*&hl=on&hl.fl=TITLE, right?

pretty much. if you want to ensure that your highlighting only applies to
things that actually result in query matches, set
hl.requireFieldMatch=true. that way, in queries like this...

   ?q=DEK:albino+TITLE:elephant&hl=on&hl.fl=TITLE,DEK

...the word elephant won't be highlighted in the DEK field, and the word
albino won't be highlighted in the TITLE field.

-Hoss

 



RE: question about highlight field

2007-06-05 Thread Xuesong Luo
Yes, I'm using 1.1. The example in my last email is the expected result,
not the real result. In fact I didn't see the arr element in the
highlighting element when either a prefix wildcard or a true wildcard query
is used.
I just tried the nightly build and, as you said, it works great except for
prefix wildcards.

Thanks for your help!
Xuesong


-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 05, 2007 10:16 PM
To: solr-user@lucene.apache.org
Subject: RE: question about highlight field


: One more question about using wildcard. I found if wildcard is used in
: the query, the highlight elements only shows unique id, it won't
display

: 
:  
:   
: Consultant

your description of the problem doesn't seem to match what you've pasted
... it looks like it's highlighting just the prefix from the query.

You're using Solr 1.1 right?

Unfortunately, i think you are damned if you do, damned if you don't ...
in Solr 1.1, highlighting used the info from the raw query to do
highlighting, hence in your query for consult* it would highlight
the Consult part of Consultant even though the prefix query was matching
the whole word.  In the trunk (soon to be Solr 1.2) Mike fixed that so the
query is "rewritten" to its expanded form before highlighting is done ...
this works great for true wild card queries (ie: cons*t* or cons?lt*) but
for prefix queries (ie: consult*) Solr has an optimization to reduce the
likelihood of Solr crashing if the prefix matches a lot of terms ...
unfortunately this breaks highlighting of prefix queries, and no one has
implemented a solution yet...

https://issues.apache.org/jira/browse/SOLR-195




-Hoss




RE: Wildcards / Binary searches

2007-06-06 Thread Xuesong Luo
I had a similar question about dismax. Here is what Chris said:

the dismax handler uses a much more simplified query syntax than the
standard request handler.  Only +, -, and " are special characters so
wildcards are not supported.


HTH

-Original Message-
From: galo [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 06, 2007 8:41 AM
To: solr-user@lucene.apache.org
Subject: Wildcards / Binary searches

Hi,

Three questions:

1. I want to use solr for some sort of live search, querying with
incomplete terms + wildcard and getting any similar results. Radioh*
would return anything containing that string. The DisMax request handler
doesn't accept wildcards in the q param so i'm trying the simple one and
still have problems as all my results are coming back with score = 1 and
I need them sorted by relevance. Is there a way of doing this? Why
doesn't * work in dismax (nor ~ by the way)?

2. What do the phrase slop params do?

3. I'm trying to implement another index where I store a number of int 
values for each document. Everything works ok as integers but i'd like 
to have some sort of fuzzy searches based on the bit representation of 
the numbers. Essentially, this number:

1001001010100

would be compared to these two

1011001010100
1001001010111

And the first would get a bigger score than the second, as it has only 1
flipped bit while the second has 2.

Is it possible to implement this in solr?


Cheers,
galo
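
The comparison described in question 3 is a Hamming distance between the two numbers. The thread does not say whether Solr can score on it, so the following is only a plain Java sketch of the arithmetic, with a hypothetical class name (BitDistance), using the three values from the message.

```java
public class BitDistance {
    // Number of bit positions in which a and b differ (Hamming distance).
    public static int flippedBits(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long base = Long.parseLong("1001001010100", 2);
        long first = Long.parseLong("1011001010100", 2);
        long second = Long.parseLong("1001001010111", 2);
        System.out.println(flippedBits(base, first));   // prints 1
        System.out.println(flippedBits(base, second));  // prints 2
    }
}
```

A smaller distance would correspond to the "bigger score" asked for. How to feed such a value into ranking is left open here.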




RE: highlight and wildcards ?

2007-06-07 Thread Xuesong Luo
Frédéric,
I asked a similar question several days ago. It seems we don't have a
perfect solution when using a prefix wildcard with highlighting. Here is
what Chris said:

in Solr 1.1, highlighting used the info from the raw query to do highlighting,
hence in your query for consult* it would highlight the Consult part of
Consultant even though the prefix query was matching the whole word.  In the
trunk (soon to be Solr 1.2) Mike fixed that so the query is "rewritten" to its
expanded form before highlighting is done ... this works great for true wild
card queries (ie: cons*t* or cons?lt*) but for prefix queries (ie: consult*)
Solr has an optimization to reduce the likelihood of Solr crashing if the
prefix matches a lot of terms ... unfortunately this breaks highlighting of
prefix queries, and no one has implemented a solution yet...



-Original Message-
From: Frédéric Glorieux [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 07, 2007 3:52 AM
To: solr-user@lucene.apache.org
Subject: highlight and wildcards ?


Hi all,

I'm talking about solr subversion, jetty example, default documents,
like the tutorial. I tried to highlight queries with wildcards. Documents
are found as expected, but I haven't seen the terms highlighted. It
seems to work with fuzzy search, so I thought it was a supported feature.
Am I wrong?


Tests
=====

q=solr
http://localhost:8983/solr/select?indent=on&version=2.2&q=solr&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features


  

  Scalability - Efficient Replication to other Solr Search Servers
   
  


q=black~ (fuzzy search)
see black and clocked
http://localhost:8983/solr/select?indent=on&version=2.2&q=solr&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features

   
 
   Printing speed up to 29ppm black, 19ppm color
 
   
   
   
 
   NVIDIA GeForce 7800 GTX GPU/VPU clocked at 486MHz
 
   
   
 
  ATI RADEON X1900 GPU/VPU clocked at 650MHz
 
   




q=a*
http://localhost:8983/solr/select?indent=on&version=2.2&start=0&rows=100&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features&q=a*

   
   
   
   
   
   
   
   
   
   
   
   
   



-- 
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique




RE: highlight and wildcards ?

2007-06-07 Thread Xuesong Luo
Same in my project. Chris does mention we can put a ? before the *, so instead 
of domin*, you can use domin?*, however that requires at least one char 
following your search string.


-Original Message-
From: Frédéric Glorieux [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 07, 2007 10:37 AM
To: solr-user@lucene.apache.org
Cc: Florence Clavaud; Nicolas Legrand
Subject: Re: highlight and wildcards ?

Xuesong (?),

Thanks a lot for your answer, and sorry for not having scanned the archives
first. This is a really good and understandable reason, but sad for my
project. Prefix queries will be the main activity of my users (they
need to search Latin texts, so that domin* is enough to match "dominus"
or "domino"). So I need to investigate some more.

Xuesong Luo a écrit :

> Frédéric,
> I asked a similar question several days before, it seems we don't have a 
> perfect solution when using prefix wildcard with highlight. Here is what 
> Chris said: 
> 
> in Solr 1.1, highlighting used the info from the raw query to do 
> highlighting, hence in your query for consult* it would highlight the Consult 
> part of Consultant even though the prefix query was matchign the whole word.  
> In the trunk (soon to be Solr 1.2) Mike fixed that so the query is 
> "rewritten" to it's expanded form before highlighting is done ...
> this works great for true wild card queries (ie: cons*t* or cons?lt*) but for 
> prefix queries Solr has an optimization ofr Prefix queries (ie:
> consult*) to reduce the likely hood of Solr crashing if the prefix matches a 
> lot of terms ... unfortunately this breaks highlighting of prefix queries, 
> and no one has implemented a solution yet...



-- 
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique




TextField case sensitivity

2007-06-07 Thread Xuesong Luo
I ran into a problem when searching on a TextField. When I pass q=William
or q=WILLiam, solr is able to find records whose default search field value
is William; however, if I pass q=WilliAm, solr does not return anything.
I searched the archive. Yonik mentioned that the LowerCaseFilterFactory
doesn't work for wildcards because the QueryParser does not invoke
analysis for partial words, which makes sense. But in my case, it's a
whole word. Does anyone know why it's not working? Below is my schema info.

Thanks
Xuesong


[fieldType definition XML not preserved in the archive]




RE: TextField case sensitivity

2007-06-07 Thread Xuesong Luo
I have WordDelimiterFilter defined in the schema, I didn't include it in
my original email because I thought it doesn't matter. It seems it
matters. Looks like WilliAm is treated as two words. That's why it
didn't find a match.

Thanks
Xuesong

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Thursday, June 07, 2007 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: TextField case sensitivity

On 6/7/07, Xuesong Luo <[EMAIL PROTECTED]> wrote:
> I run a problem when searching on a TextField. When I pass q=William or
> q=WILLiam, solr is able to find records whose default search field value
> is William, however if I pass q=WilliAm, solr did not return any thing.

Sounds like WordDelimiterFilter is still being used for your fieldType.
After you changed the fieldType for "text", did you restart Solr and
re-index your collection?

-Yonik


> I searched on the archive, Yonik mentioned the lowercasefilterfactory
> doesn't work for wildcard because the QueryParser does not invoke
> analysis for partial word, that makes sense. But in my case, it's a
> whole word. Anyone knows why it's not working? Below is my schema info.
>
> Thanks
> Xuesong
>
>  positionIncrementGap="100">
>   
> 
> 
>   
>   
> 
> 
>   
> 




RE: TextField case sensitivity

2007-06-07 Thread Xuesong Luo
Ryan, you are right, that's the problem. WilliAm is treated as two words
by the WordDelimiterFilterFactory.

Thanks
Xuesong
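
The effect can be reproduced outside Solr with a simplified stand-in. The sketch below is not the real WordDelimiterFilter, which has many more rules and options; it only mimics one rule (split where a lowercase letter is followed by an uppercase letter) and then lowercases, which is enough to show why WilliAm behaves differently from William and WILLiam. The class name CaseChangeSplit is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CaseChangeSplit {
    // Split on lowercase-to-uppercase transitions, then lowercase each part.
    public static List<String> tokens(String word) {
        List<String> out = new ArrayList<>();
        // Zero-width match between a lowercase and an uppercase letter.
        for (String part : word.split("(?<=\\p{Ll})(?=\\p{Lu})")) {
            out.add(part.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("William"));  // [william]
        System.out.println(tokens("WILLiam"));  // [william]
        System.out.println(tokens("WilliAm"));  // [willi, am]
    }
}
```

With the indexed value William producing the single token william, a query that analyzes to willi and am has nothing to match.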

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 07, 2007 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: TextField case sensitivity

have you taken a look at the output from the admin/analysis?
http://localhost:8983/solr/admin/analysis.jsp?highlight=on

This lets you see what tokens are generated for index/query.  From your 
description, I'm suspicious that the generated tokens are actually:
  willi am

Also, if you want the same analyzer for indexing and query, just define
one:


 
  




Xuesong Luo wrote:
> I run a problem when searching on a TextField. When I pass q=William or
> q=WILLiam, solr is able to find records whose default search field value
> is William, however if I pass q=WilliAm, solr did not return any thing.
> I searched on the archive, Yonik mentioned the lowercasefilterfactory
> doesn't work for wildcard because the QueryParser does not invoke
> analysis for partial word, that makes sense. But in my case, it's a
> whole word. Anyone knows why it's not working? Below is my schema info.
> 
> Thanks
> Xuesong
> 
>  positionIncrementGap="100">
>   
> 
> 
>   
>   
> 
> 
>   
> 
> 
> 




question about sorting

2007-06-11 Thread Xuesong Luo
Hi,
My sorting fields include both TextField and StrField types. Because the
TextField fields use a TokenizerFactory, they can't be sorted on. I have to
copy each TextField to a StrField and sort on those StrFields. Does anyone
know if there is a better way to do that?

Thanks
Xuesong



RE: question about sorting

2007-06-11 Thread Xuesong Luo
For example, first name, department, job title etc.

Thanks
Xuesong

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Monday, June 11, 2007 6:35 PM
To: solr-user@lucene.apache.org
Subject: Re: question about sorting

On 6/11/07, Xuesong Luo <[EMAIL PROTECTED]> wrote:
> My sorting fields include both TextField type and StrField type. Because
> TextField uses TokenizerFactory, they can't be sorted. I have to copy
> each TextField to a StrField and sort on those StrFields. Does anyone
> know if there is a better way to do that?

What information does this TextField carry?
Sorting works on indexed field values, and thus needs to be
single-valued per document.

-Yonik




RE: question about sorting

2007-06-12 Thread Xuesong Luo
Thanks, Yonik. Unfortunately we have users whose first names contain
more than one word, it seems copy field is my only option.

Thanks
Xuesong 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, June 12, 2007 10:35 AM
To: solr-user@lucene.apache.org
Subject: Re: question about sorting

On 6/11/07, Xuesong Luo <[EMAIL PROTECTED]> wrote:
> For example, first name, department, job title etc.

Something like first name might be able to be a single field that is
searchable and sortable (use a keyword tokenizer followed by a
lowercase filter).  If the field contains multiple words, and you want
to both search and sort on that field, there isn't currently a better
alternative to copyField.

-Yonik




RE: question about highlight field

2007-06-14 Thread Xuesong Luo
Hi, Chris,
I rewrote the prefix wildcard query consult* as (consult consult?*), and it
works with highlighting. Do you think it's a possible solution?
Could you explain a little bit why putting a "?" before the "*" won't crash
solr if it matches a lot of terms?

Thanks
Xuesong 

In the trunk (soon to be Solr 1.2) Mike fixed that so the
query is "rewritten" to its expanded form before highlighting is done ...
this works great for true wild card queries (ie: cons*t* or cons?lt*) but
for prefix queries (ie: consult*) Solr has an optimization to reduce the
likelihood of Solr crashing if the prefix matches a lot of terms ...
unfortunately this breaks highlighting of prefix queries, and no one has
implemented a solution yet...

https://issues.apache.org/jira/browse/SOLR-195




-Hoss
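
The rewrite described at the top of this message (consult* becomes (consult consult?*)) can be done on the client before the query is sent. This is a sketch with a hypothetical class name (PrefixRewrite); it assumes the default query operator is OR, so that the two clauses act as alternatives, and it leaves true wildcard queries such as cons?lt* or cons*t* untouched.

```java
public class PrefixRewrite {
    // Turn a pure prefix query "stem*" into "(stem stem?*)".
    // Any other term is returned unchanged.
    public static String rewrite(String term) {
        boolean prefixOnly = term.length() > 1
                && term.endsWith("*")
                && term.indexOf('*') == term.length() - 1
                && term.indexOf('?') < 0;
        if (!prefixOnly) {
            return term;
        }
        String stem = term.substring(0, term.length() - 1);
        // "stem" covers the exact word, "stem?*" covers longer words.
        return "(" + stem + " " + stem + "?*)";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("consult*"));  // (consult consult?*)
        System.out.println(rewrite("cons?lt*"));  // cons?lt*
    }
}
```

The extra exact-term clause is there because stem?* on its own requires at least one character after the stem.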




add CJKTokenizer to solr

2007-06-18 Thread Xuesong Luo
Hi, 

I got the error below after adding CJKTokenizer to schema.xml.  I
checked the constructor of CJKTokenizer; it requires a Reader parameter,
and I guess that's why I get this error. I searched the email archive, and
it seems to work for other users. Does anyone know what the problem is?

 

Thanks

Xuesong

 

 

2007-06-18 17:09:29,369 ERROR [STDERR] Jun 18, 2007 5:09:29 PM org.apache.solr.core.SolrException log
SEVERE: org.apache.solr.core.SolrException: Error instantiating class class org.apache.lucene.analysis.cjk.CJKTokenizer
at org.apache.solr.core.Config.newInstance(Config.java:229)
at org.apache.solr.schema.IndexSchema.readTokenizerFactory(IndexSchema.java:619)
at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:593)
at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:331)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:71)

 

 

 

Schema.xml

[schema.xml fragment not preserved in the archive]



 



RE: add CJKTokenizer to solr

2007-06-21 Thread Xuesong Luo
Thanks, Toru and Chris,
I tried both the CJKTokenizer and CJKAnalyzer. Both return some unexpected
highlight results when I tested with German. The field value I searched is
"Ein Mann beißt den Hund".  The search criterion is beißt.

When using CJKAnalyzer, beißt is treated as 2 single terms(bei and ß) the 
highlight result is: 
Ein Mann beißt den Hund 

When using CJKTokenizer, beißt is treated as 3 single terms, the result is:
Ein Mann beißt den Hund

When using standard tokenizer, beißt is treated as a word, the result is:
Ein Mann beißt den Hund


I understand why the standard tokenizer treats beißt as a word, but I don't
know how CJKAnalyzer and CJKTokenizer work. Could anyone explain a little bit?


Thanks
Xuesong

-Original Message-
From: Toru Matsuzawa [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 18, 2007 10:29 PM
To: solr-user@lucene.apache.org
Subject: Re: add CJKTokenizer to solr

I'm sorry. The attachment did not go through,
so I am sending it again.

> > I got the error below after adding CJKTokenizer to schema.xml.  I
> > checked the constructor of CJKTokenizer, it requires a Reader parameter,
> > I guess that's why I get this error, I searched the email archive, it
> > seems working for other users. Does anyone know what is the problem?
> 
> 
> CJKTokenizerFactory that I am using is appended.
> 
--
package org.apache.solr.analysis.ja;

import java.io.Reader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKTokenizer;
import org.apache.solr.analysis.BaseTokenizerFactory;

/**
 * CJKTokenizer for Solr
 * @see org.apache.lucene.analysis.cjk.CJKTokenizer
 * @author matsu
 */
public class CJKTokenizerFactory extends BaseTokenizerFactory {

  /**
   * Wraps the Reader in a CJKTokenizer so the schema can reference this
   * factory instead of the tokenizer class itself.
   * @see org.apache.solr.analysis.TokenizerFactory#create(Reader)
   */
  public TokenStream create(Reader input) {
    return new CJKTokenizer(input);
  }

}


-- 
Toru Matsuzawa





RE: add CJKTokenizer to solr

2007-06-22 Thread Xuesong Luo
Thanks, Otis. I didn't know CJK is only used for Asian languages. I'll try
the German Analyzer.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 22, 2007 3:18 AM
To: solr-user@lucene.apache.org
Subject: Re: add CJKTokenizer to solr

I'm jumping in the middle of the thread here.
CJK = Chinese, Japanese, Korean
German = etwas ganz anderes
Why are you trying to use CJKAnalyzer+Tokenizer for German?  Have you tried 
German Analyzer from Lucene contrib?

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

RE: Multi-language Tokenizers / Filters recommended?

2007-06-23 Thread Xuesong Luo
For Chinese search, you may also consider
org.apache.lucene.analysis.cn.ChineseTokenizer.

From the ChineseTokenizer Javadoc: it extracts tokens from the stream
using Character.getType(), with the rule that each Chinese character is
a single token. The difference between the ChineseTokenizer and the
CJKTokenizer (id=23545) is that they have different token parsing logic.
For example, if the Chinese text "C1C2C3C4" is indexed, the tokens
returned from the ChineseTokenizer are C1, C2, C3, C4, and the tokens
returned from the CJKTokenizer are C1C2, C2C3, C3C4. Therefore the index
the CJKTokenizer creates is much larger. The problem is that when
searching for C1, C1C2, C1C3, C4C2, C1C2C3 ... the ChineseTokenizer
works, but the CJKTokenizer will not work.
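
A minimal sketch of the two token parsing strategies described above,
not the actual Lucene implementations. The names NGramSketch, unigrams
and bigrams are made up for illustration, the string "ABCD" stands in
for the Chinese text C1C2C3C4, and each Java char is assumed to be one
character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene or Solr.
class NGramSketch {

    // One token per character, as the ChineseTokenizer description states.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // Overlapping two-character tokens, as described for CJKTokenizer.
    // Simplification: a text shorter than two characters yields no tokens.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "ABCD"; // stand-in for C1C2C3C4
        System.out.println(unigrams(text)); // [A, B, C, D]
        System.out.println(bigrams(text));  // [AB, BC, CD]
    }
}
```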




-----Original Message-----
From: Teruhiko Kurosaka [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 22, 2007 2:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi-language Tokenizers / Filters recommended? 

Hi Daniel,
As you know, Chinese and Japanese do not use
spaces or any other delimiters to break words.
To overcome this problem, CJKTokenizer uses a method
called bi-gram, where a run of ideographic (=Chinese) 
characters is made into tokens of two neighboring
characters.  So a run of five characters ABCDE
will result in four tokens AB, BC, CD, and DE.

So a search for "BC" will hit this text,
even if AB is a word and CD is another word.
That is, it increases the noise in the hits.
I don't know how much of a real problem it would be
for Chinese.  But for Japanese, my native language,
this is a problem. Because of this, search results
for Kyoto will include false hits of documents
that include Tokyoto, i.e. Tokyo prefecture.

There is another method called morphological
analysis, which uses dictionaries and grammar
rules to break down text into real words.  You
might want to consider this method. 
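
The Kyoto false hit can be reproduced with a small sketch. This is an
illustration under assumptions, not the CJKTokenizer code:
BigramFalseHit, bigrams and matches are hypothetical names, matching is
reduced to "every query token occurs among the indexed tokens" with
positions ignored, and the split of Tokyo-to (東京都) into 東京 + 都 is
written by hand to stand in for what a morphological analyzer might
produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Lucene or Solr.
class BigramFalseHit {

    // Overlapping two-character tokens (the bi-gram method).
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    // Simplified matching: true when every query token is an indexed token.
    static boolean matches(List<String> indexedTokens, List<String> queryTokens) {
        return indexedTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String tokyoTo = "\u6771\u4EAC\u90FD"; // Tokyo-to, three characters
        String kyoto = "\u4EAC\u90FD";         // Kyoto, the last two of them

        // Bi-gram index of Tokyo-to contains the bigram that spells Kyoto.
        List<String> bigramIndex = bigrams(tokyoTo);
        System.out.println(matches(bigramIndex, bigrams(kyoto))); // true

        // Assumed word segmentation: "Tokyo" + "to". Kyoto is not a token.
        List<String> wordIndex = Arrays.asList("\u6771\u4EAC", "\u90FD");
        System.out.println(matches(wordIndex, Arrays.asList(kyoto))); // false
    }
}
```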

-kuro  




RE: snapshooter no go

2007-06-27 Thread Xuesong Luo
I got similar problems, tried both default setting solr/bin and full
path /export/home/jboss/jboss-4.0.5.GA/bin/solr/bin, neither works. I'm
using 1.2.


2007-06-27 14:10:03,907 ERROR [STDERR] Jun 27, 2007 2:10:03 PM
org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=true,waitSearcher=true)
2007-06-27 14:10:03,961 ERROR [STDERR] Jun 27, 2007 2:10:03 PM
org.apache.solr.core.SolrException log
SEVERE: java.io.IOException: snapshooter: not found
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
at java.lang.Runtime.exec(Runtime.java:591)
at org.apache.solr.core.RunExecutableListener.exec(RunExecutableListener.java:70)
at org.apache.solr.core.RunExecutableListener.postCommit(RunExecutableListener.java:97)
at org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(UpdateHandler.java:99)




RE: snapshooter no go

2007-06-27 Thread Xuesong Luo
I thought I had to use the full path in the dir attribute; later I
realized I should modify the PATH environment variable instead. Now it's
working, and I didn't need to prepend ./ to snapshooter.

Thanks
Xuesong

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 27, 2007 2:49 PM
To: solr-user@lucene.apache.org
Subject: RE: snapshooter no go


: I got similar problems, tried both default setting solr/bin and full
: path /export/home/jboss/jboss-4.0.5.GA/bin/solr/bin, neither works.
: I'm using 1.2.

Did you try setting "exe" to "./snapshooter" ?

for the record, there is no "default" setting of "solr/bin" in either
the code, or in the example solrconfig ... there is only a commented out
example (which is clearly a bad example based on the confusion it seems
to be causing and the fact that if you uncomment it, it doesn't work
properly)


-Hoss




RE: snapshooter no go

2007-06-28 Thread Xuesong Luo
I got another problem: solr is able to find snapshooter but didn't
generate any snapshot files after I updated the index. I checked the
log and everything looked fine, so I ran snapshooter from the command
line. It failed because Solaris doesn't support the -l option of the cp
command. I'm trying to find an alternative, but it seems the only option
is to loop through all files in that dir and use the ln command to create
a hard link for each of them. Does anyone know a better solution?

I'm also curious why there is no error log when solr failed to run
snapshooter. 

Thanks
Xuesong




RE: snapshooter no go

2007-06-28 Thread Xuesong Luo
Just found I can use "ln dir1/* dir2" to make hard link for all files in
dir1 to dir2.

-----Original Message-----
From: Xuesong Luo 
Sent: Thursday, June 28, 2007 11:17 AM
To: solr-user@lucene.apache.org
Subject: RE: snapshooter no go

I got another problem: solr is able to find snapshooter but didn't
generate any snapshot files after I updated the index. I checked the
log and everything looked fine, so I ran snapshooter from the command
line. It failed because Solaris doesn't support the -l option of the cp
command. I'm trying to find an alternative, but it seems the only option
is to loop through all files in that dir and use the ln command to create
a hard link for each of them. Does anyone know a better solution?

I'm also curious why there is no error log when solr failed to run
snapshooter. 

Thanks
Xuesong





RE: snapshooter no go

2007-06-28 Thread Xuesong Luo
The log didn't have any error message and there is no
snapshot.20070628204821 directory generated. 

2007/06/28 20:48:21 started by
2007/06/28 20:48:21 command:
/export/home/jboss/jboss-3.2.7/bin/solr/bin/snapshooter arg1 arg2
2007/06/28 20:48:21 taking snapshot
/export/home/jboss/jboss-3.2.7/bin/solr/data/snapshot.20070628204821
2007/06/28 20:48:21 ended (elapsed time:  sec)

Thanks
Xuesong

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 28, 2007 4:53 PM
To: solr-user@lucene.apache.org
Subject: Re: snapshooter no go

Look at /logs directory - you should see snapshooter.log
there.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Xuesong Luo <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, June 28, 2007 8:17:14 PM
Subject: RE: snapshooter no go

I got another problem: solr is able to find snapshooter but didn't
generate any snapshot files after I updated the index. I checked the
log and everything looked fine, so I ran snapshooter from the command
line. It failed because Solaris doesn't support the -l option of the cp
command. I'm trying to find an alternative, but it seems the only option
is to loop through all files in that dir and use the ln command to create
a hard link for each of them. Does anyone know a better solution?

I'm also curious why there is no error log when solr failed to run
snapshooter. 

Thanks
Xuesong








RE: snapshooter no go

2007-06-28 Thread Xuesong Luo
Indeed there is one more step: mkdir dir2. ln reports an error if the
destination dir does not exist.
I'll create a JIRA issue later.
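
For illustration only, the same two steps (create the destination dir,
then hard-link every file into it) sketched in Java. snapshooter itself
is a shell script, so this is not how it is implemented; the class name
HardLinkSnapshot and the method linkAll are hypothetical, and the sketch
assumes the file system supports hard links and that the destination
does not already contain files with the same names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration class; not part of Solr.
class HardLinkSnapshot {

    // Hard-links every regular file in srcDir into destDir and returns
    // the number of links created. Subdirectories are skipped.
    static int linkAll(Path srcDir, Path destDir) throws IOException {
        Files.createDirectories(destDir); // the "mkdir dir2" step
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(srcDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // the "ln dir1/* dir2" step, one file at a time
                    Files.createLink(destDir.resolve(file.getFileName()), file);
                    linked++;
                }
            }
        }
        return linked;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: HardLinkSnapshot <srcDir> <destDir>");
            return;
        }
        int n = linkAll(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(n + " files linked");
    }
}
```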

Thanks
Xuesong

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 28, 2007 4:52 PM
To: solr-user@lucene.apache.org
Subject: Re: snapshooter no go

Maybe you can open an issue in JIRA with your solution, so Bill Au & Co.
can fix snappuller.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Xuesong Luo <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, June 28, 2007 8:42:25 PM
Subject: RE: snapshooter no go

Just found I can use "ln dir1/* dir2" to make hard link for all files in
dir1 to dir2.










autocommit not working for delete?

2007-07-03 Thread Xuesong Luo
Hi,
I set up solr to autocommit each minute. It works well if I send an add
request, but it does not work for delete: nothing happens after 1
minute. Is this a bug or by design?

Thanks
Xuesong


   6




RE: autocommit not working for delete?

2007-07-03 Thread Xuesong Luo
Thanks, Ryan

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 03, 2007 11:32 AM
To: solr-user@lucene.apache.org
Subject: Re: autocommit not working for delete?

Check:
https://issues.apache.org/jira/browse/SOLR-283

This is now fixed in trunk

ryan


Xuesong Luo wrote:
> Hi,
> I set up solr to autocommit each minute. It works well if I sent an
add
> request, but it does not work for delete, nothing happened after 1
> minute. Is this a bug or a designed behavior?
> 
> Thanks
> Xuesong
> 
> 
>6
> 
> 
> 




RE: Solr indexing

2007-07-03 Thread Xuesong Luo
2) yes. 

-----Original Message-----
From: niraj tulachan [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 03, 2007 3:09 PM
To: solr-user@lucene.apache.org
Subject: Solr indexing

Hi all,
 I have successfully implemented Solr so far, but there are a
couple of questions I would like Solr users to shed some light on:
  1) In Solr, we create an index by POSTing an XML file to the server.
However, is there a way to do the same thing from a database (containing
metadata)?
  2) When updating an existing index, the update won't take effect until
we do a "commit".  However, while updating the index (before
doing the 'commit'), can we still search on that index (using the old
content)?
  Any info will be highly appreciated..
  Cheers,
  Niraj

   



Not enough space

2007-07-03 Thread Xuesong Luo
I set up solr1.2 to run snapshooter each time after a commit/optimize.
It worked fine for a while, but later I got the error message below
after sending the commit request. It seems jboss(4.0.GA) had a problem
running snapshooter. The index size is 290m, and the file system that the
solr data directory is on has 2g of free space. The swap space(/tmp) has
420m of free space. Both seem to have enough space. I also tried running
snapshooter from the command line and was able to create the snapshot
without any problem.

 

 

2007-07-03 14:49:20,923 ERROR [STDERR] Jul 3, 2007 2:49:20 PM
org.apache.solr.core.SolrException log
SEVERE: java.io.IOException: Not enough space
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
at java.lang.Runtime.exec(Runtime.java:591)
at org.apache.solr.core.RunExecutableListener.exec(RunExecutableListener.java:70)
at org.apache.solr.core.RunExecutableListener.postCommit(RunExecutableListener.java:97)
at org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(UpdateHandler.java:99)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:514)
at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:214)
at org.apache.solr.handler.XmlUpdateRequestHandler.doLegacyUpdate(XmlUpdateRequestHandler.java:355)
at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:58)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:810)



webapp_name in commit and optimize

2007-07-06 Thread Xuesong Luo
Hi, 
I deployed solr web app with a different name then found commit does not
work. When I looked at the code, I saw variable webapp_name is populated
but not used. It always uses solr as the web app name. optimize has the
same problem. Is this a known bug? 

Wrong:
rs=`curl http://${solr_hostname}:${solr_port}/solr/update -s -d
""`

Correct:
rs=`curl http://${solr_hostname}:${solr_port}/${webapp_name}/update -s
-d ""`


Thanks
Xuesong




RE: webapp_name in commit and optimize

2007-07-06 Thread Xuesong Luo
I configured the webapp name in scripts.conf, just found our bin
directory has not been updated to 1.2. That's the problem.

Thanks
Xuesong

-----Original Message-----
From: Tobin Cataldo [mailto:[EMAIL PROTECTED] 
Sent: Friday, July 06, 2007 10:44 AM
To: solr-user@lucene.apache.org
Subject: Re: webapp_name in commit and optimize

It defaults to solr, are you specifying your webapp name when you invoke
the program?

usage: $prog [-h hostname] [-p port] [-u username] [-U url] [-v] [-V]
   ...
   -w  specify name of Solr webapp (defaults to solr)


Xuesong Luo wrote:
> I'm using 1.2, yes, it adds a webapps_name option, but it's never used
> in the commit/optimizer.
>
> -Original Message-
> From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
> Sent: Friday, July 06, 2007 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: webapp_name in commit and optimize
>
>
> : but not used. It always uses solr as the web app name. optimize has
> the
> : same problem. Is this a known bug?
>
> which version of Solr are you using?  1.2 added a new webapps_name
> option.
>
>
>
> -Hoss
>
>
>
>   




RE: webapp_name in commit and optimize

2007-07-06 Thread Xuesong Luo
Hmmm, it's different from the one I got. We may have just installed the
search.war from 1.2 but forgotten to update the bin. 

Thanks
Xuesong

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Friday, July 06, 2007 10:44 AM
To: solr-user@lucene.apache.org
Subject: RE: webapp_name in commit and optimize


: I'm using 1.2, yes, it adds a webapps_name option, but it's never used
: in the commit/optimizer.

uh are you sure you are using 1.2?

http://svn.apache.org/viewvc/lucene/solr/tags/release-1.2.0/src/scripts/commit?view=markup
http://svn.apache.org/viewvc/lucene/solr/tags/release-1.2.0/src/scripts/optimize?view=markup


-Hoss




RE: webapp_name in commit and optimize

2007-07-06 Thread Xuesong Luo
I'm using 1.2, yes, it adds a webapps_name option, but it's never used
in the commit/optimizer.

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Friday, July 06, 2007 10:32 AM
To: solr-user@lucene.apache.org
Subject: Re: webapp_name in commit and optimize


: but not used. It always uses solr as the web app name. optimize has
: the same problem. Is this a known bug?

which version of Solr are you using?  1.2 added a new webapps_name
option.



-Hoss





RE: distribution scripts on Solaris

2007-07-09 Thread Xuesong Luo
You can use perl to get the sec, an example is:
rsyncEndSec=`perl -e "print time;"`



-----Original Message-----
From: Bill Au [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 09, 2007 8:40 AM
To: solr-user@lucene.apache.org
Subject: distribution scripts on Solaris

I am working on bug SOLR-282:

https://issues.apache.org/jira/browse/SOLR-282

and noticed that the code in the scripts to measure elapsed time also
does not work on Solaris as the date command there does not support the
"%s" format.

Anyone know of a good way to measure the elapsed time on Solaris?  If
not, I am thinking to skip that for Solaris as this feature is not
really required for things to work.

Bill


RE: wildcard searches standard request handler

2007-07-10 Thread Xuesong Luo
That's also what I did in my code: I check the query string for * or ?
and, if either is present, lowercase the query string.
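
A minimal sketch of that client-side check, assuming the field is
indexed in lowercase. The class WildcardLowercase and the method
normalize are hypothetical names, not part of Solr or of my actual code.

```java
import java.util.Locale;

// Hypothetical client-side helper; not part of Solr.
class WildcardLowercase {

    // Lowercases the query only when it contains a wildcard character.
    static String normalize(String query) {
        if (query.indexOf('*') >= 0 || query.indexOf('?') >= 0) {
            return query.toLowerCase(Locale.ROOT);
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(normalize("Whene*")); // whene*
        System.out.println(normalize("Whene"));  // Whene
    }
}
```

Note that lowercasing the whole string also lowercases field names and
capitalized operators such as AND or OR, so a real client would probably
need to restrict this to the wildcard terms.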

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, July 10, 2007 12:13 PM
To: solr-user@lucene.apache.org
Subject: Re: wildcard searches standard request handler

On 7/10/07, Karen Loughran <[EMAIL PROTECTED]> wrote:
> Hi Yonik, whene* does indeed work thanks.  Though the Context diff
patch fails
> against my 1.2 download:

For now, I'd advise just lowercasing wildcard queries in the client if
you know that is how your field is indexed.

-Yonik


multiple slaves on the same box

2007-07-17 Thread Xuesong Luo
Hi, there,
We have one master server and multiple slave servers. The multiple slave
servers can be run either on the same box or different boxes.  For
slaves on the same box, is there any best practice that they should use
the same index or each should have separate indexes? 

Thanks
Xuesong


RE: multiple slaves on the same box

2007-07-17 Thread Xuesong Luo
Thanks for sharing your experience. We have multiple slaves for load
balancing (performance reasons) and failover (in case one server dies or
hangs). 


Thanks
Xuesong

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 17, 2007 3:00 PM
To: solr-user@lucene.apache.org
Subject: Re: multiple slaves on the same box

Xuesong Luo wrote:
> Hi, there,
> We have one master server and multiple slave servers. The multiple
slave
> servers can be run either on the same box or different boxes.  For
> slaves on the same box, is there any best practice that they should
use
> the same index or each should have separate indexes? 
> 

I'm not sure about 'best' practices, but I can tell you my experience...

We have a master and single slave on the same server using the same 
index.  Since it is the same index, there really is no 'distribution' 
scripts, only something that periodically calls 'commit' on the slave 
index.  This is working great.

I can't think of any reason to have more than one slave server on the 
same machine.  What are you trying to do?

ryan


RE: Updating index on cluster

2007-07-18 Thread Xuesong Luo
On your slave, you can run snappuller to get the latest snapshot from
master(generated by snapshooter), then run snapinstaller to notify solr
to use the updated index.

-----Original Message-----
From: Matt Mitchell [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 18, 2007 12:12 PM
To: solr-user@lucene.apache.org
Subject: Updating index on cluster

Hi,

I'm currently working on an application which is living in a  
clustered server environment. There is a hardware based balancer, and  
each node in the cluster has  a separate install of Solr. The  
application code and files are on an NFS mount, along with the
"solr/home". The first node has been acting as the master.

My question is about reindexing, and even schema updates in some  
circumstances.

For a reindex, I post to Solr on the master node and then restart the  
remaining nodes.
Is there a better way to do this?

For a schema update, I stop the master, delete the data/index dir,  
start solr and then post to Solr on the master node. Then I restart  
the remaining nodes.
Is there a better way to do this?

Any tips, feedback or what have you are much appreciated!

Matt


always fail to update the first time after I restart the server

2007-08-09 Thread Xuesong Luo
Hi, 
I noticed the first index update after I restart my jboss server always
fails with the exception below. Any update after that works fine. Does
anyone know what the problem is? The solr version I'm using is solr1.2

Thanks
Xuesong


2007-08-09 11:41:44,559 ERROR [STDERR] Aug 9, 2007 11:41:44 AM
org.apache.solr.core.SolrException log
SEVERE: java.io.IOException: Underlying input stream returned zero bytes
at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:415)
at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2972)
at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1410)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078)
at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:111)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:767)
at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:697)
at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:889)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:595)
2007-08-09 11:41:44,590 ERROR [STDERR] Aug 9, 2007 11:41:44 AM
org.apache.solr.core.SolrCore execute
INFO: /update/  0 78
2007-08-09 11:41:44,590 ERROR [STDERR] Aug 9, 2007 11:41:44 AM
org.apache.solr.core.SolrException log
SEVERE: java.io.IOException: Underlying input stream returned zero bytes
at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:415)
at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2972)
at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1410)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1078)
at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:111)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at org.apache.solr.servlet.SolrDispatc

RE: dataset parameters suitable for lucene application

2007-09-26 Thread Xuesong Luo
My experience so far:
200k index records were created in 90 mins (including db time), the index
size is 200m, querying a keyword on all string fields (30) takes 0.3-1 sec,
and querying a keyword on one field takes tens of milliseconds.



-----Original Message-----
From: Charlie Jackson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 26, 2007 8:53 AM
To: solr-user@lucene.apache.org
Subject: RE: dataset parameters suitable for lucene application

My experiences so far with this level of data have been good.

Number of records: Maxed out at 8.8 million
Database size: friggin huge (100+ GB)
Index size: ~24 GB

1) It took me about a day to index 8 million docs using a non-optimized
program I wrote. It's non-optimized in the sense that it's not
multi-threaded. It batched together groups of about 5,000 docs at a time
to be indexed.

2) Search times for a basic search are almost always sub-second. If we
toss in some faceting, it takes a little longer, but I've hardly ever
seen it go above 1-2 seconds even with the most advanced queries. 

Hope that helps.


Charlie



-----Original Message-----
From: Law, John [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 26, 2007 9:28 AM
To: solr-user@lucene.apache.org
Subject: dataset parameters suitable for lucene application

I am new to the list and new to lucene and solr. I am considering Lucene
for a potential new application and need to know how well it scales. 

Following are the parameters of the dataset.

Number of records: 7+ million
Database size: 13.3 GB
Index Size:  10.9 GB 

My questions are simply:

1) Approximately how long would it take Lucene to index these documents?
2) What would the approximate retrieval time be (i.e. search response
time)?

Can someone provide me with some informed guidance in this regard?

Thanks in advance,
John

__
John Law
Director, Platform Management
ProQuest
789 Eisenhower Parkway
Ann Arbor, MI 48106
734-997-4877
[EMAIL PROTECTED]
www.proquest.com
www.csa.com

ProQuest... Start here.




one query or multiple queries

2007-09-27 Thread Xuesong Luo
Hi, 
I have a user index(each user has a unique index record) and need to get
information for 10 users. Should I run 10 queries or 1 query with
multiple user ids? Any performance difference?
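
For reference, the "1 query with multiple user ids" variant could be
assembled like this. It is only a sketch: the field name userId and the
class MultiIdQuery are hypothetical, the ids are assumed to be plain
alphanumeric values that need no escaping, and it says nothing about
which approach is faster.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side helper; not part of Solr.
class MultiIdQuery {

    // Builds a query of the form field:(id1 OR id2 OR ...).
    // Assumes a non-empty list of ids that need no escaping.
    static String build(String field, List<String> ids) {
        StringBuilder q = new StringBuilder(field).append(":(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                q.append(" OR ");
            }
            q.append(ids.get(i));
        }
        return q.append(')').toString();
    }

    public static void main(String[] args) {
        // userId is an assumed field name
        System.out.println(build("userId", Arrays.asList("101", "102", "103")));
        // prints userId:(101 OR 102 OR 103)
    }
}
```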

Thanks
Xuesong



one query or multiple queries

2007-09-28 Thread Xuesong Luo
Hi, there,
I have a user index(each user has a unique index record). If I want to
search 10 users, should I run 10 queries or 1 query with multiple user
ids? Is there any performance difference?
 
Thanks
Xuesong

 



facet and field collapse

2007-10-04 Thread Xuesong Luo
Hi, there,

Our index stores employee working history information. For each
employee, there could be multiple index records. The requirement is:

1.  The search result should be sorted on score.
2.  Each employee should only appear once regardless of how many matches
are found.
3.  The result should support pagination.

 

For example, if the original search result is:

 

doc=1, id=A, score=100
doc=2, id=A, score=90
doc=3, id=B, score=80
doc=4, id=A, score=70
doc=5, id=B, score=69
doc=6, id=B, score=68
doc=7, id=C, score=60
doc=8, id=C, score=59
doc=9, id=D, score=59
doc=10, id=E, score=58
...
doc=206, id=B, score=40
.
 
We want the final result to be:
 
Doc=1, id=A
Doc=3, id=B
Doc=7, id=C
Doc=9, id=D
Doc=10, id=E
 
If the user wants to see record 2-4, he/she should get
 
Doc=3, id=B
Doc=7, id=C
Doc=9, id=D
 
 
I tried both facet and field collapse, but it seems neither satisfies our
requirements.
The problem with facet is that it can only sort either by count or in
alphabetical order. If sorted by count, B will be returned before
A. 
The problem with field collapse is that its pagination is based on docs,
not on groups. In my pagination example, Doc 2-4 will be returned instead
of 3, 7, 9. 
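
To make the desired behavior concrete, here is a sketch of group-based
pagination done on the client side with the example data above. It is
not a Solr feature and not a recommendation: CollapsePaging, Hit,
collapse and pageDocs are hypothetical names, and the sketch assumes the
client has already fetched enough hits, sorted by descending score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper; not part of Solr.
class CollapsePaging {

    // One search hit: doc number, employee id, score.
    static final class Hit {
        final int doc;
        final String id;
        final int score;

        Hit(int doc, String id, int score) {
            this.doc = doc;
            this.id = id;
            this.score = score;
        }
    }

    // Keeps only the first (highest-scoring) hit per employee id,
    // preserving the score order of the input.
    static List<Hit> collapse(List<Hit> hitsSortedByScore) {
        Map<String, Hit> firstPerId = new LinkedHashMap<>();
        for (Hit h : hitsSortedByScore) {
            if (!firstPerId.containsKey(h.id)) {
                firstPerId.put(h.id, h);
            }
        }
        return new ArrayList<>(firstPerId.values());
    }

    // Pages over collapsed groups: start is a zero-based group offset.
    static List<Integer> pageDocs(List<Hit> hitsSortedByScore, int start, int rows) {
        List<Hit> collapsed = collapse(hitsSortedByScore);
        int from = Math.min(start, collapsed.size());
        int to = Math.min(from + rows, collapsed.size());
        List<Integer> docs = new ArrayList<>();
        for (Hit h : collapsed.subList(from, to)) {
            docs.add(h.doc);
        }
        return docs;
    }

    // The first ten hits from the example in this message.
    static List<Hit> sampleHits() {
        return Arrays.asList(
            new Hit(1, "A", 100), new Hit(2, "A", 90), new Hit(3, "B", 80),
            new Hit(4, "A", 70), new Hit(5, "B", 69), new Hit(6, "B", 68),
            new Hit(7, "C", 60), new Hit(8, "C", 59), new Hit(9, "D", 59),
            new Hit(10, "E", 58));
    }

    public static void main(String[] args) {
        System.out.println(pageDocs(sampleHits(), 0, 5)); // [1, 3, 7, 9, 10]
        System.out.println(pageDocs(sampleHits(), 1, 3)); // [3, 7, 9]
    }
}
```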
 
Has anyone had a similar experience? Any suggestions? 
 
Thanks
Xuesong
 

 



range query failed if highlight is used

2008-02-25 Thread Xuesong Luo
Hi, 
I'm using a solr1.3 nightly build. I defined a sint field bookCount. When
I query on this field, it works fine without highlighting. However, if I
turn on highlighting (hl=true&hl.fl=bookCount), it fails with the error
below. Does anyone know if this is a bug? If I change the type to
integer, the highlighting works, but I need to do range queries on this
field, so it has to be defined as sint.
 
Thanks
Xuesong
 
 
2008-02-25 16:54:53,524 ERROR [STDERR] Feb 25, 2008 4:54:53 PM
org.apache.solr.core.SolrCore execute
INFO: [xluo] /select/
rows=10&start=0&hl.fl=bookCount&indent=on&q=bookCount:5&hl=true&version=
2.2 0 0

2008-02-25 16:54:53,524 ERROR [STDERR] Feb 25, 2008 4:54:53 PM
org.apache.solr.common.SolrException log
SEVERE: java.lang.NumberFormatException: For input string: "   "
 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
 at java.lang.Long.parseLong(Long.java:403)
 at java.lang.Long.parseLong(Long.java:461)
 at org.apache.solr.util.NumberUtils.long2sortableStr(NumberUtils.java:52)
 at org.apache.solr.schema.SortableLongField.toInternal(SortableLongField.java:49)
 at org.apache.solr.schema.FieldType$DefaultAnalyzer$1.next(FieldType.java:315)
 at org.apache.solr.highlight.TokenOrderingFilter.next(SolrHighlighter.java:439)
 at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)

 


 

 


RE: range query failed if highlight is used

2008-02-26 Thread Xuesong Luo
Thanks Hoss, I created https://issues.apache.org/jira/browse/SOLR-491 to
track this bug.

The reason I need to highlight the numeric or date field is that I have to
loop through the search results to apply a role permission check on those
fields. If the searcher doesn't have permission to see the numeric/date
field of a user in the search result list, that field should be set to
null when returned. If the searcher doesn't have permission on any of the
matching fields, then the whole record should not be returned. How can I
find out which fields are the matching fields if the searcher is
searching on multiple fields? The only easy way I can think of is that if
the field is highlighted, it's a matching field. 
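
A sketch of the permission check described above, under assumptions:
PermissionFilter and filter are hypothetical names, the set of matched
fields is assumed to be taken from the fields that came back highlighted,
and "no permission on all matching fields" is read as "none of the
matching fields is permitted".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side helper; not part of Solr.
class PermissionFilter {

    // Returns a copy of the record with non-permitted fields set to null,
    // or null when the searcher may not see any of the matching fields.
    static Map<String, Object> filter(Map<String, Object> record,
                                      Set<String> matchedFields,
                                      Set<String> permittedFields) {
        boolean anyMatchVisible = false;
        for (String field : matchedFields) {
            if (permittedFields.contains(field)) {
                anyMatchVisible = true;
                break;
            }
        }
        if (!anyMatchVisible) {
            return null; // drop the whole record
        }
        Map<String, Object> visible = new HashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            boolean allowed = permittedFields.contains(e.getKey());
            visible.put(e.getKey(), allowed ? e.getValue() : null);
        }
        return visible;
    }
}
```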
  
Does it make sense?

Thanks
Xuesong

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 26, 2008 6:06 PM
To: solr-user@lucene.apache.org
Subject: Re: range query failed if highlight is used


: I'm using solr1.3 nightly build. I defined a sint field bookCount.
: When I query on this field, it works fine without highlight. However
: if I turn on highlight(hl=true&hl.fl=bookCount), it failed due to the
: error

I'm not sure if i really understand what it would mean to highlight a
numeric field,  highlighting a range query probably won't ever work
because of the way range queries are implemented in Solr ... but at the
very least there should be a better error message in this case.  (and
the case of a simple single value numeric lookup should probably work)

could you please file a bug for this?

:
rows=10&start=0&hl.fl=bookCount&indent=on&q=bookCount:5&hl=true&version=
: 2.2 0 0
: 
: 2008-02-25 16:54:53,524 ERROR [STDERR] Feb 25, 2008 4:54:53 PM
: org.apache.solr.common.SolrException log
: SEVERE: java.lang.NumberFormatException: For input string: "   "



-Hoss