Add 2 stemmers to a textfield?

2007-07-09 Thread Thierry Collogne

Hello,

Our index contains two languages: Dutch and French. I was wondering whether it is
possible to add two solr.SnowballPorterFilterFactory filters to one text field,
like this:

[XML field type definition not preserved in the archive]



Also, can someone explain to me why a filter is sometimes used at index time
and sometimes at query time? It is not entirely clear to me what the
difference is.

Thank you,

Thierry
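Both questions can be illustrated with a small Python toy (not Solr code). It is a minimal sketch under stated assumptions: the two "stemmers" are hypothetical suffix-strippers standing in for the real Snowball French and Dutch stemmers, and a whitespace split stands in for the tokenizer. It shows that filters in one analyzer run in sequence, so the second stemmer sees the output of the first and is applied to every token regardless of its language. It also shows that a document only matches when the query text is analyzed into the same terms as the indexed text. That is why index-time and query-time analyzers are normally kept compatible, even though Solr lets you configure them separately.

```python
from typing import Callable, List

Filter = Callable[[List[str]], List[str]]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def make_suffix_stripper(suffixes: List[str]) -> Filter:
    """Toy stemmer: strips the first listed suffix a token ends with.

    Hypothetical stand-in for a real Snowball stemmer, for illustration only.
    """
    def strip(tokens: List[str]) -> List[str]:
        out = []
        for token in tokens:
            for suffix in suffixes:
                # only strip if at least three characters of stem remain
                if token.endswith(suffix) and len(token) > len(suffix) + 2:
                    token = token[: -len(suffix)]
                    break
            out.append(token)
        return out
    return strip


def analyze(text: str, filters: List[Filter]) -> List[str]:
    tokens = text.split()  # stand-in for a whitespace tokenizer
    for f in filters:
        tokens = f(tokens)  # each filter consumes the previous filter's output
    return tokens


toy_french = make_suffix_stripper(["ements", "ement", "s"])  # hypothetical rules
toy_dutch = make_suffix_stripper(["heden", "heid", "en"])    # hypothetical rules
CHAIN = [lowercase, toy_french, toy_dutch]


def matches(doc: str, query: str,
            index_chain: List[Filter], query_chain: List[Filter]) -> bool:
    """True if every analyzed query term occurs among the analyzed document terms."""
    indexed = set(analyze(doc, index_chain))
    return all(term in indexed for term in analyze(query, query_chain))


if __name__ == "__main__":
    print(analyze("Arrondissements mogelijkheden", CHAIN))  # ['arrondiss', 'mogelijk']
    print(analyze("huis", CHAIN))  # ['hui']: the French rule also hits a Dutch word
    print(matches("Arrondissements", "arrondissement", CHAIN, CHAIN))  # True
    print(matches("Arrondissements", "arrondissement", CHAIN, []))     # False
```

The last two lines make the point about the two analyzers: the same query finds the document only when it goes through the same chain that was used at index time.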


Re: Facet prefix question

2007-07-09 Thread Chris Hostetter

: Hey there does anyone know how to perform multiple facet prefixes on the same
: field.  I am using the f.<fieldname>.facet.prefix syntax and cannot get an
: "OR"ing of faceted results.

that's really not possible ... facet.prefix isn't parsed as a query, it's
a raw term prefix.

can you elaborate on what it is you are trying to achieve?  There might
be a better way to go about it that we can help you with.




-Hoss
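Because facet.prefix is a raw term prefix rather than a query, one client-side workaround is to send one facet request per prefix and merge the resulting term counts. This is not something Solr does for you, and it is not necessarily the "better way" asked about above. The Python sketch below only builds the request URLs and merges counts that have already been parsed from the responses. The base URL, the example field name `cat` and the helper names are assumptions.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlencode


def facet_prefix_urls(base_url: str, field: str, prefixes: Iterable[str],
                      q: str = "*:*") -> List[str]:
    """Build one facet request per prefix (hypothetical helper)."""
    urls = []
    for prefix in prefixes:
        params = [
            ("q", q),
            ("rows", "0"),                      # only the facet counts are needed
            ("facet", "true"),
            ("facet.field", field),
            (f"f.{field}.facet.prefix", prefix),
        ]
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls


def merge_facet_counts(per_prefix_counts: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Union the term -> count maps returned for each prefix.

    Every request uses the same q, so a term matched by two overlapping
    prefixes (e.g. "el" and "ele") has the same count in both maps; a plain
    update is enough, whereas summing would double count it.
    """
    merged: Dict[str, int] = {}
    for counts in per_prefix_counts:
        merged.update(counts)
    # highest count first, ties broken alphabetically
    return dict(sorted(merged.items(), key=lambda kv: (-kv[1], kv[0])))
```

Note that facet.limit and facet.mincount still apply to each request separately, so the merged list can contain more terms than a single request would return.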



Re: distribution scripts on Solaris

2007-07-09 Thread Bill Au

Thanks.  It looks like perl has been a standard part of Solaris since Solaris 8.
I will use that on Solaris instead.

I have filed a bug on this:

https://issues.apache.org/jira/browse/SOLR-294

Bill




Re: Processor load

2007-07-09 Thread Michael Thessel
On Wed, 2007-04-07 at 10:37 -0400, Yonik Seeley wrote:
> On 7/3/07, Michael Thessel <[EMAIL PROTECTED]> wrote:
> > --
> >  208973 SEVERE: Error during auto-warming of
> > key:[EMAIL PROTECTED]:java
> >  208974 at
> > org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:97)
> >  208975 at
> 
> The Exception and exception message seem to be missing from the stack trace.
> Could you check your log and see if they are all like this?
> 
> -Yonik

The problem was the autocommit running every 10 s. I increased the autocommit
interval to 60 s, and now the process runs at 100% for only 30-40 s per minute.
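A rough back-of-the-envelope model helps explain why the longer interval helps. It assumes (an assumption, not something measured in this thread) that each commit opens a new searcher whose auto-warming costs a roughly fixed W seconds of CPU. With a commit every T seconds, warming then occupies about W/T of the time, capped at 100%. Under this model, 30-40 s of full load per 60 s interval corresponds to W of roughly 30-40 s, so a 10 s interval could never catch up. The function names below are hypothetical.

```python
import math


def warming_load(warm_seconds: float, commit_interval: float) -> float:
    """Approximate fraction of time spent auto-warming, capped at 1.0."""
    return min(1.0, warm_seconds / commit_interval)


def concurrent_warmups(warm_seconds: float, commit_interval: float) -> int:
    """In this simple model, how many warmups are in flight at once
    if every commit starts a new one."""
    return max(1, math.ceil(warm_seconds / commit_interval))


if __name__ == "__main__":
    for interval in (10, 60):
        print(interval, warming_load(35, interval), concurrent_warmups(35, interval))
```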

Thanks a lot,

Michael

-- 
Michael Thessel <[EMAIL PROTECTED]>
Gossamer Threads Inc. http://www.gossamer-threads.com/
Tel: (604) 687-5804 Fax: (604) 687-5806



RE: distribution scripts on Solaris

2007-07-09 Thread Xuesong Luo
You can use perl to get the seconds since the epoch, for example:
rsyncEndSec=`perl -e "print time;"`






Re: useCompoundFile, mergeFactor & index replication

2007-07-09 Thread Otis Gospodnetic
Using the compound index format will, I imagine, result in fewer and larger 
files being sent over the wire during replication.
I would stay with the multi-file index format and increase the max number of 
open files via ulimit or sysctl (Linux, at least).

Otis
--
Lucene Consulting -- http://lucene-consulting.com/
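The following Python estimate gives a rough feel for how mergeFactor and the compound format affect the file count. It is a sketch under several assumptions:

* Merging is log-style, and each level holds at most mergeFactor - 1 segments.
* A segment has about 8 files in the multi-file format. The real number depends on the Lucene version, deletions and term vectors.
* A segment has one .cfs file in the compound format.
* The segments_N bookkeeping files are ignored.
* The 1000-document flush size in the example is an assumed maxBufferedDocs value.

The estimate counts files for one open index. While a new searcher is warming, the old one is still open, so the number of open descriptors can be higher. After an optimize, the index collapses to a single segment.

```python
def estimate_segments(num_docs: int, merge_factor: int, max_buffered_docs: int) -> int:
    """Rough upper bound on live segments under log-style merging (assumption):
    level 0 holds flushed segments of max_buffered_docs docs, each higher level
    holds segments merge_factor times larger, and every level keeps at most
    merge_factor - 1 segments before they are merged upward."""
    levels = 0
    capacity = max_buffered_docs
    while capacity < num_docs:
        capacity *= merge_factor
        levels += 1
    return max(1, (merge_factor - 1) * levels)


def estimate_index_files(num_docs: int, merge_factor: int, max_buffered_docs: int,
                         compound: bool, files_per_segment: int = 8) -> int:
    """Approximate file count, ignoring segments_N and .del files."""
    per_segment = 1 if compound else files_per_segment
    return estimate_segments(num_docs, merge_factor, max_buffered_docs) * per_segment


if __name__ == "__main__":
    docs = 20_000_000  # the ~20M documents mentioned in the question
    for mf in (10, 4):
        for compound in (False, True):
            print(mf, compound, estimate_index_files(docs, mf, 1000, compound))
```

Under these assumptions, lowering mergeFactor reduces the segment count less than switching to the compound format does. That trade-off is consistent with the advice above to raise the open-file limit instead.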









useCompoundFile, mergeFactor & index replication

2007-07-09 Thread James Jory
I am working with a pre-production Solr installation using a single 
master and 4 search slaves. The index will eventually contain ~20M 
documents. There are about 25 fields in the index but only a few 
"integer" and small "string" fields actually have their data stored in 
the index. The index is being updated constantly, so snapshot replication 
will be frequent. The plan is to optimize once per day, but that can be 
changed.


During our testing we are running into the "too many open files" 
exception. We are going to change the ulimit on all the servers but I 
was curious if anyone is enabling the compound file format and/or 
lowering the merge factor to keep the file count down and, if so, what 
impact those changes are having on replication. So far in our testing we 
haven't noticed any significant differences in replication 
performance/behavior but we haven't grown the index much yet.


Does anyone have any experience or advice on using the compound file 
format with index replication?


Thanks,
James



Facet prefix question

2007-07-09 Thread Robert Purdy

Hey there, does anyone know how to apply multiple facet prefixes to the same
field? I am using the f.<fieldname>.facet.prefix syntax and cannot get an
"OR"ing of faceted results.

Thanks, Robert.
-- 
View this message in context: 
http://www.nabble.com/Facet-prefix-question-tf4052754.html#a11511601
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Andrew Stromnov

Hi Daniel

The stemmer in RussianAnalyzer works as expected, but this analyzer doesn't
allow any Solr customization: all stopwords are hardcoded, and there is no
support for a custom tokenizer or for synonyms.

RussianAnalyzer is similar to this chain:
  standard tokenizer
  standard filter factory
  word delimiter filter factory
  lowercase filter factory
  stop filter factory (with hardcoded stopwords)
  russian stem filter

Regards,
Andrew


Daniel Alheiros wrote:
> 
> Hi Andrew
> 
> In fact I did it creating all the Factories for Solr, but I think you can
> use it directly, changing your index like this:
> 
>  positionIncrementGap="100">
>  class="org.apache.lucene.analysis.ru.RussianAnalyzer">
> 
>  class="org.apache.lucene.analysis.ru.RussianAnalyzer">
> 
> 
> 
> I’ve not tested that, but I saw something like this.
> 
> Please tell me if it works as expected and if it solves your problem (I’m
> indexing Russian content and as you seem to be knowledgeable of Russian
> language your comments are very useful).
> 
> Regards,
> Daniel
> 

-- 
View this message in context: 
http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11507263
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Spell Check Handler

2007-07-09 Thread Charles Hornberger

For what it's worth, I recently did a quick implementation of the
spellchecker feature, and I simply created another field in my schema
(like 'spell' in Tristan's example below). After feeding content into
my search index, I used the spell field to add one single-field
document for every distinct word in my document collection (I'm
assuming the content folks have run spell-checkers :-)). E.g.:

aardvark
abacus
abbot
acacia
etc.

I also added some extra documents for proper names that appear in my
documents. For instance, there are a couple of fields that contain
comma-separated lists of names, so for each of those -- in addition
to the documents for "john", "doe", and "jane", which were generated by
the naive word-splitting done in the first pass -- I added documents
like so:

john doe
jane doe
etc.

You could do the same for other searchable multi-word tokens in your
input -- song/album/book/movie titles, publisher names, geographic
names (cities, neighborhoods, etc.), product names, and so on.

-Charlie
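The approach described above can be sketched in Python: collect the distinct words, add whole names from comma-separated name fields, and emit one single-field document per entry in Solr's XML update format (`<add><doc><field name="...">`). The field name `spell` comes from the description above, and entries are lower-cased to follow the "john doe" example. The helper names and the letters-only word split are assumptions. If your schema declares a uniqueKey, each document would also need that field (not shown).

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, List


def spelling_terms(texts: Iterable[str], name_lists: Iterable[str] = ()) -> List[str]:
    """Distinct lower-cased words from `texts`, plus whole names taken from
    comma-separated `name_lists` (e.g. "John Doe, Jane Doe")."""
    terms = set()
    for text in texts:
        # naive word split: runs of Unicode letters
        terms.update(w.lower() for w in re.findall(r"[^\W\d_]+", text))
    for names in name_lists:
        for name in names.split(","):
            name = " ".join(name.split()).lower()  # trim and collapse spaces
            if name:
                terms.add(name)
    return sorted(terms)


def to_solr_add_xml(terms: Iterable[str], field: str = "spell") -> str:
    """One single-field <doc> per term, in Solr's XML update format."""
    add = ET.Element("add")
    for term in terms:
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name=field).text = term
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    terms = spelling_terms(["The abbot, the acacia."], ["John Doe, Jane Doe"])
    print(terms)  # ['abbot', 'acacia', 'jane doe', 'john doe', 'the']
    print(to_solr_add_xml(terms))
```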


Re: Problems running SOLR 1.2 - documents not being indexed properly

2007-07-09 Thread Chris Hostetter

: After I removed manually it worked correctly and I've restarted a few times
: since the "lost lock" was there... Isn't that lock removal on start-up
: optional?

it is, it's controlled by the...
<unlockOnStartup>false</unlockOnStartup>
...option in the mainIndex or indexDefaults section.

: >> The main problem to me is that instead of having some failure
: >> logging or
: >> console information about it I just had those misleading information.

are you sure you didn't get an error?

I just tested this out by touching the write.lock file in solr/data/index,
then started the server up, and on any attempt to add a document i got
this in the Solr logs...

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
timed out: SimpleFSLock@/home/chrish/svn/solr/example/solr/data/index/write.lock

...and i got a status 500 HTTP response code to my client.



-Hoss
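The lock in the log above is a plain file (SimpleFSLock) inside the index directory, so a leftover lock from a crashed JVM can be spotted before Solr is started. The Python sketch below only reports the lock; it never deletes anything. The function name and the default path `solr/data/index` (taken from the test described above) are assumptions. Deleting the file is only safe when no Solr or Lucene process has the index open; the lock-removal-on-startup option Hoss mentions is the built-in alternative.

```python
import time
from pathlib import Path
from typing import Optional


def leftover_lock(index_dir: str) -> Optional[str]:
    """Describe a write.lock left in index_dir, or return None if there is none.

    Only reports; it never deletes anything.
    """
    lock = Path(index_dir) / "write.lock"
    if not lock.exists():
        return None
    age_s = time.time() - lock.stat().st_mtime
    return f"{lock} exists (last modified {age_s:.0f} s ago)"


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "solr/data/index"
    print(leftover_lock(target) or "no write.lock found")
```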



Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Daniel Alheiros
Hi Andrew

In fact, I did it by creating all the factories for Solr, but I think you can
use it directly by changing your index configuration like this:

[XML configuration not preserved in the archive]

I haven't tested that, but I've seen something like it.

Please tell me whether it works as expected and solves your problem (I'm
indexing Russian content, and since you seem knowledgeable about the Russian
language, your comments would be very useful).

Regards,
Daniel



http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Andrew Stromnov

Hi Daniel,

Yes, I want to try RussianAnalyzer. How do I enable it in the Solr config?

Thank you.



-- 
View this message in context: 
http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11505646
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Daniel Alheiros
Hi Andrew.

I'm using the RussianAnalyzer (part of the Lucene analyzers) and it reduces
списки to списк.

Do you want to try this other Analyzer?

Regards,
Daniel


On 9/7/07 16:06, "Andrew Stromnov" <[EMAIL PROTECTED]> wrote:

> списки arrondissement turvallisuuden





distribution scripts on Solaris

2007-07-09 Thread Bill Au

I am working on bug SOLR-282:

https://issues.apache.org/jira/browse/SOLR-282

and noticed that the code in the scripts to measure elapsed time also does
not work on Solaris as the date command there does not support the "%s"
format.

Does anyone know of a good way to measure the elapsed time on Solaris? If not, I
am thinking of skipping that on Solaris, as this feature is not really required
for things to work.

Bill


Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Andrew Stromnov

Tried on JDK1.6p2 on MS Vista and CentOS.

query analyser config:
...
[XML configuration not preserved in the archive]
...


Query: списки   arrondissement  turvallisuuden

Analysis.jsp result:

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position   1        2                3
term text       списки   arrondissement   turvallisuuden

org.apache.solr.analysis.SnowballPorterFilterFactory {language=French}
term position   1        2                3
term text       списки   arrond           turvallisuuden

org.apache.solr.analysis.SnowballPorterFilterFactory {language=Russian}
term position   1        2                3
term text       списки   arrond           turvallisuuden

org.apache.solr.analysis.SnowballPorterFilterFactory {language=Finnish}
term position   1        2                3
term text       списки   arrond           turvallisuud


All stemmers except the Russian one work. The standalone Snowball stemmer works
perfectly: the stemmed form of "списки" should be "списк".
-- 
View this message in context: 
http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11503583
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problems running SOLR 1.2 - documents not being indexed properly

2007-07-09 Thread Daniel Alheiros
Hi Ard

After I removed it manually, it worked correctly; I had restarted a few times
while the "lost lock" was there... Isn't that lock removal on start-up
optional?

Regards,
Daniel







RE: Problems running SOLR 1.2 - documents not being indexed properly

2007-07-09 Thread Ard Schrijvers
Hello Daniel,

It sounds strange to me, because in SolrCore you can see in initIndex() that
locks are removed at initialisation.

Regards Ard



Problems running SOLR 1.2 - documents not being indexed properly

2007-07-09 Thread Daniel Alheiros
Hi

I'm developing a search application using SOLR/Lucene, and I think I found a
bug.

I was trying to index more documents and the total document number wasn't
changing, but for each document batch I sent to update the index, the
numbers shown by the console in the update handler section changed by 1
(not by the number of documents sent), and the autocommit wasn't committing
it... Then I realised that a write.lock had been in place since a JVM crash that
happened a couple of days earlier.

The main problem for me is that instead of getting a failure in the log or
on the console, I just got that misleading information.

Regards,
Daniel  





Re: Spell Check Handler

2007-07-09 Thread Tristan Vittorio

I think there is some confusion regarding how the spell checker actually
uses the termSourceField.  It is suggested that you use a simple field type
such as "string"; however, since this field type does not tokenize or split
words, it is only useful in situations where the whole field is considered a
dictionary "word":



Accountant
Auditor
Solicitor


The following example will not work with the spell checker, since the whole
field is considered a single word or string:



Accountant reveals that Accounting is boring
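The difference can be sketched in a few lines of Python. A "string" field contributes each whole value as one dictionary entry, while a tokenized field contributes the individual words. This is a minimal sketch: the regular-expression split is a naive stand-in for StandardTokenizer (an assumption, not its actual rules), and case is preserved.

```python
import re
from typing import Iterable, Set


def string_field_terms(values: Iterable[str]) -> Set[str]:
    """A "string" field is not tokenized: each whole value is one dictionary entry."""
    return set(values)


def tokenized_field_terms(values: Iterable[str]) -> Set[str]:
    """Naive stand-in for StandardTokenizer: every run of word characters is a term."""
    terms: Set[str] = set()
    for value in values:
        terms.update(re.findall(r"\w+", value))
    return terms


titles = ["Accountant", "Auditor", "Accountant reveals that Accounting is boring"]

if __name__ == "__main__":
    print("Accounting" in string_field_terms(titles))     # False
    print("Accounting" in tokenized_field_terms(titles))  # True
```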


I might suggest that you create an additional field in your schema that
takes advantage of the StandardTokenizer and StandardFilter, which don't
perform a great deal of processing on the field yet should provide decent
results when used with the spell checker:

[XML field type definition not preserved in the archive]

If you want this field to be automatically populated with the contents of
the title field when a document is added to the index, simply use a
copyField:



Hope this helps; let me know if this is still not clear. I will probably add
it to the wiki page soon.

cheers,
Tristan




Re: Spell Check Handler

2007-07-09 Thread climbingrose

Thanks for the quick reply. However, I'm still not able to set up the
spellchecker. Solr does create the spell directory under data, but doesn't
seem to build the spellchecker index. Here are snippets of my schema.xml:




   

  1
  0.5







spell




title

  

I tried this url:
http://localhost:8984/solr/select/?q=Accountent&qt=spellchecker&cmd=rebuild
and receive this:



0
2

rebuild




On 7/9/07, Tristan Vittorio <[EMAIL PROTECTED]> wrote:


The spellchecker should be available in the 1.2 release. Your query is
incorrect; try the following:


http://localhost:8984/solr/select/?q=java&qt=spellchecker&termSourceField=title_text&cmd=rebuild

The 'q' parameter must only contain the word being checked; you must
specify the field separately. You can set "termSourceField" in your
solrconfig.xml file so you do not need to explicitly set it each time
you want to run a spell check query. Also make sure your field isn't heavily
processed (i.e. with Porter stemmer analyzers), otherwise the suggestions
will look a bit weird / mangled. Take a look at the wiki page for more info:

http://wiki.apache.org/solr/SpellCheckerRequestHandler

cheers,
Tristan
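The request described above can be built programmatically so that the field never ends up inside `q`. The following Python sketch uses the parameter names from the example URL (q, qt, termSourceField, cmd) and the same base URL and port. The helper name and the colon check are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def spellcheck_url(base_url: str, word: str,
                   term_source_field: Optional[str] = None,
                   rebuild: bool = False) -> str:
    """Build a spellchecker request like the ones in this thread (hypothetical helper)."""
    if ":" in word:
        # a field-qualified word such as "title_text:java" puts the field in q
        raise ValueError("q must contain only the word to check, not field:word")
    params = [("q", word), ("qt", "spellchecker")]
    if term_source_field:
        params.append(("termSourceField", term_source_field))
    if rebuild:
        params.append(("cmd", "rebuild"))
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(spellcheck_url("http://localhost:8984/solr/select/", "java",
                         term_source_field="title_text", rebuild=True))
```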



On 7/9/07, climbingrose <[EMAIL PROTECTED]> wrote:
>
> Hi Tristan,
>
> Is this spellchecker available in the 1.2 release, or do I have to build the
> trunk? I tried your instructions but Solr returns nothing:
>
> http://localhost:8984/solr/select/?q=title_text:java&qt=spellchecker&cmd=rebuild
>
> Result:
>
> 
> 
> 0
> 3
> 
> rebuild
> 
> 
>
> Thanks.
>
>
> On 7/8/07, Tristan Vittorio <[EMAIL PROTECTED]> wrote:
> >
> > Hi Otis,
> >
> > I have written a draft wiki entry for the spell checker:
> > http://wiki.apache.org/solr/SpellCheckerRequestHandler
> >
> > I've learned that my initial observation about the suggestion ordering
> > was incorrect: it does in fact order the results by popularity (or term
> > frequency) of the word in the termSourceField. The problem I experienced
> > was caused by setting termSourceField to a field of type "text", which
> > heavily stemmed and analyzed the words. I found that using the
> > StandardTokenizer and StandardFilter, and removing the PorterStemmer and
> > LowerCaseFilter from the field schema, really improved the spell checker
> > performance.
> >
> > I haven't included this info on the wiki page yet; I'll try to update it
> > soon when I have a bit more time.
> >
> > cheers,
> > Tristan
> >
> >
> >
> > On 7/8/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> > >
> > > Tristan - good summary - want to copy that to the Solr Wiki?
> > >
> > > Thanks,
> > > Otis
> > >
> > > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> > > Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
> > >
> > > - Original Message 
> > > From: Tristan Vittorio <[EMAIL PROTECTED]>
> > > To: solr-user@lucene.apache.org
> > > Sent: Saturday, July 7, 2007 1:51:15 AM
> > > Subject: Re: Spell Check Handler
> > >
> > > I couldn't find any documention on the spell check handler either
but
> > > found
> > > enough information from the solrconfig.xml file, simply search for
> > > "SpellCheckerRequestHandler" (online version here):
> > >
> > >
> >
>
http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml
> > >
> > > You can view the original development discussion from JIRA (not sure
> how
> > > helpful that will be for you though):
> > > https://issues.apache.org/jira/browse/SOLR-81
> > >
> > > In a nutshell, the configuration parameters available are::
> > >
> > > suggestionCount: determines how many spelling suggestions are
> returned.
> > > accuracy: a float value between 1.0 and 0.0 on how close the
suggested
> > > words
> > > should match the original word being checked.
> > > spellcheckerIndexDir and  termSourceField: check solrconfig.xml for
a
> > full
> > > explanation.
> > >
> > > In order to use the spell checking hander for the first time, you
need
> > to
> > > explicitly build the spelling index with a sample query something
like
> > > this:
> > >
> > >
> >
>
http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker&cmd=rebuild
> > > 
> > > Depending on how large you main index is, this rebuild operation
could
> > > take
> > > a while.  Subsequent queries can omit '&cmd=rebuild' and will return
> > > results
> > > much faster:
> > >
> > > http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker
> > > 
> > > The order of the suggestions returned seems to be based on the
> accuracy
> > > figure (i.e. how close it matches the original word). it would be
> great
> > to
> > > be able to sort these suggested results based on term frequency /
> > document
> > > frequency of the suggested word in the main index, since the most
> > accurate
> > > suggestion may not always be the most relevant.
> > >
> > > As far as I can tel

Re: Spell Check Handler

2007-07-09 Thread Tristan Vittorio

The spellchecker should be available in the 1.2 release. Your query is
incorrect; try the following:

http://localhost:8984/solr/select/?q=java&qt=spellchecker&termSourceField=title_text&cmd=rebuild

The 'q' parameter must only contain the word being checked; you must specify
the field separately. You can set "termSourceField" in your solrconfig.xml
file so you do not need to set it explicitly each time you want to run a
spell check query. Also make sure your field isn't heavily processed (e.g.
with Porter stemmer analyzers); otherwise the suggestions will look a bit
weird / mangled. Take a look at the wiki page for more info:

http://wiki.apache.org/solr/SpellCheckerRequestHandler
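The two request forms described above (a one-time rebuild, then plain lookups) can be sketched in Python. This is a minimal sketch assuming the handler is registered under `qt=spellchecker` and Solr runs at `localhost:8984` as in the example URL; `spellcheck_url` is a hypothetical helper that only assembles query strings with the standard library and never contacts Solr.

```python
from urllib.parse import urlencode


def spellcheck_url(base, word, term_source_field=None, rebuild=False):
    """Build a spell-check request URL (hypothetical helper).

    Per the advice above, 'q' carries only the word being checked;
    the field goes in termSourceField, not in a 'field:word' query.
    """
    if ":" in word:
        # Reject 'title_text:java' style input instead of sending it as q.
        raise ValueError("pass the field via term_source_field, not in q")
    params = {"q": word, "qt": "spellchecker"}
    if term_source_field is not None:
        params["termSourceField"] = term_source_field
    if rebuild:
        # Only needed to build the spelling index the first time.
        params["cmd"] = "rebuild"
    return base + "?" + urlencode(params)


base = "http://localhost:8984/solr/select/"
print(spellcheck_url(base, "java", "title_text", rebuild=True))
print(spellcheck_url(base, "java", "title_text"))
```

The first call reproduces the rebuild URL above; the second is the cheaper form to use once the spelling index exists. If termSourceField is set in solrconfig.xml, the argument can simply be left out.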

cheers,
Tristan



On 7/9/07, climbingrose <[EMAIL PROTECTED]> wrote:


Hi Tristan,

Is this spellchecker available in the 1.2 release, or do I have to build the
trunk? I tried your instructions but Solr returns nothing:


http://localhost:8984/solr/select/?q=title_text:java&qt=spellchecker&cmd=rebuild

Result:



0
3

rebuild



Thanks.





--
Regards,

Cuong Hoang



Re: Spell Check Handler

2007-07-09 Thread climbingrose

Hi Tristan,

Is this spellchecker available in the 1.2 release, or do I have to build the
trunk? I tried your instructions but Solr returns nothing:

http://localhost:8984/solr/select/?q=title_text:java&qt=spellchecker&cmd=rebuild

Result:



0
3

rebuild



Thanks.


On 7/8/07, Tristan Vittorio <[EMAIL PROTECTED]> wrote:


Hi Otis,

I have written a draft wiki entry for the spell checker:
http://wiki.apache.org/solr/SpellCheckerRequestHandler

I've learned that my initial observation about the suggestion ordering was
incorrect: it does in fact order the results by popularity (or term
frequency) of the word in the termSourceField. The problem I experienced
was caused by setting termSourceField to a field of type "text", which
heavily stemmed and analyzed the words. I found that using the
StandardTokenizer and StandardFilter and removing the PorterStemmer and
LowerCaseFilter from the field schema really improved the spell checker
performance.
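A toy Python illustration of why a heavily analyzed termSourceField hurts suggestions. It assumes a crude suffix-stripping function as a stand-in for a real stemmer (it is not the Porter algorithm and not Solr's analysis chain): the term dictionary built from lowercased, stemmed tokens collapses three distinct words into one truncated stem, which is the kind of mangled form a spell checker would then offer back.

```python
import re
from collections import Counter


def tokenize(text):
    # Rough stand-in for a light tokenizer: split on non-word characters,
    # keep the original case (no lowercasing, no stemming).
    return re.findall(r"\w+", text)


def toy_stem(token):
    # Deliberately crude suffix stripping, NOT the Porter algorithm;
    # it only mimics the kind of truncation a stemmer produces.
    token = token.lower()
    for suffix in ("ing", "ants", "ant", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 3:
            return token[: -len(suffix)]
    return token


text = "Accountant accountants accounting Java developer"
light = Counter(tokenize(text))                  # lightly analyzed terms
heavy = Counter(toy_stem(t) for t in tokenize(text))  # stemmed terms
print(light)
print(heavy)
```

With the light analysis every surface form stays available as a suggestion; with the toy stemming only `account`, `java` and `developer` remain.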

I haven't included this info on the wiki page yet, I'll try to update it
soon when I have a bit more time.

cheers,
Tristan



On 7/8/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> Tristan - good summary - want to copy that to the Solr Wiki?
>
> Thanks,
> Otis
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
> - Original Message 
> From: Tristan Vittorio <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, July 7, 2007 1:51:15 AM
> Subject: Re: Spell Check Handler
>
> I couldn't find any documentation on the spell check handler either, but I
> found enough information in the solrconfig.xml file; simply search for
> "SpellCheckerRequestHandler" (online version here):
>
> http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml
>
> You can view the original development discussion from JIRA (not sure how
> helpful that will be for you though):
> https://issues.apache.org/jira/browse/SOLR-81
>
> In a nutshell, the configuration parameters available are:
>
> suggestionCount: determines how many spelling suggestions are returned.
> accuracy: a float value between 1.0 and 0.0 that sets how closely the
> suggested words should match the original word being checked.
> spellcheckerIndexDir and termSourceField: check solrconfig.xml for a full
> explanation.
>
> In order to use the spell checking handler for the first time, you need to
> explicitly build the spelling index with a sample query something like
> this:
>
> http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker&cmd=rebuild
>
> Depending on how large your main index is, this rebuild operation could
> take a while. Subsequent queries can omit '&cmd=rebuild' and will return
> results much faster:
>
> http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker
>
> The order of the suggestions returned seems to be based on the accuracy
> figure (i.e. how closely it matches the original word). It would be great
> to be able to sort these suggested results based on term frequency /
> document frequency of the suggested word in the main index, since the
> most accurate suggestion may not always be the most relevant.
>
> As far as I can tell there is currently no way of doing this using the
> spellchecker handler alone (you could always run separate standard
> queries on each word suggestion and order by numDocs, but that would be
> very inefficient). Has anybody else tried to achieve this?
>
> cheers,
> Tristan
>
>
>
> On 7/7/07, Andrew Nagy <[EMAIL PROTECTED] > wrote:
> >
> > Hello, is there any documentation on how to use the new spell check
> > module?
> >
> > Thanks
> > Andrew
> >
>
>
>
>
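The quoted thread describes an accuracy threshold and a workaround of ordering suggestions by document count. A minimal Python sketch of that idea, assuming a normalized edit-distance score as the "accuracy" measure (the exact formula Solr/Lucene uses may differ) and a hypothetical `doc_count` callable standing in for one standard query per suggestion that returns numDocs. Against a real index, issuing those extra queries would be the inefficient part the message warns about.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    # Assumed accuracy score in [0.0, 1.0]; 1.0 means identical strings.
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


def suggest(word, dictionary, doc_count, accuracy=0.5, suggestion_count=1):
    # Keep candidates at or above the accuracy threshold...
    close = [w for w in dictionary
             if w != word and similarity(word, w) >= accuracy]
    # ...then order by document count (most popular first), breaking ties
    # by similarity, and return at most suggestion_count words.
    close.sort(key=lambda w: (-doc_count(w), -similarity(word, w)))
    return close[:suggestion_count]


# Hypothetical document counts standing in for per-word standard queries.
counts = {"microsoft": 120, "macro": 300, "java": 900}
print(suggest("macrosoft", counts, counts.get, accuracy=0.5, suggestion_count=2))
```

Here `java` is dropped by the accuracy threshold, and `macro` outranks the closer `microsoft` purely on popularity, which shows why a practical ranking would probably need to weigh both signals rather than frequency alone.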





--
Regards,

Cuong Hoang