Re: optimization failed

2009-02-11 Thread Otis Gospodnetic
Eh, these replies through Nabble are really problematic.  I don't recall what 
the original error was any more.  java-u...@lucene is the best place to ask 
Lucene questions.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Qingdi 
To: solr-user@lucene.apache.org
Sent: Wednesday, February 11, 2009 6:33:45 PM
Subject: Re: optimization failed


Hi Otis,

Thanks for your quick response. We are on solr 1.3. 
We cannot upgrade to Solr 1.4-dev at this moment. Do you know where we can
find more details on how the Lucene optimization process works? We want to
check whether there is any Solr config parameter we could adjust to avoid
this problem.

Thanks.

Qingdi
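For what it's worth, the knobs that most directly shape Lucene's merge and
optimize behavior in Solr 1.3 live in solrconfig.xml. A sketch of the relevant
section; the values shown are illustrative defaults, not a recommendation for
this particular failure:

```xml
<mainIndex>
  <!-- how many segments accumulate before a merge; lower values mean
       fewer, larger segments and less work at optimize time -->
  <mergeFactor>10</mergeFactor>
  <!-- cap on the number of documents in a merged segment -->
  <maxMergeDocs>2147483647</maxMergeDocs>
  <!-- RAM used to buffer added documents before flushing a segment -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
</mainIndex>
```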



Otis Gospodnetic wrote:
> 
> Hi Qingdi,
> 
> Hm, I've never encountered this problem.  You didn't mention your Solr
> version.  If I were you I would grab the nightly build tomorrow, because
> tonight's Solr nightly build should include the very latest Lucene jars. 
> Of course, this means running Solr 1.4-dev.
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> 
> 
> -- 
> View this message in context:
> http://www.nabble.com/optimization-failed-tp21939498p21939498.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

-- 
View this message in context: 
http://www.nabble.com/optimization-failed-tp21939498p21966936.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Example Solr instance

2009-02-11 Thread Otis Gospodnetic
Mauricio - are you aware of SolrSharp - a Solr client for .NET?  Would it be 
better to contribute to SolrSharp instead of creating another .NET client, or 
is your client going to be built very differently?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Mauricio Scheffer 
To: solr-user@lucene.apache.org
Sent: Wednesday, February 11, 2009 9:27:59 PM
Subject: Example Solr instance

Hi everyone. I'm developing a Solr client for .NET (
http://code.google.com/p/solrnet/) and I was wondering if I could use the
Solr instance at example.solrstuff.org (the one used by solrjs) to build an
online demo of my library... Of course, this would be just read-only access,
no updates. I would also put a cap on the number of rows so that people
couldn't ask for 1000 rows, for example. Please let me know your opinion about
this. Thanks.

Cheers,
Mauricio


Re: Example Solr instance

2009-02-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
This link is not working: example.solrstuff.org. Who is maintaining it?

On Thu, Feb 12, 2009 at 7:57 AM, Mauricio Scheffer
 wrote:
> Hi everyone. I'm developing a Solr client for .NET (
> http://code.google.com/p/solrnet/) and I was wondering if I could use the
> Solr instance at example.solrstuff.org (the one used by solrjs) to build an
> online demo of my library... Of course, this would be just read-only access,
> no updates. I would also put a cap on the number of rows so that people
> couldn't ask for 1000 rows, for example. Please let me know your opinion about
> this. Thanks.
>
> Cheers,
> Mauricio
>



-- 
--Noble Paul


Re: Recent Paging Change?

2009-02-11 Thread wojtekpia

This was a false alarm, sorry. I misinterpreted some results.



wojtekpia wrote:
> 
> Has there been a recent change (since Dec 2/08) in the paging algorithm?
> I'm seeing much worse performance (75% drop in throughput) when I request
> 20 records starting at record 180 (page 10 in my application). 
> 
> Edit: the 75% drop is compared to my throughput for page 10 queries using
> Dec 2/08 code.
> 
> Thanks.
> 
> Wojtek
> 

-- 
View this message in context: 
http://www.nabble.com/Recent-Paging-Change--tp21946610p21969121.html
Sent from the Solr - User mailing list archive at Nabble.com.



Example Solr instance

2009-02-11 Thread Mauricio Scheffer
Hi everyone. I'm developing a Solr client for .NET (
http://code.google.com/p/solrnet/) and I was wondering if I could use the
Solr instance at example.solrstuff.org (the one used by solrjs) to build an
online demo of my library... Of course, this would be just read-only access,
no updates. I would also put a cap on the number of rows so that people
couldn't ask for 1000 rows, for example. Please let me know your opinion about
this. Thanks.

Cheers,
Mauricio


Re: optimization failed

2009-02-11 Thread Qingdi

Hi Otis,

Thanks for your quick response. We are on solr 1.3. 
We cannot upgrade to Solr 1.4-dev at this moment. Do you know where we can
find more details on how the Lucene optimization process works? We want to
check whether there is any Solr config parameter we could adjust to avoid
this problem.

Thanks.

Qingdi



Otis Gospodnetic wrote:
> 
> Hi Qingdi,
> 
> Hm, I've never encountered this problem.  You didn't mention your Solr
> version.  If I were you I would grab the nightly build tomorrow, because
> tonight's Solr nightly build should include the very latest Lucene jars. 
> Of course, this means running Solr 1.4-dev.
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> 
> 
> -- 
> View this message in context:
> http://www.nabble.com/optimization-failed-tp21939498p21939498.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

-- 
View this message in context: 
http://www.nabble.com/optimization-failed-tp21939498p21966936.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: score filter

2009-02-11 Thread Grant Ingersoll
+1.  Of course it is doable, but that doesn't mean you should, which
is what I was trying to say before (I was typing on my iPod, so it
wasn't fast) and which Walter has now spelled out.  It is entirely
conceivable to me that someone could search for a very common word such
that the scores of all relevant (and thus "good") documents are below
your predefined threshold.


At any rate, proceed at your own peril.  To implement it, look into  
the SearchComponent functionality.


On Feb 11, 2009, at 12:20 PM, Walter Underwood wrote:


Don't bother doing this. It doesn't work.

This seems like a good idea, something that would be useful for
almost every Lucene installation, but it isn't in Lucene because
it does not work in the real world.

A few problems:

* Some users want every match and don't care how many pages of
results they look at.

* Some users are very bad at creating queries that match their
information needs. Others are merely bad, not very bad. The good
matches for their query are on top, but the good matches for
their information need are on the third page.

* Misspellings can put the right match (partial match) at the
bottom. I did this yesterday at my library site, typing
"Katherine Kerr" instead of the correct "Katharine Kerr".
Their search engine showed no matches (grrr), so I had to
search again with "Kerr".

* Most users do not know how to repair their queries, like
I did with "Katherine Kerr", changing it to "Kerr". Even if
they do, you shouldn't make them. Just show the weakly
relevant results.

* Documents have errors, just like queries. I find bad data
on our site about once a month, and we have professional
editors. We still haven't fixed our entry for "Betty Page"
to read "Bettie Page".

* People may use non-title words in the query, like searching
for "batman" when they want "The Dark Knight".

So, don't do this. If you are forced to do it, make sure that you
measure your search quality before and after it is implemented,
because it will get worse. Then you can stop doing it.

wunder

On 2/11/09 8:28 AM, "Cheng Zhang"  wrote:

Just did some research. It seems that it's doable with additional code added
to Solr, but not out of the box. Thank you, Grant.



- Original Message 
From: Grant Ingersoll 
To: "solr-user@lucene.apache.org" 
Sent: Wednesday, February 11, 2009 8:14:01 AM
Subject: Re: score filter

At what point do you draw the line?  0.01 is too low, but what about 0.5 or
0.3?  In fact, there may be queries where 0.01 is relevant.

Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a
good thing. An alternative might be to instead look at the difference between
scores and see if the gap is larger than some delta, but even that is subject
to the vagaries of scoring.

What kind of relevance testing have you done so far to come up with those
values?  See also
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/


On Feb 11, 2009, at 10:16, Cheng Zhang   
wrote:



Hi Grant,

In my case, for example, searching a book: some of the returned documents
are highly relevant (score > 3), but some documents with a low score (<0.01)
are useless.

Without a "score filter", I have to go through each document to find out the
number of documents I'm interested in (score > nnn). This causes some
problems for pagination.  For example, if I only need to display the first
10 records, I need to retrieve all 1000 documents to figure out the number
of meaningful documents which have score > nnn.

Thx,
Kevin




- Original Message 
From: Grant Ingersoll 
To: solr-user@lucene.apache.org
Sent: Wednesday, February 11, 2009 6:47:11 AM
Subject: Re: score filter

What's the motivation for wanting to do this?  The reason I ask is that
score is a relative thing determined by Lucene based on your index
statistics.  It is only meaningful for comparing the results of a specific
query with a specific instance of the index.  In other words, it isn't
useful to filter on b/c there is no way of knowing what a good cutoff value
would be.  So, you won't be able to do score:[1.2 TO *] because score is
not an actual Field.

That being said, you probably could implement a HitCollector at the Lucene
level and somehow hook it into Solr to do what you want.  Or, of course,
just stop processing the results in your app after you see a score below a
certain value.  Naturally, this still means you have to retrieve the
results.

-Grant


On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:


Hello,

Is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it
did not work.

Many thanks,
Kevin



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
Solr/Lucene:
http://www.lucidimagination.com/search








debugQuery missing boost

2009-02-11 Thread Sammy Yu
Hi,
   I'm trying to get some information on how the boost is used in the ranking
calculation via the debugQuery parameter for the following query:
(bodytext:iphone OR bodytext:firmware)^2.0 OR dateCreatedYear:2009^5.0

For one of the matching documents I can see:

4.7144237 = (MATCH) sum of:
  2.2903786 = (MATCH) sum of:
0.7662499 = (MATCH) weight(bodytext:iphon in 8339166), product of:
  0.427938 = queryWeight(bodytext:iphon), product of:
5.729801 = idf(docFreq=76646, numDocs=8682037)
0.07468636 = queryNorm
  1.7905629 = (MATCH) fieldWeight(bodytext:iphon in 8339166), product of:
1.0 = tf(termFreq(bodytext:iphon)=1)
5.729801 = idf(docFreq=76646, numDocs=8682037)
0.3125 = fieldNorm(field=bodytext, doc=8339166)
1.5241286 = (MATCH) weight(bodytext:firmwar in 8339166), product of:
  0.60354054 = queryWeight(bodytext:firmwar), product of:
8.081 = idf(docFreq=7300, numDocs=8682037)
0.07468636 = queryNorm
  2.5253127 = (MATCH) fieldWeight(bodytext:firmwar in 8339166), product of:
1.0 = tf(termFreq(bodytext:firmwar)=1)
8.081 = idf(docFreq=7300, numDocs=8682037)
0.3125 = fieldNorm(field=bodytext, doc=8339166)
  2.424045 = (MATCH) weight(dateCreatedYear:2009^5.0 in 8339166), product of:
0.6727613 = queryWeight(dateCreatedYear:2009^5.0), product of:
  5.0 = boost
  3.603128 = idf(docFreq=642831, numDocs=8682037)
  0.03734318 = queryNorm
3.603128 = (MATCH) fieldWeight(dateCreatedYear:2009 in 8339166), product of:
  1.0 = tf(termFreq(dateCreatedYear:2009)=1)
  3.603128 = idf(docFreq=642831, numDocs=8682037)
  1.0 = fieldNorm(field=dateCreatedYear, doc=8339166)

which shows that the 5.0 boost in dateCreatedYear:2009^5.0 is being
applied; however, the 2.0 boost is missing in "(bodytext:iphone OR
bodytext:firmware)^2.0".  How is the 2.0 boost being applied to the
score?

Thanks,
Sammy


Re: score filter

2009-02-11 Thread Walter Underwood
Don't bother doing this. It doesn't work.

This seems like a good idea, something that would be useful for
almost every Lucene installation, but it isn't in Lucene because
it does not work in the real world.

A few problems:

* Some users want every match and don't care how many pages of
results they look at.

* Some users are very bad at creating queries that match their
information needs. Others are merely bad, not very bad. The good
matches for their query are on top, but the good matches for
their information need are on the third page.

* Misspellings can put the right match (partial match) at the
bottom. I did this yesterday at my library site, typing
"Katherine Kerr" instead of the correct "Katharine Kerr".
Their search engine showed no matches (grrr), so I had to
search again with "Kerr".

* Most users do not know how to repair their queries, like
I did with "Katherine Kerr", changing it to "Kerr". Even if
they do, you shouldn't make them. Just show the weakly
relevant results.

* Documents have errors, just like queries. I find bad data
on our site about once a month, and we have professional
editors. We still haven't fixed our entry for "Betty Page"
to read "Bettie Page".

* People may use non-title words in the query, like searching
for "batman" when they want "The Dark Knight".

So, don't do this. If you are forced to do it, make sure that you
measure your search quality before and after it is implemented,
because it will get worse. Then you can stop doing it.

wunder

On 2/11/09 8:28 AM, "Cheng Zhang"  wrote:

> Just did some research. It seems that it's doable with additional code added
> to Solr, but not out of the box. Thank you, Grant.
> 
> 
> 
> - Original Message 
> From: Grant Ingersoll 
> To: "solr-user@lucene.apache.org" 
> Sent: Wednesday, February 11, 2009 8:14:01 AM
> Subject: Re: score filter
> 
> At what point do you draw the line?  0.01 is too low, but what about 0.5 or
> 0.3?  In fact, there may be queries where 0.01 is relevant.
> 
> Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a
> good thing. An alternative might be to instead look at the difference between
> scores and see if the gap is larger than some delta, but even that is subject
> to the vagaries of scoring.
> 
> What kind of relevance testing have you done so far to come up with those
> values?  See also
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debug
> ging-Relevance-Issues-in-Search/
> 
> 
> On Feb 11, 2009, at 10:16, Cheng Zhang  wrote:
> 
>> Hi Grant,
>> 
>> In my case, for example searching a book. Some of the returned documents are
>> with high relevance (score > 3), but some documents with a low score (<0.01)
>> are useless.
>> 
>> Without a "score filter", I have to go through each document to find out the
>> number of documents I'm interested in (score > nnn). This causes some problems
>> for pagination.  For example if I only need to display the first 10 records I
>> need to retrieve all 1000 documents to figure out the number of meaningful
>> documents which have score > nnn.
>> 
>> Thx,
>> Kevin
>> 
>> 
>> 
>> 
>> - Original Message 
>> From: Grant Ingersoll 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, February 11, 2009 6:47:11 AM
>> Subject: Re: score filter
>> 
>> What's the motivation for wanting to do this?  The reason I ask is that score is
>> a relative thing determined by Lucene based on your index statistics.  It is
>> only meaningful for comparing the results of a specific query with a specific
>> instance of the index.  In other words, it isn't useful to filter on b/c
>> there is no way of knowing what a good cutoff value would be.  So, you won't
>> be able to do score:[1.2 TO *] because score is not an actual Field.
>> 
>> That being said, you probably could implement a HitCollector at the Lucene
>> level and somehow hook it into Solr to do what you want.  Or, of course, just
>> stop processing the results in your app after you see a score below a certain
>> value.  Naturally, this still means you have to retrieve the results.
>> 
>> -Grant
>> 
>> 
>> On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:
>> 
>>> Hello,
>>> 
>>> Is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it did
>>> not work.
>>> 
>>> Many thanks,
>>> Kevin
>>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
> 



How the inverted index works.

2009-02-11 Thread Josiane Gamgo
Hi,
I'm trying to understand the internal structure of the Lucene indexer.
According to the "Lucene in Action" book, documents are first converted into
the Lucene Document format, then analyzed with the StandardAnalyzer.
I don't understand how the analyzed documents are processed when they are
added to the inverted index.
Thanks.
Josiane
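At its core, the step being asked about is simple: each analyzed token is
looked up in a dictionary, and the document's ID is appended to that term's
posting list. A toy sketch in plain Java (nothing Lucene-specific here;
Lucene's on-disk structures are far more elaborate and compressed):

```java
import java.util.*;

public class TinyInvertedIndex {
    // term -> sorted set of ids of documents containing it (a posting list)
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Stand-in for an Analyzer: lowercase the text and split into tokens,
    // then record the document id under each resulting term.
    public void add(int docId, String text) {
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A query term is answered by a single dictionary lookup.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }
}
```

Lucene does the same conceptual work per segment, with the term dictionary
and postings encoded compactly on disk rather than held in a HashMap.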


term offsets not returned with tv=true

2009-02-11 Thread Jeffrey Baker
I'm trying to exercise the termOffset functions in the nightly build
(2009-02-11) but it doesn't seem to do anything.  I have an item in my
schema like so:



And I attempt this query:

qt=tvrh&
tv=true&
tv.offsets=true&
indent=true&
wt=json&
facet.mincount=1&
facet=true&
hl=on&
hl.fl=document&
hl.mergeContiguous=true&
hl.requireFieldMatch=true&
fl=document,id,title,doctype,score&
hl.usePhraseHighlighter=true&
hl.snippets=3&
hl.fragsize=200&
hl.maxAnalyzedChars=1048576&
hl.simple.pre=[[[hit]&
hl.simple.post=[[[/hit]&
rows=20&
q=iphone

... where most of those parameters are irrelevant to this question (I
think).  The response looks like this:

"termVectors":[
  "doc-51630",[
"uniqueKey","streetevents:2012449"],
  "doc-19343",[
"uniqueKey","streetevents:1904785"],
  "doc-22599",[
"uniqueKey","streetevents:1873725"],
  "doc-52660",[
"uniqueKey","streetevents:2029389"],
  "doc-37532",[
"uniqueKey","streetevents:1665907"],
  "doc-49797",[
"uniqueKey","streetevents:1996051"],
  "doc-21476",[
"uniqueKey","streetevents:1885188"],
  "doc-24671",[
"uniqueKey","streetevents:1820498"],
  "doc-25617",[
"uniqueKey","streetevents:1794743"],
  "doc-48135",[
"uniqueKey","streetevents:1981537"],
  "doc-47239",[
"uniqueKey","streetevents:1940855"],
  "doc-54651",[
"uniqueKey","streetevents:2069828"],
  "doc-48085",[
"uniqueKey","streetevents:1979847"],
  "doc-28956",[
"uniqueKey","streetevents:1766038"],
  "doc-47986",[
"uniqueKey","streetevents:1978001"],
  "doc-32287",[
"uniqueKey","streetevents:1740905"],
  "doc-41568",[
"uniqueKey","streetevents:1599906"],
  "doc-44964",[
"uniqueKey","streetevents:1782481"],
  "doc-43900",[
"uniqueKey","streetevents:1748639"],
  "doc-45390",[
"uniqueKey","streetevents:1811998"],

I guess I was expecting to get some lists of term offsets.  Am I doing it wrong?

-jwb
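For context, tv.offsets can only return data when the field actually stores
term vectors with offsets. A sketch of the kind of schema.xml declaration
that enables this (the field name and type here are assumptions, not taken
from the message above):

```xml
<field name="document" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

A full reindex is needed after changing these attributes, since term vectors
are written at index time.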


Re: Performance degradation caused by choice of range fields

2009-02-11 Thread wojtekpia

Yes, I commit roughly every 15 minutes (via a data update). This update is
consistent between my tests, and only causes a performance drop when I'm
sorting on fields with many unique values. I've examined my GC logs, and
they are also consistent between my tests.



Otis Gospodnetic wrote:
> 
> Hi,
> 
> Did you commit (reopen the searcher) during the performance degradation
> period and did any of your queries use sort?  If so, perhaps your JVM is
> accumulating those thrown-away FieldCache objects and then GC has more and
> more garbage to clean up, causing pauses and lowering your overall
> throughput.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 

-- 
View this message in context: 
http://www.nabble.com/Performance-degradation-caused-by-choice-of-range-fields-tp21924197p21958268.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Recent Paging Change?

2009-02-11 Thread wojtekpia

I'll run a profiler on new and old code and let you know what I find.

I have changed my schema between tests: I used to have termVectors turned on
for several fields, and now they are always off. My underlying data has not
changed.
-- 
View this message in context: 
http://www.nabble.com/Recent-Paging-Change--tp21946610p21958267.html
Sent from the Solr - User mailing list archive at Nabble.com.



Is dismax mm applied before filtering stopwords in query?

2009-02-11 Thread Steven Hentschel
If a naive user enters a string that contains typical stopwords like "and"
and "the", these seem to be included in the word count for the must-match
criteria of the dismax query.

So if, for example, the mm parameter is the default "2<-1 5<-2 6<90%"
and the user enters something like "Jason and the Argonauts", this won't
match a document with that title because the word count is treated as 4
and only 2 words match. As the dismax query is recommended for naive
users, wouldn't it be more logical to apply the mm criteria after
applying the stopword filter to the query?

Steven H. 
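The arithmetic above can be checked against the default mm spec. A small
Java sketch of one reading of that spec (an illustration only, not Solr's
actual mm parser; the percentage case is assumed to truncate):

```java
public class DismaxMm {
    // Default dismax mm "2<-1 5<-2 6<90%" read as a piecewise rule:
    //   up to 2 optional clauses: all of them must match
    //   3 to 5 clauses: all but one must match
    //   6 or more: 90% of them (assumed truncated to an integer)
    public static int requiredMatches(int optionalClauses) {
        if (optionalClauses <= 2) return optionalClauses;
        if (optionalClauses <= 5) return optionalClauses - 1;
        return (int) (optionalClauses * 0.9);
    }
}
```

For "Jason and the Argonauts" the word count is treated as 4, so 3 clauses
must match; with "and" and "the" stopped out of the index, only 2 can,
which is exactly the failure described above.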


Re: Is there a way to query for this value?

2009-02-11 Thread Ian Connor
Thanks,

Here is a ruby translation for those that want it:

solr_query = ""
doi_part.each_char do |c|
  case c
  when '\\', '+', '-', '!', '(', ')', ':', '^', '[', ']', '"', '{', '}', '~', '*', '?', '|', ';'
    # escape Lucene query-syntax characters with a preceding backslash
    solr_query += '\\' + c
  when '&'
    # '&' still confuses the parser when escaped, so URL-encode it instead
    solr_query += '%26'
  else
    solr_query += c
  end
end
solr_query

It still seems to get confused by & characters, and turning them into %26
does not work from the solr-ruby connection... but it works for most of the
DOIs that I have tried, so it's still a big improvement. Thanks.

On Tue, Feb 10, 2009 at 11:19 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hi Ian,
>
> I'll assume this actually did get indexed as a single token, so there is no
> problem there.
> As for query string escaping, perhaps this method from Lucene's QueryParser
> will help:
>
>  /**
>   * Returns a String where those characters that QueryParser
>   * expects to be escaped are escaped by a preceding \.
>   */
>  public static String escape(String s) {
>StringBuffer sb = new StringBuffer();
>for (int i = 0; i < s.length(); i++) {
>  char c = s.charAt(i);
>  // These characters are part of the query syntax and must be escaped
>  if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c ==
> ')' || c == ':'
>|| c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c ==
> '}' || c == '~'
>|| c == '*' || c == '?' || c == '|' || c == '&') {
>sb.append('\\');
>  }
>  sb.append(c);
>}
>return sb.toString();
>  }
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
>
> 
> From: Ian Connor 
> To: solr 
> Sent: Tuesday, February 10, 2009 9:28:11 PM
> Subject: Is there a way to query for this value?
>
> I have tried to escape the characters as best I can, but cannot seem to
> find
> one that works.
>
> The value is:
>
> 10.1002/(SICI)1096-9136(199604)13:4<390::AID-DIA121>3.0.CO;2-4
>
> It is a doi (see http://doi.org), so is a valid value to search on.
> However,
> when I query this through ruby or even the admin interface, the parser does
> not like it and returns an error.
>
> What is the way to escape this? Is there such code for ruby?
> --
> Regards,
>
> Ian Connor
>



-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor


Re: score filter

2009-02-11 Thread Cheng Zhang
Just did some research. It seems that it's doable with additional code added
to Solr, but not out of the box. Thank you, Grant.



- Original Message 
From: Grant Ingersoll 
To: "solr-user@lucene.apache.org" 
Sent: Wednesday, February 11, 2009 8:14:01 AM
Subject: Re: score filter

At what point do you draw the line?  0.01 is too low, but what about 0.5 or 
0.3?  In fact, there may be queries where 0.01 is relevant.

Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a 
good thing. An alternative might be to instead look at the difference between 
scores and see if the gap is larger than some delta, but even that is subject 
to the vagaries of scoring.

What kind of relevance testing have you done so far to come up with those 
values?  See also 
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/


On Feb 11, 2009, at 10:16, Cheng Zhang  wrote:

> Hi Grant,
> 
> In my case, for example searching a book. Some of the returned documents are 
> with high relevance (score > 3), but some documents with a low score (<0.01) 
> are useless.
> 
> Without a "score filter", I have to go through each document to find out the 
> number of documents I'm interested in (score > nnn). This causes some problems 
> for pagination.  For example if I only need to display the first 10 records I 
> need to retrieve all 1000 documents to figure out the number of meaningful 
> documents which have score > nnn.
> 
> Thx,
> Kevin
> 
> 
> 
> 
> - Original Message 
> From: Grant Ingersoll 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, February 11, 2009 6:47:11 AM
> Subject: Re: score filter
> 
> What's the motivation for wanting to do this?  The reason I ask is that score is 
> a relative thing determined by Lucene based on your index statistics.  It is 
> only meaningful for comparing the results of a specific query with a specific 
> instance of the index.  In other words, it isn't useful to filter on b/c 
> there is no way of knowing what a good cutoff value would be.  So, you won't 
> be able to do score:[1.2 TO *] because score is not an actual Field.
> 
> That being said, you probably could implement a HitCollector at the Lucene 
> level and somehow hook it into Solr to do what you want.  Or, of course, just 
> stop processing the results in your app after you see a score below a certain 
> value.  Naturally, this still means you have to retrieve the results.
> 
> -Grant
> 
> 
> On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:
> 
>> Hello,
>> 
>> Is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it did 
>> not work.
>> 
>> Many thanks,
>> Kevin
>> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
> Solr/Lucene:
> http://www.lucidimagination.com/search



Re: score filter

2009-02-11 Thread Grant Ingersoll
At what point do you draw the line?  0.01 is too low, but what about  
0.5 or 0.3?  In fact, there may be queries where 0.01 is relevant.


Relevance is a tricky thing and putting in arbitrary cutoffs is  
usually not a good thing. An alternative might be to instead look at  
the difference between scores and see if the gap is larger than some  
delta, but even that is subject to the vagaries of scoring.


What kind of relevance testing have you done so far to come up with  
those values?  See also http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/



On Feb 11, 2009, at 10:16, Cheng Zhang  wrote:


Hi Grant,

In my case, for example, searching a book: some of the returned documents
are highly relevant (score > 3), but some documents with a low score
(<0.01) are useless.


Without a "score filter", I have to go through each document to find out
the number of documents I'm interested in (score > nnn). This causes some
problems for pagination.  For example, if I only need to display the first
10 records, I need to retrieve all 1000 documents to figure out the number
of meaningful documents which have score > nnn.


Thx,
Kevin




- Original Message 
From: Grant Ingersoll 
To: solr-user@lucene.apache.org
Sent: Wednesday, February 11, 2009 6:47:11 AM
Subject: Re: score filter

What's the motivation for wanting to do this?  The reason I ask is that
score is a relative thing determined by Lucene based on your index
statistics.  It is only meaningful for comparing the results of a specific
query with a specific instance of the index.  In other words, it isn't
useful to filter on b/c there is no way of knowing what a good cutoff
value would be.  So, you won't be able to do score:[1.2 TO *] because
score is not an actual Field.


That being said, you probably could implement a HitCollector at the  
Lucene level and somehow hook it into Solr to do what you want.  Or,  
of course, just stop processing the results in your app after you  
see a score below a certain value.  Naturally, this still means you  
have to retrieve the results.


-Grant


On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:


Hello,

Is there a way to set a score filter? I tried "+score:[1.2 TO *]"  
but it did not work.


Many thanks,
Kevin



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search
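The gap idea Grant floats above can be sketched in plain Java. This is a
hypothetical helper, not anything that exists in Solr: walk the descending
score list and cut at the first adjacent-pair drop larger than some delta.

```java
public class ScoreGapCutoff {
    // scores must be sorted in descending order, as Solr returns them.
    // Returns how many results to keep: everything before the first gap
    // between adjacent scores that exceeds delta (or all of them if no
    // such gap exists).
    public static int keepBeforeGap(float[] scores, float delta) {
        for (int i = 1; i < scores.length; i++) {
            if (scores[i - 1] - scores[i] > delta) {
                return i;
            }
        }
        return scores.length;
    }
}
```

As Grant notes, even this is subject to the vagaries of scoring: a query
can produce a smooth score curve with no obvious cliff, in which case the
heuristic keeps everything.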


Re: score filter

2009-02-11 Thread Cheng Zhang
Hi Grant,

In my case, for example, searching a book: some of the returned documents are 
highly relevant (score > 3), but some documents with a low score (<0.01) are 
useless.

Without a "score filter", I have to go through each document to find out the 
number of documents I'm interested in (score > nnn). This causes some problems 
for pagination.  For example, if I only need to display the first 10 records, 
I need to retrieve all 1000 documents to figure out the number of meaningful 
documents which have score > nnn.

Thx,
Kevin




- Original Message 
From: Grant Ingersoll 
To: solr-user@lucene.apache.org
Sent: Wednesday, February 11, 2009 6:47:11 AM
Subject: Re: score filter

What's the motivation for wanting to do this?  The reason I ask is that score 
is a relative thing determined by Lucene based on your index statistics.  It is 
only meaningful for comparing the results of a specific query with a specific 
instance of the index.  In other words, it isn't useful to filter on b/c there 
is no way of knowing what a good cutoff value would be.  So, you won't be able 
to do score:[1.2 TO *] because score is not an actual Field.

That being said, you probably could implement a HitCollector at the Lucene 
level and somehow hook it into Solr to do what you want.  Or, of course, just 
stop processing the results in your app after you see a score below a certain 
value.  Naturally, this still means you have to retrieve the results.

-Grant


On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:

> Hello,
> 
> Is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it did 
> not work.
> 
> Many thanks,
> Kevin
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search


Re: score filter

2009-02-11 Thread Grant Ingersoll
What's the motivation for wanting to do this?  The reason I ask is that
score is a relative thing determined by Lucene based on your index
statistics.  It is only meaningful for comparing the results of a specific
query with a specific instance of the index.  In other words, it isn't
useful to filter on b/c there is no way of knowing what a good cutoff
value would be.  So, you won't be able to do score:[1.2 TO *] because
score is not an actual Field.


That being said, you probably could implement a HitCollector at the  
Lucene level and somehow hook it into Solr to do what you want.  Or,  
of course, just stop processing the results in your app after you see  
a score below a certain value.  Naturally, this still means you have  
to retrieve the results.


-Grant


On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:


Hello,

Is there a way to set a score filter? I tried "+score:[1.2 TO *]"  
but it did not work.


Many thanks,
Kevin



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: commit looks stuck ?

2009-02-11 Thread Grant Ingersoll

It looks like you are running out of memory.  What is your heap size?
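For reference, the heap for Solr under Tomcat is set through the JVM options Tomcat is started with. A sketch with placeholder values (the numbers are illustrative only — tune `-Xmx` to your index size and commit load, e.g. in a `setenv.sh` or wherever your Tomcat install defines `JAVA_OPTS`):

```shell
# Illustrative only: raise the JVM heap so large commits/merges don't
# hit OutOfMemoryError. The values here are placeholders, not a recommendation.
JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx2048m"
export JAVA_OPTS
echo "$JAVA_OPTS"
```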

On Feb 11, 2009, at 4:09 AM, sunnyfr wrote:



Hi

Do you have any idea why, after a night with Solr running and just a commit
every five minutes, the commit processes never seem to shut down?


root 29428  0.0  0.0  53988  2648 ?S01:05   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 29829  0.0  0.0   3944   560 ?Ss   01:10   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 29830  0.0  0.0   8936  1256 ?S01:10   0:00 /bin/bash
/data/solr/book/bin/commit
root 29852  0.0  0.0  53988  2640 ?S01:10   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 30286  0.0  0.0   3944   564 ?Ss   01:15   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 30287  0.0  0.0   8936  1256 ?S01:15   0:00 /bin/bash
/data/solr/book/bin/commit
root 30309  0.0  0.0  53988  2644 ?S01:15   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 30715  0.0  0.0   3944   560 ?Ss   01:20   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 30716  0.0  0.0   8936  1252 ?S01:20   0:00 /bin/bash
/data/solr/book/bin/commit
root 30738  0.0  0.0  53988  2644 ?S01:20   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 31172  0.0  0.0   3944   564 ?Ss   01:25   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 31173  0.0  0.0   8936  1252 ?S01:25   0:00 /bin/bash
/data/solr/book/bin/commit
root 31195  0.0  0.0  53988  2644 ?S01:25   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 31606  0.0  0.0   3944   564 ?Ss   01:30   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 31607  0.0  0.0   8936  1256 ?S01:30   0:00 /bin/bash
/data/solr/book/bin/commit
root 31629  0.0  0.0  53988  2648 ?S01:30   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 32063  0.0  0.0   3944   560 ?Ss   01:35   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 32064  0.0  0.0   8936  1256 ?S01:35   0:00 /bin/bash
/data/solr/book/bin/commit
root 32086  0.0  0.0  53988  2640 ?S01:35   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 32499  0.0  0.0   3944   564 ?Ss   01:40   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 32500  0.0  0.0   8936  1252 ?S01:40   0:00 /bin/bash
/data/solr/book/bin/commit
root 32522  0.0  0.0  53988  2648 ?S01:40   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 

My logs have a huge error; I don't know where it comes from:

2009/02/10 19:29:37 Apache Tomcat/5.5 - Error report

Re: Recent Paging Change?

2009-02-11 Thread Grant Ingersoll
Has anything else changed index-wise?  For instance, do you have  
larger stored fields or are you retrieving more fields?


On Feb 10, 2009, at 8:26 PM, wojtekpia wrote:



Has there been a recent change (since Dec 2/08) in the paging  
algorithm? I'm
seeing much worse performance (75% drop in throughput) when I  
request 20

records starting at record 180 (page 10 in my application).

Thanks.

Wojtek



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: SPELLCHECK Problems

2009-02-11 Thread Kraus, Ralf | pixelhouse GmbH

Grant Ingersoll schrieb:

What does your "textSpell" FieldType look like?

Spelling is definitely something that needs tuning, so you might have 
to play with some of the knobs like accuracy, etc.


As for JaroWinkler, and I suppose the default, your field is "spell", 
but based on your configuration, I gather you really want it to be 
"RezeptNameSpellCheck".  I am guessing that if you point Luke at those 
two spell checking indexes, you're going to find that they are empty.



Hey! Thanks a lot ... that was indeed my problem :-)

Greets,

Ralf


Re: DIH fails to import after svn update

2009-02-11 Thread Fergus McMenemie
Thanks, 

That fixed it.

>On Wed, Feb 11, 2009 at 4:19 PM, Fergus McMenemie  wrote:
>
>
>> java.lang.NoSuchFieldError: docCount
>>at
>> org.apache.solr.handler.dataimport.SolrWriter.getDocCount(SolrWriter.java:231)
>>at
>> org.apache.solr.handler.dataimport.DataImportHandlerException.<init>(DataImportHandlerException.java:42)
>>at
>> org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:81)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:293)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:222)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:155)
>>at
>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:324)
>>at
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:384)
>>at
>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:365)
>>
>
>Seems like this was not a clean compile. The AtomicInteger field docCount
>was changed to a AtomicLong.
>
>Can you please do a "ant clean dist"?
>
>-- 
>Regards,
>Shalin Shekhar Mangar.

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


"ant dist" of a nightly download fails

2009-02-11 Thread Fergus McMenemie
Hi,

I have been looking at the nightly downloads, trying to work
backwards through the nightlies till my code starts working 
again!

I have downloaded all the available nightlies and they all fail
to "ant dist" as follows:-


>root: ant dist
>Buildfile: build.xml
>
>init-forrest-entities:
>
>compile-solrj:
>
>make-manifest:
>
>dist-solrj:
>  [jar] Building jar: 
> /Volumes/spare/ts/apache-solr-nightly/dist/apache-solr-solrj-1.4-dev.jar
>
>compile:
>
>dist-jar:
>  [jar] Building jar: 
> /Volumes/spare/ts/apache-solr-nightly/dist/apache-solr-core-1.4-dev.jar
>
>dist-contrib:
>
>init:
>
>init-forrest-entities:
>
>compile-solrj:
>
>compile:
>
>make-manifest:
>
>compile:
>
>build:
>  [jar] Building jar: 
> /Volumes/spare/ts/apache-solr-nightly/contrib/dataimporthandler/target/apache-solr-dataimporthandler-1.4-dev.jar
>
>dist:
> [copy] Copying 2 files to /Volumes/spare/ts/apache-solr-nightly/build/web
>[mkdir] Created dir: 
> /Volumes/spare/ts/apache-solr-nightly/build/web/WEB-INF/lib
> [copy] Copying 1 file to 
> /Volumes/spare/ts/apache-solr-nightly/build/web/WEB-INF/lib
> [copy] Copying 1 file to /Volumes/spare/ts/apache-solr-nightly/dist
>
>init:
>
>init-forrest-entities:
>
>compile-solrj:
>
>compile:
>
>make-manifest:
>
>compile:
>
>build:
>  [jar] Building jar: 
> /Volumes/spare/ts/apache-solr-nightly/contrib/extraction/build/apache-solr-cell-1.4-dev.jar
>
>dist:
> [copy] Copying 1 file to /Volumes/spare/ts/apache-solr-nightly/dist
>
>clean:
>   [delete] Deleting directory 
> /Volumes/spare/ts/apache-solr-nightly/contrib/javascript/dist
>
>create-dist-folder:
>[mkdir] Created dir: 
> /Volumes/spare/ts/apache-solr-nightly/contrib/javascript/dist
>
>concat:
>
>docs:
>[mkdir] Created dir: 
> /Volumes/spare/ts/apache-solr-nightly/contrib/javascript/dist/doc
> [java] Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/mozilla/javascript/tools/shell/Main
> [java]at JsRun.main(Unknown Source)
>
>BUILD FAILED
>/Volumes/spare/ts/apache-solr-nightly/common-build.xml:338: The following 
>error occurred while executing this line:
>/Volumes/spare/ts/apache-solr-nightly/common-build.xml:215: The following 
>error occurred while executing this line:
>/Volumes/spare/ts/apache-solr-nightly/contrib/javascript/build.xml:74: Java 
>returned: 1
>
>Total time: 3 seconds
>root: 

Performing "ant test" is fine. Removing the javascript contrib directory
allows "ant dist" to complete and gives me a usable war file; "ant test"
still passes afterwards. However I suspect this may not represent best
practice.


What does removal of this contrib module lose me? I was wondering if
it went with the DIH ScriptTransformer?
 
Regards Fergus.

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: SPELLCHECK Problems

2009-02-11 Thread Grant Ingersoll

What does your "textSpell" FieldType look like?

Spelling is definitely something that needs tuning, so you might have  
to play with some of the knobs like accuracy, etc.


As for JaroWinkler, and I suppose the default, your field is "spell",  
but based on your configuration, I gather you really want it to be  
"RezeptNameSpellCheck".  I am guessing that if you point Luke at those  
two spell checking indexes, you're going to find that they are empty.
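Concretely, the fix described above would be to make each spellchecker's field match the schema field that actually holds the spell data. A hedged sketch of one of the `<lst>` blocks, assuming that field is named RezeptNameSpellCheck (the option names are standard SpellCheckComponent configuration):

```xml
<!-- Illustrative only: point the spellchecker at the populated field,
     not at an empty "spell" field. -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">RezeptNameSpellCheck</str>
  <str name="spellcheckIndexDir">./spellchecker1</str>
  <str name="buildOnCommit">true</str>
</lst>
```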


HTH,
Grant

On Feb 11, 2009, at 5:47 AM, Kraus, Ralf | pixelhouse GmbH wrote:


Hi,

My SOLRCONFIG.XML

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
   <arr name="last-components">
   <str>spellcheck</str>
   </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

   <str name="queryAnalyzerFieldType">textSpell</str>

   <lst name="spellchecker">
   <str name="name">default</str>
   <str name="field">spell</str>
   <str name="spellcheckIndexDir">./spellchecker1</str>
   <str name="buildOnCommit">true</str>
   </lst>

   <lst name="spellchecker">
   <str name="name">jarowinkler</str>
   <str name="field">spell</str>
   <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
   <str name="spellcheckIndexDir">./spellchecker2</str>
   <str name="buildOnCommit">true</str>
   </lst>

   <lst name="spellchecker">
   <str name="classname">solr.FileBasedSpellChecker</str>
   <str name="name">file</str>
   <str name="sourceLocation">dictionary.txt</str>
   <str name="characterEncoding">UTF-8</str>
   <str name="spellcheckIndexDir">./spellcheckerFile</str>
   <str name="buildOnCommit">true</str>
   </lst>

</searchComponent>

My Schema.xml

stored="true" multiValued="true"/>



Search:
spellcheck=true
&wt=phps
&rows=30
&start=0
&sort=score+desc
&spellcheck.build=true
&spellcheck.extendedResults=false
&spellcheck.count=1
&q=sudeln
&spellcheck.onlyMorePopular=true
&spellcheck.dictionary=file

Now my Problems :-)

If I use the "file" choice with "spellcheck.dictionary=file" I get  
very bad suggestions :-( If I use "default" or "Jarowinkler" I don't  
get any suggestions at all :-(

What's the problem?

Greets,

Ralf


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: DIH fails to import after svn update

2009-02-11 Thread Shalin Shekhar Mangar
On Wed, Feb 11, 2009 at 4:19 PM, Fergus McMenemie  wrote:


> java.lang.NoSuchFieldError: docCount
>at
> org.apache.solr.handler.dataimport.SolrWriter.getDocCount(SolrWriter.java:231)
>at
> org.apache.solr.handler.dataimport.DataImportHandlerException.<init>(DataImportHandlerException.java:42)
>at
> org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:81)
>at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:293)
>at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:222)
>at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:155)
>at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:324)
>at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:384)
>at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:365)
>

Seems like this was not a clean compile. The AtomicInteger field docCount
was changed to a AtomicLong.

Can you please do a "ant clean dist"?

-- 
Regards,
Shalin Shekhar Mangar.


DIH fails to import after svn update

2009-02-11 Thread Fergus McMenemie
Hello,

I had a nice working version of Solr building from trunk; I think
it was from about 2-4th Feb. On the 7th I performed an "svn update"
and it now fails as follows when performing:

get 'http://localhost:8080/apache-solr-1.4-dev/dataimport?command=full-import'

I have performed a "svn update" on the 11th (today) again. It still
fails.

Feb 11, 2009 4:27:34 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Feb 11, 2009 4:27:34 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=2

commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_1,version=1234326438927,generation=1,filenames=[segments_1]

commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_2,version=1234326438928,generation=2,filenames=[segments_2]
Feb 11, 2009 4:27:34 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1234326438928
Feb 11, 2009 4:27:34 AM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
SEVERE: Full Import failed
java.lang.NoSuchFieldError: docCount
at 
org.apache.solr.handler.dataimport.SolrWriter.getDocCount(SolrWriter.java:231)
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.<init>(DataImportHandlerException.java:42)
at 
org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:81)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:293)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:222)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:155)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:324)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:384)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:365)
Feb 11, 2009 4:27:34 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Feb 11, 2009 4:27:34 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
Feb 11, 2009 4:27:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Feb 11, 2009 4:27:34 AM org.apache.solr.search.SolrIndexSearcher 


Regards to all.
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


SPELLCHECK Problems

2009-02-11 Thread Kraus, Ralf | pixelhouse GmbH

Hi,

My SOLRCONFIG.XML

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
   <arr name="last-components">
   <str>spellcheck</str>
   </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

   <str name="queryAnalyzerFieldType">textSpell</str>

   <lst name="spellchecker">
   <str name="name">default</str>
   <str name="field">spell</str>
   <str name="spellcheckIndexDir">./spellchecker1</str>
   <str name="buildOnCommit">true</str>
   </lst>

   <lst name="spellchecker">
   <str name="name">jarowinkler</str>
   <str name="field">spell</str>
   <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
   <str name="spellcheckIndexDir">./spellchecker2</str>
   <str name="buildOnCommit">true</str>
   </lst>

   <lst name="spellchecker">
   <str name="classname">solr.FileBasedSpellChecker</str>
   <str name="name">file</str>
   <str name="sourceLocation">dictionary.txt</str>
   <str name="characterEncoding">UTF-8</str>
   <str name="spellcheckIndexDir">./spellcheckerFile</str>
   <str name="buildOnCommit">true</str>
   </lst>

</searchComponent>

My Schema.xml

stored="true" multiValued="true"/>



Search:
spellcheck=true
&wt=phps
&rows=30
&start=0
&sort=score+desc
&spellcheck.build=true
&spellcheck.extendedResults=false
&spellcheck.count=1
&q=sudeln
&spellcheck.onlyMorePopular=true
&spellcheck.dictionary=file

Now my Problems :-)

If I use the "file" choice with "spellcheck.dictionary=file" I get very 
bad suggestions :-( If I use "default" or "Jarowinkler" I don't get any 
suggestions at all :-(

What's the problem?

Greets,

Ralf


commit looks stuck ?

2009-02-11 Thread sunnyfr

Hi 

Do you have any idea why, after a night with Solr running and just a commit
every five minutes, the commit processes never seem to shut down?


root 29428  0.0  0.0  53988  2648 ?S01:05   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 29829  0.0  0.0   3944   560 ?Ss   01:10   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 29830  0.0  0.0   8936  1256 ?S01:10   0:00 /bin/bash
/data/solr/book/bin/commit
root 29852  0.0  0.0  53988  2640 ?S01:10   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 30286  0.0  0.0   3944   564 ?Ss   01:15   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 30287  0.0  0.0   8936  1256 ?S01:15   0:00 /bin/bash
/data/solr/book/bin/commit
root 30309  0.0  0.0  53988  2644 ?S01:15   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 30715  0.0  0.0   3944   560 ?Ss   01:20   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 30716  0.0  0.0   8936  1252 ?S01:20   0:00 /bin/bash
/data/solr/book/bin/commit
root 30738  0.0  0.0  53988  2644 ?S01:20   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 31172  0.0  0.0   3944   564 ?Ss   01:25   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 31173  0.0  0.0   8936  1252 ?S01:25   0:00 /bin/bash
/data/solr/book/bin/commit
root 31195  0.0  0.0  53988  2644 ?S01:25   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 31606  0.0  0.0   3944   564 ?Ss   01:30   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 31607  0.0  0.0   8936  1256 ?S01:30   0:00 /bin/bash
/data/solr/book/bin/commit
root 31629  0.0  0.0  53988  2648 ?S01:30   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 32063  0.0  0.0   3944   560 ?Ss   01:35   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 32064  0.0  0.0   8936  1256 ?S01:35   0:00 /bin/bash
/data/solr/book/bin/commit
root 32086  0.0  0.0  53988  2640 ?S01:35   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
root 32499  0.0  0.0   3944   564 ?Ss   01:40   0:00 /bin/sh -c
/data/solr/book/bin/commit
root 32500  0.0  0.0   8936  1252 ?S01:40   0:00 /bin/bash
/data/solr/book/bin/commit
root 32522  0.0  0.0  53988  2648 ?S01:40   0:00 curl
http://localhost:8180/solr/book/update -s -H Content-type:text/xml;
charset=utf-8 -d 
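The archive has stripped the XML payload after `-d` in the curl lines above (anything in angle brackets was eaten along with the HTML). For a commit it would presumably be the `<commit/>` message. A sketch of what the `/data/solr/book/bin/commit` script likely runs, with the command printed here rather than executed:

```shell
# Reconstruction (assumed): posting <commit/> to /update is how a commit
# is issued over HTTP; URL taken from the ps output above.
SOLR_URL='http://localhost:8180/solr/book/update'
PAYLOAD='<commit/>'
echo curl "$SOLR_URL" -s -H 'Content-type:text/xml; charset=utf-8' -d "$PAYLOAD"
```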

My logs have a huge error; I don't know where it comes from:

2009/02/10 19:29:37 Apache Tomcat/5.5 - Error report
HTTP Status 500 - Java heap space
java.lang.OutOfMemoryError: Java heap space at
org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:178)
at
org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:179)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:225) at
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:218) at
org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:55) at
org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:811) at
org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:947)
at
org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:913)
at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:4606)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3842) at
org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3712) at
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1752) at
org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1716) at
org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1687) at
org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:214) at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:172)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:341)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.jav