Scorer skipTo() expectations?

2007-10-04 Thread Dan Rich
Hi, 

I have a custom Query class that provides a long list of Lucene docIds (not for 
filtering purposes), which is one clause in a standard BooleanQuery (which also 
contains TermQuery instances).

I have a custom Scorer that goes along with the custom Query class. 

What (if any) document ordering requirements does the Scorer class have for its 
skipTo(int docId) method?

In particular, currently I'm sorting/returning the docIds in ascending order 
from my custom Query class. That can be expensive for large docId lists; is 
sorting necessary? It looks like skipTo() might expect the documents it gets to 
be in ascending order to behave correctly as part of a BooleanQuery, but I 
can't tell for sure from the documentation.

If my custom Scorer does not return its documents in ascending order 
(e.g. 10, 80, 40, 60, 50), will whatever uses skipTo() potentially lose hits? 
If not, is there any performance concern with having the docIds unordered?


  



RegexQuery on multiple fields?

2007-10-04 Thread Oliver Hummel
Hi,

I've recently tried RegexQuery in Lucene, which works fine with the
following code snippet:

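  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.regex.RegexQuery; // contrib/regex

  Hits hits;
  String q = someregex;               // the regular expression to match
  Term t = new Term("content", q);    // the field name is fixed here
  Query query = new RegexQuery(t);
  hits = searcher.search(query);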

However, I wonder whether it is possible to use a QueryParser together with
RegexQuery, so that the field to be searched is determined dynamically.

I wasn't able to find a solution in the API. Does anybody know one, or is this
not possible?

Thanks in advance!

  Oliver
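The following is a sketch, not something from the thread and not a Lucene API. Assuming the user types input such as title:foo.*, a small hypothetical helper (FieldRegex) can split off the field name so that both parts can be fed into the snippet above as new Term(field, regex) and then new RegexQuery(term). The prefix is only treated as a field name when it is a plain word (\w+), so a colon that belongs to the regex itself, as in (?:ab)+, is left alone.

```java
/**
 * Hypothetical helper: splits user input of the form "field:regex" so the
 * field can be chosen at run time. Not part of Lucene.
 */
final class FieldRegex {
    final String field;
    final String regex;

    private FieldRegex(String field, String regex) {
        this.field = field;
        this.regex = regex;
    }

    static FieldRegex parse(String input, String defaultField) {
        int colon = input.indexOf(':');
        // Only treat the prefix as a field name if it is a plain word,
        // so colons that belong to the regex itself are kept.
        if (colon > 0 && input.substring(0, colon).matches("\\w+")) {
            return new FieldRegex(input.substring(0, colon), input.substring(colon + 1));
        }
        return new FieldRegex(defaultField, input);
    }
}
```

A pure regex that happens to start with word:... would still be read as field plus regex; a real parser would need an explicit rule for that case.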



--
http://merobase.com - find source code, components and web services

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



TermPositionVector.indexesOf()

2007-10-04 Thread Patricio Galeas
Hello,

I'm using the following method to obtain the position of some terms in a 
document:

int[] indexOfTerms = TermPositionVector.indexesOf(String[] terms, int start, int len);

Should I parse the strings contained in terms before I call indexesOf()?

Thank you in advance
Patricio
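A self-contained model of the lookup being asked about, not Lucene code. It assumes, as a simplification, that the vector holds the indexed (already analyzed) terms in sorted order so that indexOf() can binary-search them, and it uses lowercasing as a hypothetical stand-in for the analyzer. A raw string that the analyzer would have changed is not found, which is why the strings passed to indexesOf() would need the same analysis that was applied at indexing time. Note also that the returned values are indexes into the vector's term array (the kind of index TermPositionVector.getTermPositions(int) takes), not word positions in the document. ToyTermVector is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Locale;

/** Toy model of a term vector: sorted, already-analyzed terms. */
final class ToyTermVector {
    private final String[] terms;

    ToyTermVector(String... analyzedTerms) {
        terms = analyzedTerms.clone();
        Arrays.sort(terms);
    }

    /** Index of term in the sorted term array, or -1 if absent. */
    int indexOf(String term) {
        int i = Arrays.binarySearch(terms, term);
        return i >= 0 ? i : -1;
    }

    /** Looks up query[start .. start+len-1], one result slot per term. */
    int[] indexesOf(String[] query, int start, int len) {
        int[] result = new int[len];
        for (int i = 0; i < len; i++) {
            result[i] = indexOf(query[start + i]);
        }
        return result;
    }

    /** Hypothetical stand-in for running text through the index-time analyzer. */
    static String analyze(String raw) {
        return raw.toLowerCase(Locale.ROOT);
    }
}
```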




Re: Scorer skipTo() expectations?

2007-10-04 Thread Paul Elschot
Dan,

For Scorers, each time skipTo() or next() returns true after the first
time, doc() returns a larger value than before.
When a Scorer does not deliver its documents in increasing order,
skipTo() will lose documents, which means that not all matching
documents will be found by the search.
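A minimal stand-alone model of this contract, not Lucene's own classes: DocListIterator and Conjunction are hypothetical names. skipTo() follows the simple "as if written" forward loop given for Scorer.skipTo() in the Lucene 2.x javadoc, and intersect() leapfrogs two iterators in the style of a conjunction (AND). With the unordered list from the question, the intersection silently misses documents.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy iterator over a fixed list of doc ids with a Scorer-like
 * next()/doc()/skipTo() contract. Hypothetical; not Lucene's Scorer.
 */
final class DocListIterator {
    private final int[] docs;
    private int pos = -1;

    DocListIterator(int[] docs) {
        this.docs = docs;
    }

    boolean next() {
        return ++pos < docs.length;
    }

    int doc() {
        return docs[pos];
    }

    /** Reference loop: move forward until doc() >= target; never moves back. */
    boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (target > doc());
        return true;
    }
}

/** AND of two iterators by leapfrogging with skipTo(). */
final class Conjunction {
    static List<Integer> intersect(DocListIterator a, DocListIterator b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) {
            return hits;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.next() || !b.next()) {
                    return hits;
                }
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) {
                    return hits;
                }
            } else if (!b.skipTo(a.doc())) {
                return hits;
            }
        }
    }
}
```

Intersecting {10, 40, 50, 60, 80} with the unordered {10, 80, 40, 60, 50} finds only 10 and 80: once skipTo(80) has moved past 40, 50 and 60, nothing goes back for them. Sorting the list once, for example with Arrays.sort(int[]), before iterating gives all five hits.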

For disjunctions (OR), the documents of two Scorers need to be merged,
using next() to iterate over the documents.
The merging is normally done on the fly by DisjunctionSumScorer, using
a specialized priority queue on the doc() values.
No sorting of complete document lists is done at search time;
that is done at indexing time. And since TermScorer uses the
index directly, it always returns documents in order.
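To make "merging on the fly" concrete, here is a stand-alone sketch. It is not DisjunctionSumScorer: it uses java.util.PriorityQueue in place of Lucene's specialized queue, ignores scoring, and the class name DisjunctionMerge is hypothetical. Because every input list is assumed to be ascending already, only the current head of each list has to be compared, so no complete list is ever sorted at search time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * OR-merge of ascending doc-id lists using only the current head of each list.
 * Hypothetical sketch with java.util.PriorityQueue, not DisjunctionSumScorer.
 */
final class DisjunctionMerge {
    static List<Integer> union(int[]... lists) {
        // Entry = {current doc id, list index, position in that list}, ordered by doc id.
        PriorityQueue<int[]> heads = new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        for (int i = 0; i < lists.length; i++) {
            if (lists[i].length > 0) {
                heads.add(new int[] {lists[i][0], i, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] top = heads.poll();
            int doc = top[0];
            // With ascending inputs, a doc found in several lists is polled
            // consecutively; emit it only once.
            if (merged.isEmpty() || merged.get(merged.size() - 1) != doc) {
                merged.add(doc);
            }
            int[] source = lists[top[1]];
            int nextPos = top[2] + 1;
            if (nextPos < source.length) {
                heads.add(new int[] {source[nextPos], top[1], nextPos});
            }
        }
        return merged;
    }
}
```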

The only exception to document ordering is BooleanScorer.next(),
which BooleanQuery uses for some cases of top-level disjunctions,
and then only when documents are allowed to be scored out of order.
The reason is performance: BooleanScorer uses a faster data structure
than a priority queue, but BooleanScorer does not implement skipTo().

Regards,
Paul Elschot




On Thursday 04 October 2007 09:12, Dan Rich wrote:
 Hi,

 I have a custom Query class that provides a long list of lucene docIds (not
 for filtering purposes), which is one clause in a standard BooleanQuery
 (which also contains TermQuery instances).

 I have a custom Scorer that goes along with the custom Query class.

 What (if any) document ordering requirements does the Scorer class have for
 its skipTo(int docId) method?

 In particular, currently I'm sorting/returning the docIds in ascending
 order from my custom Query class. That can be expensive for large docId
 lists; is sorting necessary? It looks like skipTo() might expect the
 documents it gets to be in ascending order to behave correctly as part of a
 BooleanQuery, but I can't tell for sure from the doc.

 If the document list from my custom Scorer class does not have its document
 list in ascending order (e.g. 10, 80, 40, 60, 50) will whatever uses
 skipTo() potentially lose hits? If not, is there any performance concern
 with having the docIds unordered?


  




Problems with searching a field

2007-10-04 Thread Mikal skåren

Hi,
I am new to Lucene and am currently having some problems searching an index.

We create the index like this:

 doc.add(new Field("itno", item.getMMITNO(), Field.Store.YES, 
Field.Index.TOKENIZED));


This runs OK, and the index looks like this:

  [stored/uncompressed,indexed,tokenizeditno:0002 ,

But when we search this field, we get no hits (the search string is "0002", 
and ItemIndexing.getAnalyzer() returns a SimpleAnalyzer).


try {
    Hits hits = indexSearcher.search(
        new QueryParser("itno", ItemIndexing.getAnalyzer()).parse(search));
    // Returns 0
    log.info("Size " + hits.length());
    List result = getResult(hits);
    indexSearcher.close();

    return result;
} catch (Exception e) {


What are we doing wrong? Any help would be appreciated.







Re: Problems with searching a field

2007-10-04 Thread Erick Erickson
It's hard to say, but two things will help you track this down.

1. Get a copy of Luke to examine your index (which you may have already).
2. Query.toString() is your friend. It'll show you exactly what the parsed
query looks like. It may be obvious what the problem is when you see that
output, but if not, you can try pasting the parsed query into the search
tab of Luke and glean more info.

Where did you get this data: "itno:0002 ,"? It's kind of
interesting that there are spaces AFTER the 2. What analyzer
did you use when you indexed it, and can you guarantee that it's
the same analyzer that you used to parse the query?
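One likely culprit can be checked with a small model. SimpleAnalyzer splits text at non-letter characters and lowercases the letters, so a digits-only value such as 0002 yields no tokens at all; if the same SimpleAnalyzer was used at indexing time, nothing would be indexed for that field, and the parsed query would have nothing to match. The stored value shown in the index is kept verbatim because of Field.Store.YES, so it does not reveal this. LetterTokenizerModel below is a hypothetical plain-Java imitation of that splitting rule, not the Lucene class.

```java
import java.util.ArrayList;
import java.util.List;

/** Model of SimpleAnalyzer-style tokenization: runs of letters, lowercased. */
final class LetterTokenizerModel {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter (digit, space, punctuation) ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Here "0002 " produces an empty token list, while "Item AB12cd" produces item, ab and cd. An analyzer that keeps digits, such as WhitespaceAnalyzer, is one option to compare against.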


And one aside: opening and closing a searcher for each request is
very wasteful. Is closing your searcher just an artifact of cutting/pasting?
If not, you haven't opened the searcher in the snippet either <G>...
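The advice to keep one searcher open across requests can be sketched without Lucene by making the searcher a type parameter. SearcherHolder is a hypothetical name; in real code the supplier would open the IndexSearcher, and some reopen policy would still be needed once the index changes, which this sketch leaves out.

```java
import java.util.function.Supplier;

/** Opens a resource on first use and returns the same instance afterwards. */
final class SearcherHolder<T> {
    private final Supplier<T> opener;
    private T instance;

    SearcherHolder(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Synchronized so that concurrent requests cannot open it twice. */
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
        }
        return instance;
    }
}
```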

Best
Erick

On 10/4/07, Mikal skåren [EMAIL PROTECTED] wrote:

 Hi,
 I am new to lucene and am currently having some problems searching an
 index.

 so we make the index like this :

   doc.add(new Field("itno", item.getMMITNO(), Field.Store.YES,
 Field.Index.TOKENIZED));

 this runs ok the index looks like this :

[stored/uncompressed,indexed,tokenizeditno:0002 ,

 But when we try searching this field we get no hits (search is 0002,
 ItemIndexing.getAnalyzer()  == SimpleAnalyzer)

 try {
 Hits hits =
 indexSearcher.search(new QueryParser("itno", ItemIndexing.getAnalyzer()).parse(search));
 // Returns 0
 log.info("Size " + hits.length());
 List result = getResult(hits);
 indexSearcher.close();

 return result;
 } catch (Exception e) {


 What are we doing wrong, any help would be appreciated..







Help with Lucene Indexer crash recovery

2007-10-04 Thread vivek sar
Hi,

We are using Lucene 2.3. The problem we are facing is that quite often,
if our application is stopped (killed or crashed) while the indexer is
doing its job, the next time we bring up the application the indexer
fails to run with the following exception:

2007-10-04 12:29:53,089 ERROR [PS thread 10] IndexerJob - Full-text indexer failed to index
java.io.FileNotFoundException: /opt/manager/apps/conf/index/MasterIndex/_llb.cfs (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(Unknown Source)
    at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
    at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:131)
    at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:206)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610)

The search also doesn't work after this.

It looks like the index was left in some inconsistent state (it might be
corrupted). I was wondering if there is a tool or a way to repair the
index if we are not able to open it at run time?

Thanks,
-vivek
