Re: WildcardTermEnum skipping terms containing numbers?!

2004-11-20 Thread Sanyi
 why reindex?

Well, since I had different experiences with the different analyzers I tried,
I thought that this problem must originate from either the indexing or a
Lucene bug.

 As stated at the end of my mail, I'd expect that to skip the
 first term in the enum.

Yes, this must be a problem for me, since I took this sentence from the manual
as the starting point:
"Returns the current Term in the enumeration. Initially invalid, valid after
next() called for the first time."

So it seems that it was a bug in the docs, not in the API itself.

 Is that what you are missing, or do you lose
 more than one term?

It seemed to me that it was skipping more stuff, but I'd better not say this,
since I didn't know that the term is valid even before the first next(), so I
could have been misled by my own chaotic experiences.

Since my code has been completely restructured since then, I don't have all the
surrounding stuff needed for further testing.

Anyway, we've found a docs bug thanks to you, and my code is cleaner and better
the other way.

Thanx!





__ 
Do you Yahoo!? 
The all-new My Yahoo! - Get yours free! 
http://my.yahoo.com 
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: WildcardTermEnum skipping terms containing numbers?!

2004-11-19 Thread Morus Walter
Sanyi writes:
  If there's a bug, it should be tracked down, not worked around...
 
 Sure, but I'm working with 20 million records and it takes about 25 hours to
 re-index, so I'm looking for ways that don't require reindexing.
 
why reindex?

 My code was:
 
   WildcardTermEnum wcenum = new WildcardTermEnum(reader, term);

   while (wcenum.next()) {
       terms.add(new WeightedTerm(termgroup, wcenum.term().text()));
       //System.out.println(wcenum.term().text());
   }
 
 And it skipped lots of things it shouldn't have skipped.

As stated at the end of my mail, I'd expect that to skip the first
term in the enum.
Is that what you are missing, or do you lose more than one term?

Morus




Re: WildcardTermEnum skipping terms containing numbers?!

2004-11-18 Thread Morus Walter
Sanyi writes:
 Enumerating the terms using WildcardTermEnum and an IndexReader seems to be 
 too buggy to use.

If there's a bug, it should be tracked down, not worked around...

But it looks ok to me:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.store.*;
import org.apache.lucene.search.*;

public class LuceneTest {

    public static void main(String[] args) throws Exception {

        // Build a one-document in-memory index.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

        Document doc = new Document();
        // Field(name, value, store, index, tokenize)
        doc.add(new Field("foo", "blabla etc.. etc... c0la c0ca caca ccca",
                true, true, true));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = IndexReader.open(dir);

        // Note: "enum" is a reserved word from Java 5 on; rename it there.
        WildcardTermEnum enum = new WildcardTermEnum(reader, new Term("foo", "c??a"));

        // The enum is already positioned on the first match, so read before next().
        do {
            System.out.println(enum.term().text());
        } while (enum.next());

        // Cross-check: rewrite the equivalent WildcardQuery against the same reader.
        WildcardQuery wq = new WildcardQuery(new Term("foo", "c??a"));
        Query q = wq.rewrite(reader);
        System.out.println(q.toString());

        reader.close();
    }
}

gives
c0ca
c0la
caca
ccca
foo:c0ca foo:c0la foo:caca foo:ccca

The only bug I see is in the docs, which claim enum.term() to be invalid
before the first call to next(); that does not seem to be the case.
So if you use
while ( enum.next() ) {
...
}
you will lose the first term, whatever it is.
Looking at the sources, I find that this behaviour is shared by
FuzzyTermEnum. Both implementations of the abstract FilteredTermEnum class
call setEnum at the end of the constructor, which prepares the first
result.
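
To make the difference between the two loop shapes concrete without needing an
index, here is a minimal sketch. PrePositionedEnum is a hypothetical stand-in
written for this illustration, not a Lucene class; it only assumes what is
described above, namely that the constructor already prepares the first result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PrePositionedEnumDemo {

    /** Hypothetical stand-in: the constructor already loads the first term. */
    static class PrePositionedEnum {
        private final Iterator<String> it;
        private String current;

        PrePositionedEnum(List<String> terms) {
            this.it = terms.iterator();
            // Assumption taken from the mail above: first result prepared here.
            this.current = it.hasNext() ? it.next() : null;
        }

        String term() {
            return current;
        }

        boolean next() {
            current = it.hasNext() ? it.next() : null;
            return current != null;
        }
    }

    /** Calls next() before reading, so the pre-loaded first term is dropped. */
    static List<String> collectWithWhile(List<String> terms) {
        PrePositionedEnum e = new PrePositionedEnum(terms);
        List<String> out = new ArrayList<>();
        while (e.next()) {
            out.add(e.term());
        }
        return out;
    }

    /** Reads the current term first, then advances; keeps every term. */
    static List<String> collectWithDoWhile(List<String> terms) {
        PrePositionedEnum e = new PrePositionedEnum(terms);
        List<String> out = new ArrayList<>();
        if (e.term() != null) { // guard for an empty enumeration in this sketch
            do {
                out.add(e.term());
            } while (e.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("c0ca", "c0la", "caca", "ccca");
        System.out.println(collectWithWhile(terms));   // [c0la, caca, ccca]
        System.out.println(collectWithDoWhile(terms)); // [c0ca, c0la, caca, ccca]
    }
}
```

With the four terms from the test above, the while loop loses c0ca, which
happens to be the term that sorts first because it contains a digit.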

Morus





Re: WildcardTermEnum skipping terms containing numbers?!

2004-11-17 Thread Yonik Seeley
test






Re: WildcardTermEnum skipping terms containing numbers?!

2004-11-17 Thread Sanyi
Enumerating the terms using WildcardTermEnum and an IndexReader seems to be too
buggy to use.
I'm now reimplementing my code using WildcardTermEnum.wildcardEquals, which
seems to be better so far.
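
For readers unfamiliar with per-term wildcard comparison, the following is a
rough, self-contained sketch of the idea. It is not Lucene's implementation;
the class WildcardSketch and its matches method are made up for illustration,
and it only assumes the usual meaning of '?' (exactly one character) and '*'
(any run of characters, including none).

```java
public class WildcardSketch {

    /** Returns true if text matches pattern, where '?' is one char and '*' is any run. */
    static boolean matches(String pattern, String text) {
        int p = 0;      // position in pattern
        int t = 0;      // position in text
        int star = -1;  // position of the most recent '*' in pattern, if any
        int mark = 0;   // text position where that '*' started matching

        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the star and first try matching it against nothing.
                star = p++;
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star != -1) {
                // Backtrack: let the last star absorb one more character.
                p = star + 1;
                t = ++mark;
            } else {
                return false;
            }
        }
        // Any trailing stars can match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("c?ca", "c0ca")); // true
        System.out.println(matches("c?ca", "c0la")); // false
        System.out.println(matches("c??a", "c0la")); // true
    }
}
```

In a matcher of this kind a digit is just another character, so a pattern such
as c?ca accepts c0ca in the same way it accepts coca.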

--- Sanyi [EMAIL PROTECTED] wrote:

 Hi!
 
 I have the following problem with 1.4.2:
 I'm searching for c?ca (using StandardAnalyzer) and one of the hits looks
 something like this:
 blabla c0ca c0la etc.. etc...
 (those big o's are zero characters)
 Now, I'm enumerating the terms using WildcardTermEnum and all I get is:
 
 caca
 ccca
 ceca
 cica
 coca
 crca
 csca
 cuca
 cyca
 
 It doesn't know about c0ca at all.
 Is there any way to overcome this problem?
 
 Thanks,
 Sanyi
 
 
   
 
 



