RE: Lucene 2.9 status (to port to Lucene.Net)

2009-04-26 Thread Uwe Schindler
Some status update:

  George, did you mean LUCENE-1516 below?  (LUCENE-1313 is a further
  improvement to near real-time search that's still being iterated on).
 
  In general I would say 2.9 seems to be in rather active development
  still ;)
 
  I too would love to hear about production/beta use of 2.9.  George,
  maybe you should re-ask on java-user?
 
 Here! I updated www.pangaea.de to Lucene trunk today (because of the
 incomplete hashcode in TrieRangeQuery)... Works perfectly, but I do not use
 the realtime parts. And the same 10 days before, no problems :-)
 
 Currently I am rewriting parts of my code to Collector, to get away from
 HitCollector (without score, so optimizations apply)! The reopen() and
 sorting are fine; almost no time is consumed for sorted searches after
 reopening indexes every 20 minutes with just some new and small segments
 with changed documents. No extra warming is needed.

I rewrote my collectors now to use the new API. Even though the number of
methods to override in the new Collector is 3 instead of 1, the code got
shorter (because the collect methods can now throw IOException, great!!!).
What is also perfect is the way a FieldCache is used: just retrieve the
FieldCache array (e.g. getInts()) in the setNextReader() method and use the
value array in the collect() method with the docid as index. Now I am able
to e.g. retrieve cached values even after an index reopen without warming
(same with sort). In the past you had to use a cache array for the whole
index. The docBase is not used in my code, as I directly access the index
readers. So users now have both possibilities: use the supplied reader, or
use the docBase as index offset into the searcher/main reader. Really cool!

The overhead of score calculation can be left out if it is not needed; also cool!

One of my collectors is used to retrieve the database ids (integers) for
building up a SQL IN (...) clause from the field cache, based on the
collected hits. In the past this was very complicated, because FieldCache
was slow after reopening, and getting stored fields (the ids) is also very
slow (inner search loop). Now it's just 10 lines of code and no score is
involved.
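
A minimal sketch of such a collector (the class and field names are
hypothetical, not the actual PANGAEA code, and it assumes the three-method
Collector API of trunk as described above):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

// Collects the database ids of all hits straight from the FieldCache,
// without computing scores.
public class DatabaseIdCollector extends Collector {
  private int[] ids; // FieldCache values for the current reader
  private final List<Integer> collected = new ArrayList<Integer>();

  public void setScorer(Scorer scorer) {
    // scores are not needed for building the SQL IN (...) clause
  }

  public void setNextReader(IndexReader reader, int docBase) throws IOException {
    // per-reader cache array; after a reopen, unchanged segments stay warm
    ids = FieldCache.DEFAULT.getInts(reader, "dbId");
  }

  public void collect(int doc) throws IOException {
    // doc is relative to the reader given to setNextReader, so it indexes
    // the cache array directly; docBase is not needed here
    collected.add(Integer.valueOf(ids[doc]));
  }

  public List<Integer> getIds() {
    return collected;
  }
}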

The new code is working now in production at PANGAEA.

 Another change to be done here is to drop Field.Store.COMPRESS and replace
 it with manually compressed binary stored fields, but this is only to get
 rid of the deprecation warnings. This cannot be done without complete
 reindexing.
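
A minimal sketch of the manual approach (field name and document setup are
hypothetical; the matching Inflater step at read time is omitted):

import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class CompressedFieldExample {
  // deflate the raw bytes ourselves instead of using Field.Store.COMPRESS
  static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
    byte[] buffer = new byte[1024];
    while (!deflater.finished()) {
      out.write(buffer, 0, deflater.deflate(buffer));
    }
    deflater.end();
    return out.toByteArray();
  }

  static Document makeDoc(String text) throws Exception {
    Document doc = new Document();
    // store the compressed bytes as a plain binary stored field
    doc.add(new Field("content", compress(text.getBytes("UTF-8")),
        Field.Store.YES));
    return doc;
  }
}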
 
 Uwe
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Eks Dev (JIRA)
deprecated method used in fieldsReader / setOmitTf()


 Key: LUCENE-1615
 URL: https://issues.apache.org/jira/browse/LUCENE-1615
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Eks Dev
Priority: Trivial


setOmitTf(boolean) is deprecated and should not be used by core classes. One
place where it appears is FieldsReader; this patch fixes it. It was necessary
to change Fieldable to AbstractField in two places, only local variables.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1615:


Attachment: LUCENE-1615.patch

 deprecated method used in fieldsReader / setOmitTf()
 

 Key: LUCENE-1615
 URL: https://issues.apache.org/jira/browse/LUCENE-1615
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Eks Dev
Priority: Trivial
 Attachments: LUCENE-1615.patch


 setOmitTf(boolean) is deprecated and should not be used by core classes. One
 place where it appears is FieldsReader; this patch fixes it. It was
 necessary to change Fieldable to AbstractField in two places, only local
 variables.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702896#action_12702896 ]

Uwe Schindler commented on LUCENE-1615:
---

We know this problem; your fix seems OK (LUCENE-1561).
We did not want to change the Fieldable interface again, so we left omitTf in
the interface but deprecated the methods in AbstractField & Co. In the
future, the Fieldable interface should be completely removed for 3.0, and
this is a first step towards it! All references to Fieldable should be
replaced by AbstractField or a better alternative that also has the type in
it (see LUCENE-1597).

 deprecated method used in fieldsReader / setOmitTf()
 

 Key: LUCENE-1615
 URL: https://issues.apache.org/jira/browse/LUCENE-1615
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Eks Dev
Priority: Trivial
 Attachments: LUCENE-1615.patch


 setOmitTf(boolean) is deprecated and should not be used by core classes. One
 place where it appears is FieldsReader; this patch fixes it. It was
 necessary to change Fieldable to AbstractField in two places, only local
 variables.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Eks Dev (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702901#action_12702901 ]

Eks Dev commented on LUCENE-1615:
-

sure, replacing Fieldable is good; I just noticed a quick win when cleaning
up deprecations from our code base... one step at a time.

 deprecated method used in fieldsReader / setOmitTf()
 

 Key: LUCENE-1615
 URL: https://issues.apache.org/jira/browse/LUCENE-1615
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Eks Dev
Priority: Trivial
 Attachments: LUCENE-1615.patch


 setOmitTf(boolean) is deprecated and should not be used by core classes. One
 place where it appears is FieldsReader; this patch fixes it. It was
 necessary to change Fieldable to AbstractField in two places, only local
 variables.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Lucene 2.9 status (to port to Lucene.Net)

2009-04-26 Thread Michael McCandless
This is great feedback on the new Collector API, Uwe.  Thanks!

It's awesome that you no longer have to warm your searchers... but be
careful when a large segment merge commits.

Did you hit any snags/problems/etc. that we should fix before releasing 2.9?

Mike

On Sun, Apr 26, 2009 at 9:54 AM, Uwe Schindler u...@thetaphi.de wrote:
 Some status update:

  George, did you mean LUCENE-1516 below?  (LUCENE-1313 is a further
  improvement to near real-time search that's still being iterated on).
 
  In general I would say 2.9 seems to be in rather active development
  still ;)
 
  I too would love to hear about production/beta use of 2.9.  George,
  maybe you should re-ask on java-user?

 Here! I updated www.pangaea.de to Lucene trunk today (because of the
 incomplete hashcode in TrieRangeQuery)... Works perfectly, but I do not use
 the realtime parts. And the same 10 days before, no problems :-)

 Currently I am rewriting parts of my code to Collector, to get away from
 HitCollector (without score, so optimizations apply)! The reopen() and
 sorting are fine; almost no time is consumed for sorted searches after
 reopening indexes every 20 minutes with just some new and small segments
 with changed documents. No extra warming is needed.

 I rewrote my collectors now to use the new API. Even though the number of
 methods to override in the new Collector is 3 instead of 1, the code got
 shorter (because the collect methods can now throw IOException, great!!!).
 What is also perfect is the way a FieldCache is used: just retrieve the
 FieldCache array (e.g. getInts()) in the setNextReader() method and use the
 value array in the collect() method with the docid as index. Now I am able
 to e.g. retrieve cached values even after an index reopen without warming
 (same with sort). In the past you had to use a cache array for the whole
 index. The docBase is not used in my code, as I directly access the index
 readers. So users now have both possibilities: use the supplied reader, or
 use the docBase as index offset into the searcher/main reader. Really cool!

 The overhead of score calculation can be left out if it is not needed; also cool!

 One of my collectors is used to retrieve the database ids (integers) for
 building up a SQL IN (...) clause from the field cache, based on the
 collected hits. In the past this was very complicated, because FieldCache
 was slow after reopening, and getting stored fields (the ids) is also very
 slow (inner search loop). Now it's just 10 lines of code and no score is
 involved.

 The new code is working now in production at PANGAEA.

 Another change to be done here is to drop Field.Store.COMPRESS and replace
 it with manually compressed binary stored fields, but this is only to get
 rid of the deprecation warnings. This cannot be done without complete
 reindexing.

 Uwe


 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1594) Use source code specialization to maximize search performance

2009-04-26 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1594:
---

Attachment: LUCENE-1594.patch

Another iteration... this patch is very large because I'm including
all the generated classes.  Some changes:

  * I moved everything under a new contrib/spec

  * There is now a simple FastSearch class that you can use to run
searches.  This makes it very simple to try out -- if the
specializer can handle it, it will; else it falls back to
IndexSearcher's search methods.

  * Two-term boolean OR query is now covered

  * String (ord/val) search is now specialized

The code is still horrific and there are many cases not handled (reversed
sort, multi-field sort, other query types, etc.). Very early in the
iterations still...


 Use source code specialization to maximize search performance
 -

 Key: LUCENE-1594
 URL: https://issues.apache.org/jira/browse/LUCENE-1594
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Attachments: FastSearchTask.java, LUCENE-1594.patch, LUCENE-1594.patch


 Towards eking out the absolute best search performance, and after seeing the
 Java ghosts in LUCENE-1575, I decided to build a simple prototype
 source code specializer for Lucene's searches.
 The idea is to write dynamic Java code, specialized to run a very
 specific query context (eg TermQuery, collecting top N by field, no
 filter, no deletions), compile that Java code, and run it.
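 (Not the patch's code: a hedged sketch of just the basic mechanism, i.e.
 write a Java source file for one query context, compile it at runtime with
 the standard javax.tools API, then load and run it; all names below are
 made up.)

 import java.io.File;
 import java.io.FileWriter;
 import java.net.URL;
 import java.net.URLClassLoader;
 import javax.tools.JavaCompiler;
 import javax.tools.ToolProvider;

 public class SpecializerMechanismDemo {
   public static void main(String[] args) throws Exception {
     // 1. write specialized source for one query context (trivial body here)
     String src = "public class GeneratedSearch {\n"
                + "  public static void run() {\n"
                + "    System.out.println(\"specialized search loop here\");\n"
                + "  }\n"
                + "}\n";
     File dir = new File("gen");
     dir.mkdirs();
     File javaFile = new File(dir, "GeneratedSearch.java");
     FileWriter w = new FileWriter(javaFile);
     w.write(src);
     w.close();

     // 2. compile it in-process (requires a JDK, Java 6+)
     JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
     javac.run(null, null, null, javaFile.getPath());

     // 3. load the freshly compiled class and invoke it
     ClassLoader loader = new URLClassLoader(new URL[] { dir.toURI().toURL() });
     loader.loadClass("GeneratedSearch").getMethod("run").invoke(null);
   }
 }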
 Here're the performance gains when compared to trunk:
 ||Query||Sort||Filt||Deletes||Scoring||Hits||QPS (base)||QPS (new)||%||
 |1|Date (long)|no|no|Track,Max|2561886|6.8|10.6|{color:green}55.9%{color}|
 |1|Date (long)|no|5%|Track,Max|2433472|6.3|10.5|{color:green}66.7%{color}|
 |1|Date (long)|25%|no|Track,Max|640022|5.2|9.9|{color:green}90.4%{color}|
 |1|Date (long)|25%|5%|Track,Max|607949|5.3|10.3|{color:green}94.3%{color}|
 |1|Date (long)|10%|no|Track,Max|256300|6.7|12.3|{color:green}83.6%{color}|
 |1|Date (long)|10%|5%|Track,Max|243317|6.6|12.6|{color:green}90.9%{color}|
 |1|Relevance|no|no|Track,Max|2561886|11.2|17.3|{color:green}54.5%{color}|
 |1|Relevance|no|5%|Track,Max|2433472|10.1|15.7|{color:green}55.4%{color}|
 |1|Relevance|25%|no|Track,Max|640022|6.1|14.1|{color:green}131.1%{color}|
 |1|Relevance|25%|5%|Track,Max|607949|6.2|14.4|{color:green}132.3%{color}|
 |1|Relevance|10%|no|Track,Max|256300|7.7|15.6|{color:green}102.6%{color}|
 |1|Relevance|10%|5%|Track,Max|243317|7.6|15.9|{color:green}109.2%{color}|
 |1|Title (string)|no|no|Track,Max|2561886|7.8|12.5|{color:green}60.3%{color}|
 |1|Title (string)|no|5%|Track,Max|2433472|7.5|11.1|{color:green}48.0%{color}|
 |1|Title (string)|25%|no|Track,Max|640022|5.7|11.2|{color:green}96.5%{color}|
 |1|Title (string)|25%|5%|Track,Max|607949|5.5|11.3|{color:green}105.5%{color}|
 |1|Title (string)|10%|no|Track,Max|256300|7.0|12.7|{color:green}81.4%{color}|
 |1|Title (string)|10%|5%|Track,Max|243317|6.7|13.2|{color:green}97.0%{color}|
 Those tests were run on a 19M doc wikipedia index (splitting each
 Wikipedia doc @ ~1024 chars), on Linux, Java 1.6.0_10
 But: it only works with TermQuery for now; it's just a start.
 It should be easy for others to run this test:
   * apply patch
   * cd contrib/benchmark
   * run python -u bench.py -delindex /path/to/index/with/deletes -nodelindex /path/to/index/without/deletes
 (You can leave off one of -delindex or -nodelindex and it'll skip those
 tests.)
 For each test, bench.py generates a single Java source file that runs
 that one query; you can open
 contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/FastSearchTask.java
 to see it.  I'll attach an example.  It writes results.txt, in Jira
 table format, which you should be able to copy/paste back here.
 The specializer uses pretty much every search speedup I can think of
 -- the ones from LUCENE-1575 (to score or not, to maxScore or not),
 the ones suggested in the spinoff LUCENE-1593 (pre-fill w/ sentinels,
 don't use docID for tie breaking), LUCENE-1536 (random access
 filters).  It bypasses TermDocs and interacts directly with the
 IndexInput, and with BitVector for deletions.  It directly folds in
 the collector, if possible.  A filter, if used, must be random access,
 and is assumed to pre-multiply-in the deleted docs.
 Current status:
   * I only handle TermQuery.  I'd like to add others over time...
   * It can collect by score, or single field (with the 3 scoring
 options in LUCENE-1575).  It can't do reverse field sort nor
 multi-field sort now.
   * The auto-gen code (gen.py) is rather hideous.  It could use some
 serious refactoring, etc.; I think we could get it to the 

new TokenStream api Question

2009-04-26 Thread eks dev

I am just looking into the new TermAttribute usage and wonder what would be
the best way to implement a PrefixFilter that would filter out Terms that
have some prefix,

something like this, where '-' represents my prefix:

  public final boolean incrementToken() throws IOException {
    // the first word we found
    while (input.incrementToken()) {
      int len = termAtt.termLength();

      if (len > 0 && termAtt.termBuffer()[0] != '-') // only length > 0 and non-LFs
        return true;
      // note: else we ignore it
    }
    // reached EOS
    return false;
  }

 



The question would be:

can I extend TermAttribute and add boolean startsWith(char c);

The point is speed, and my code gets smaller.
TermAttribute has one method call in termLength() and termBuffer() that I do
not understand (back compatibility, I guess):
  public int termLength() {
    initTermBuffer(); // I'd like to avoid it...
    return termLength;
  }


I'd like to get rid of initTermBuffer(). The first option is to *extend* the
TermAttribute code (but its fields are private, so no help there), or can I
implement my own MyTermAttribute (will the indexer know how to deal with it?)

Must I extend TermAttribute, or can I add my own?

thanks, 
eks




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: new TokenStream api Question

2009-04-26 Thread Uwe Schindler
There is one problem: if you extend TermAttribute, the class is different
(and the class is the key in the attributes list). So when you initialize the
TokenStream and do a

YourClass termAtt = (YourClass) addAttribute(YourClass.class)

...you create a new attribute. So one possibility would be to specify the
instance yourself and save the attribute by class (as key), but with your
instance. If you are the first one that creates the attribute (if it is a
token stream and not a filter, you will be the first, as it adds the
attribute in the ctor), everything is OK. Register the attribute yourself
(maybe we should add a specialized addAttribute that can take an instance
as default?):

YourClass termAtt = new YourClass();
attributes.put(TermAttribute.class, termAtt);

In this case, for the indexer it is a standard TermAttribute, but you can do
more with it.

Replacing TermAttribute with your own class is not possible, as the indexer
will get a ClassCastException when using the instance retrieved with
getAttribute(TermAttribute.class).
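
A minimal sketch of that registration trick (class and helper names are made
up, and it assumes the attributes map is accessible from the stream's ctor,
as in current trunk):

import org.apache.lucene.analysis.tokenattributes.TermAttribute;

// custom subclass adding the helper asked about above; it goes through the
// public accessors because TermAttribute's fields are private
public class MyTermAttribute extends TermAttribute {
  public boolean startsWith(char c) {
    return termLength() > 0 && termBuffer()[0] == c;
  }
}

Then, in the TokenStream's constructor, before anything else registers the
key:

MyTermAttribute termAtt = new MyTermAttribute();
attributes.put(TermAttribute.class, termAtt);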
 
Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: eks dev [mailto:eks...@yahoo.co.uk]
 Sent: Sunday, April 26, 2009 10:39 PM
 To: java-dev@lucene.apache.org
 Subject: new TokenStream api Question
 
 
 I am just looking into the new TermAttribute usage and wonder what would be
 the best way to implement a PrefixFilter that would filter out Terms that
 have some prefix,
 
 something like this, where '-' represents my prefix:
 
   public final boolean incrementToken() throws IOException {
     // the first word we found
     while (input.incrementToken()) {
       int len = termAtt.termLength();

       if (len > 0 && termAtt.termBuffer()[0] != '-') // only length > 0 and non-LFs
         return true;
       // note: else we ignore it
     }
     // reached EOS
     return false;
   }
 
 
 
 
 
 The question would be:
 
 can I extend TermAttribute and add boolean startsWith(char c);
 
 The point is speed, and my code gets smaller.
 TermAttribute has one method call in termLength() and termBuffer() that I
 do not understand (back compatibility, I guess):
   public int termLength() {
     initTermBuffer(); // I'd like to avoid it...
     return termLength;
   }
 
 
 I'd like to get rid of initTermBuffer(). The first option is to *extend*
 the TermAttribute code (but its fields are private, so no help there), or
 can I implement my own MyTermAttribute (will the indexer know how to deal
 with it?)

 Must I extend TermAttribute, or can I add my own?
 
 thanks,
 eks
 
 
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: new TokenStream api Question

2009-04-26 Thread eks dev

thanks Uwe,

looks like a nice use case to cover, but I was/am not sure what the way
around it would be?
Your proposal sounds OK to me, but I am not familiar enough with this API to
say for sure...
I guess we would need to make this put safer, to prevent people making silly
mistakes by registering a class for completely different objects.

For this particular case, I would argue it makes sense to add methods you
usually find in String-like classes to TermAttribute, like
starts(ends)With(char)... but this sounds wrong, it motivates duplication of
code.

The original motivation is to get the char[] and length of the TermAttribute
quickly; hmm, maybe simply adding:

char[] rawTermBuffer() {
  return termBuffer;
}

and the same for the length...

with a javadoc "feel free to shoot yourself" :)




 




- Original Message 
 From: Uwe Schindler u...@thetaphi.de
 To: java-dev@lucene.apache.org
 Sent: Sunday, 26 April, 2009 23:03:06
 Subject: RE: new TokenStream api Question
 
 There is one problem: if you extend TermAttribute, the class is different
 (and the class is the key in the attributes list). So when you initialize
 the TokenStream and do a

 YourClass termAtt = (YourClass) addAttribute(YourClass.class)

 ...you create a new attribute. So one possibility would be to specify the
 instance yourself and save the attribute by class (as key), but with your
 instance. If you are the first one that creates the attribute (if it is a
 token stream and not a filter, you will be the first, as it adds the
 attribute in the ctor), everything is OK. Register the attribute yourself
 (maybe we should add a specialized addAttribute that can take an instance
 as default?):

 YourClass termAtt = new YourClass();
 attributes.put(TermAttribute.class, termAtt);

 In this case, for the indexer it is a standard TermAttribute, but you can
 do more with it.

 Replacing TermAttribute with your own class is not possible, as the indexer
 will get a ClassCastException when using the instance retrieved with
 getAttribute(TermAttribute.class).
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
  -Original Message-
  From: eks dev [mailto:eks...@yahoo.co.uk]
  Sent: Sunday, April 26, 2009 10:39 PM
  To: java-dev@lucene.apache.org
  Subject: new TokenStream api Question
  
  
  I am just looking into the new TermAttribute usage and wonder what would
  be the best way to implement a PrefixFilter that would filter out Terms
  that have some prefix,
  
  something like this, where '-' represents my prefix:
  
    public final boolean incrementToken() throws IOException {
      // the first word we found
      while (input.incrementToken()) {
        int len = termAtt.termLength();

        if (len > 0 && termAtt.termBuffer()[0] != '-') // only length > 0 and non-LFs
          return true;
        // note: else we ignore it
      }
      // reached EOS
      return false;
    }
  
  
  
  
  
  The question would be:
  
  can I extend TermAttribute and add boolean startsWith(char c);
  
  The point is speed, and my code gets smaller.
  TermAttribute has one method call in termLength() and termBuffer() that I
  do not understand (back compatibility, I guess):
    public int termLength() {
      initTermBuffer(); // I'd like to avoid it...
      return termLength;
    }
  
  
  I'd like to get rid of initTermBuffer(). The first option is to *extend*
  the TermAttribute code (but its fields are private, so no help there), or
  can I implement my own MyTermAttribute (will the indexer know how to deal
  with it?)

  Must I extend TermAttribute, or can I add my own?
  
  thanks,
  eks
  
  
  
  
  -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org





-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: new TokenStream api Question

2009-04-26 Thread eks dev

regardless of that, I really do not understand the call to initTermBuffer()
in termLength(). What is it good for?

This method will return the same value in both cases, zero; I see no harm in
removing it?

  /** Return number of valid characters (length of the term)
   *  in the termBuffer array. */
  public int termLength() {
initTermBuffer();
return termLength;
  }



- Original Message 
 From: Uwe Schindler u...@thetaphi.de
 To: java-dev@lucene.apache.org
 Sent: Sunday, 26 April, 2009 23:03:06
 Subject: RE: new TokenStream api Question
 
 There is one problem: if you extend TermAttribute, the class is different
 (and the class is the key in the attributes list). So when you initialize
 the TokenStream and do a

 YourClass termAtt = (YourClass) addAttribute(YourClass.class)

 ...you create a new attribute. So one possibility would be to specify the
 instance yourself and save the attribute by class (as key), but with your
 instance. If you are the first one that creates the attribute (if it is a
 token stream and not a filter, you will be the first, as it adds the
 attribute in the ctor), everything is OK. Register the attribute yourself
 (maybe we should add a specialized addAttribute that can take an instance
 as default?):

 YourClass termAtt = new YourClass();
 attributes.put(TermAttribute.class, termAtt);

 In this case, for the indexer it is a standard TermAttribute, but you can
 do more with it.

 Replacing TermAttribute with your own class is not possible, as the indexer
 will get a ClassCastException when using the instance retrieved with
 getAttribute(TermAttribute.class).
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
  -Original Message-
  From: eks dev [mailto:eks...@yahoo.co.uk]
  Sent: Sunday, April 26, 2009 10:39 PM
  To: java-dev@lucene.apache.org
  Subject: new TokenStream api Question
  
  
  I am just looking into the new TermAttribute usage and wonder what would
  be the best way to implement a PrefixFilter that would filter out Terms
  that have some prefix,
  
  something like this, where '-' represents my prefix:
  
    public final boolean incrementToken() throws IOException {
      // the first word we found
      while (input.incrementToken()) {
        int len = termAtt.termLength();

        if (len > 0 && termAtt.termBuffer()[0] != '-') // only length > 0 and non-LFs
          return true;
        // note: else we ignore it
      }
      // reached EOS
      return false;
    }
  
  
  
  
  
  The question would be:
  
  can I extend TermAttribute and add boolean startsWith(char c);
  
  The point is speed, and my code gets smaller.
  TermAttribute has one method call in termLength() and termBuffer() that I
  do not understand (back compatibility, I guess):
    public int termLength() {
      initTermBuffer(); // I'd like to avoid it...
      return termLength;
    }
  
  
  I'd like to get rid of initTermBuffer(). The first option is to *extend*
  the TermAttribute code (but its fields are private, so no help there), or
  can I implement my own MyTermAttribute (will the indexer know how to deal
  with it?)

  Must I extend TermAttribute, or can I add my own?
  
  thanks,
  eks
  
  
  
  
  -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org





-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-26 Thread Eks Dev (JIRA)
add one setter for start and end offset to OffsetAttribute
--

 Key: LUCENE-1616
 URL: https://issues.apache.org/jira/browse/LUCENE-1616
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Reporter: Eks Dev
Priority: Trivial


Add OffsetAttribute.setOffset(startOffset, endOffset);

trivial change, no JUnit needed.

Changed CharTokenizer to use it.
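
For illustration, the change as seen from a tokenizer (a hedged sketch; the
two-call form uses the pre-existing separate setters):

// before: two calls on the attribute
offsetAtt.setStartOffset(start);
offsetAtt.setEndOffset(end);

// with this patch: one combined call, as CharTokenizer now does
offsetAtt.setOffset(start, end);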

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-26 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1616:


Attachment: LUCENE-1616.patch

 add one setter for start and end offset to OffsetAttribute
 --

 Key: LUCENE-1616
 URL: https://issues.apache.org/jira/browse/LUCENE-1616
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Reporter: Eks Dev
Priority: Trivial
 Attachments: LUCENE-1616.patch


 Add OffsetAttribute.setOffset(startOffset, endOffset);
 trivial change, no JUnit needed.
 Changed CharTokenizer to use it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org