[jira] Commented: (LUCENE-1187) Things to be done now that Filter is independent from BitSet

2008-03-14 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578626#action_12578626
 ] 

Michael Busch commented on LUCENE-1187:
---

{quote}
Test that now fails with ChainedFilter.
{quote}

The reason apparently is that the core moved from BitSets to OpenBitSets,
whereas the contrib packages haven't.

If we change the contrib packages to also use OpenBitSets, then this is 
still not completely backwards compatible. For example, if a user upgrades
to 2.4 and uses a ChainedFilter to combine a 2.4 core filter with their own 
custom Filter that is based on 2.3 and thus uses a BitSet, then it won't
work. So a simple drop-in replacement with the new lucene jar would not
be possible; the user would have to change their own filters.

Maybe we should introduce a DocIdSetFactory in the core? For 
backwards compatibility a factory that produces BitSets can be used,
for speed one that creates OpenBitSets. Thoughts?
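
To make the factory idea concrete, here is a minimal, self-contained sketch. Every name in it (SimpleDocIdSet, SimpleDocIdSetFactory, COMPATIBLE, FAST) is hypothetical and not part of Lucene, and the long[]-backed class only stands in for an OpenBitSet-like implementation. It shows one way a filter could be written once and have its backing bit set chosen by a factory.

```java
import java.util.BitSet;

/** Hypothetical sketch only: none of these types exist in Lucene. */
final class DocIdSetFactorySketch {

    /** Minimal stand-in for a set of document ids. */
    interface SimpleDocIdSet {
        void set(int docId);
        boolean get(int docId);
    }

    /** Decides which implementation backs the sets a filter produces. */
    interface SimpleDocIdSetFactory {
        SimpleDocIdSet create(int maxDoc);
    }

    /** Backwards-compatible choice, backed by java.util.BitSet. */
    static final class BitSetBacked implements SimpleDocIdSet {
        private final BitSet bits;
        BitSetBacked(int maxDoc) { bits = new BitSet(maxDoc); }
        public void set(int docId) { bits.set(docId); }
        public boolean get(int docId) { return bits.get(docId); }
    }

    /** Stand-in for a faster set: a plain long[] with one bit per document. */
    static final class LongArrayBacked implements SimpleDocIdSet {
        private final long[] words;
        LongArrayBacked(int maxDoc) { words = new long[(maxDoc + 63) >>> 6]; }
        public void set(int docId) { words[docId >>> 6] |= 1L << (docId & 63); }
        public boolean get(int docId) { return (words[docId >>> 6] & (1L << (docId & 63))) != 0; }
    }

    static final SimpleDocIdSetFactory COMPATIBLE = BitSetBacked::new;
    static final SimpleDocIdSetFactory FAST = LongArrayBacked::new;

    /** A filter body written once against the factory: marks every even doc id. */
    static SimpleDocIdSet evenDocs(SimpleDocIdSetFactory factory, int maxDoc) {
        SimpleDocIdSet set = factory.create(maxDoc);
        for (int doc = 0; doc < maxDoc; doc += 2) {
            set.set(doc);
        }
        return set;
    }

    public static void main(String[] args) {
        SimpleDocIdSet compatible = evenDocs(COMPATIBLE, 10);
        SimpleDocIdSet fast = evenDocs(FAST, 10);
        System.out.println(compatible.get(4) == fast.get(4));
    }
}
```

The filter code never names a concrete bit set class, so swapping COMPATIBLE for FAST changes the representation without touching the filter. Whether this would actually solve the ChainedFilter problem in Lucene is left open here.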

> Things to be done now that Filter is independent from BitSet
> 
>
> Key: LUCENE-1187
> URL: https://issues.apache.org/jira/browse/LUCENE-1187
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: ChainedFilterAndCachingFilterTest.patch, 
> javadocsZero2Match.patch
>
>
> (Aside: where is the documentation on how to mark up text in jira comments?)
> The following things are left over after LUCENE-584 :
> For Lucene 3.0  Filter.bits() will have to be removed.
> There is a CHECKME in IndexSearcher about using ConjunctionScorer to have the 
> boolean behaviour of a Filter.
> I have not looked into Filter caching yet, but I suppose there will be some 
> room for improvement there.
> Iirc the current core has moved to use OpenBitSetFilter and that is probably 
> what is being cached.
> In some cases it might be better to cache a SortedVIntList instead.
> Boolean logic on DocIdSetIterator is already available for Scorers (that 
> inherit from DocIdSetIterator) in the search package. This is currently 
> implemented by ConjunctionScorer, DisjunctionSumScorer,
> ReqOptSumScorer and ReqExclScorer.
> Boolean logic on BitSets is available in contrib/misc and contrib/queries.
> DisjunctionSumScorer calls score() on its subscorers before the score value 
> is actually needed.
> This could be a reason to introduce a DisjunctionDocIdSetIterator, perhaps as 
> a superclass of DisjunctionSumScorer.
> To fully implement non scoring queries a TermDocIdSetIterator will be needed, 
> perhaps as a superclass of TermScorer.
> The javadocs in org.apache.lucene.search using matching vs non-zero score:
> I'll investigate this soon, and provide a patch when necessary.
> An early version of the patches of LUCENE-584 contained a class Matcher,
> that differs from the current DocIdSet in that Matcher has an explain() 
> method.
> It remains to be seen whether such a Matcher could be useful between
> DocIdSet and Scorer.
> The semantics of scorer.skipTo(scorer.doc()) was discussed briefly.
> This was also discussed at another issue recently, so perhaps it is worthwhile 
> to open a separate issue for this.
> Skipping on a SortedVIntList is done using linear search; this could be 
> improved by adding multilevel skiplist info much like in the Lucene index for 
> documents containing a term.
> One comment by me of 3 Dec 2008:
> A few complete (test) classes are deprecated, it might be good to add the 
> target release for removal there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-1202) Clover setup currently has some problems

2008-03-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1202:
-

Attachment: LUCENE-1202.patch

builds on previous patch to fix contrib/db (which i should note: also 
centralized the clover "db" and reports so they were in one place even if you 
ran clover on individual contribs) to also fix it so the classpath for running 
the contrib tests can find clover.

without this patch, contrib tests don't include ${java.class.path} (the core 
tests did) ... this was causing a problem because ${java.class.path} is where i 
had the clover jar and dependencies.

i'm not sure if we want to change this to add an explicit "clover.path" 
property that people must set saying explicitly where they want the build 
system to look for clover ... that seems like a cleaner way to ensure the 
contrib tests don't include stuff in the junit classpath that they shouldn't - 
but it may not be a big deal considering the core tests have always worked this 
way.

comments?

Grant: still need clarification on your comments about hudson...

bq. ...There is also a bit of a change on Hudson during the migration to the 
new servers that needs to be ironed out. 

> Clover setup currently has some problems
> 
>
> Key: LUCENE-1202
> URL: https://issues.apache.org/jira/browse/LUCENE-1202
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: LUCENE-1202.db-contrib-instrumentation.patch, 
> LUCENE-1202.patch
>
>
> (tracking as a bug before it gets lost in email...
>   
> http://www.nabble.com/Clover-reports-missing-from-hudson--to15510616.html#a15510616
> )
> The clover setup for Lucene currently has some problems, 3 i think...
> 1) instrumentation fails on contrib/db/ because it contains java packages the 
> ASF Clover license doesn't allow instrumentation of.  i have a patch for this.
> 2) running instrumented contrib tests for other contribs produce strange 
> errors...
> {{monospaced}}
> [junit] Testsuite: org.apache.lucene.analysis.el.GreekAnalyzerTest
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.126 sec
> [junit]
> [junit] - Standard Error -
> [junit] [CLOVER] FATAL ERROR: Clover could not be initialised. Are you sure you have Clover in the runtime classpath? (class java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
> [junit] -  ---
> [junit] Testcase: testAnalyzer(org.apache.lucene.analysis.el.GreekAnalyzerTest):Caused an ERROR
> [junit] com_cenqua_clover/g
> [junit] java.lang.NoClassDefFoundError: com_cenqua_clover/g
> [junit] at org.apache.lucene.analysis.el.GreekAnalyzer.(GreekAnalyzer.java:157)
> [junit] at org.apache.lucene.analysis.el.GreekAnalyzerTest.testAnalyzer(GreekAnalyzerTest.java:60)
> [junit]
> [junit]
> [junit] Test org.apache.lucene.analysis.el.GreekAnalyzerTest FAILED
> {{monospaced}}
> ...i'm not sure what's going on here.  the error seems to happen both when
> trying to run clover on just a single contrib, or when doing the full
> build ... i suspect there is an issue with the way the batchtests fork
> off, but I can't see why it would only happen to contribs (the regular
> tests fork as well)
> 3) according to Grant...
> {{quote}}
> ...There is also a bit of a change on Hudson during the migration to the new 
> servers that needs to be ironed  out. 
> {{quote}}




[jira] Commented: (LUCENE-1228) IndexWriter.commit() does not update the index version

2008-03-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578649#action_12578649
 ] 

Michael McCandless commented on LUCENE-1228:


{quote}
Does SegmentInfos really need both "version" and "generation"? Is "generation" 
sufficient?
{quote}
I believe they are in fact redundant.  I tested this with a small change to 
just return generation when getVersion is called and all tests pass.  I'll open 
a new issue.

> IndexWriter.commit()  does not update the index version
> ---
>
> Key: LUCENE-1228
> URL: https://issues.apache.org/jira/browse/LUCENE-1228
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.4
>Reporter: Doron Cohen
>Assignee: Doron Cohen
> Attachments: lucene-1228-commit-reopen.patch
>
>
> IndexWriter.commit() can update the index *version* and *generation* but the 
> update of *version* is lost.
> As a result, added documents are not seen by IndexReader.reopen().
> (There might be other side effects that I am not aware of).
> The fix is 1 line - update also the version in 
> SegmentInfos.updateGeneration().
> (Finding this line involved more lines though... :-) )




[jira] Created: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Michael McCandless (JIRA)
Use segments generation instead of version
--

 Key: LUCENE-1232
 URL: https://issues.apache.org/jira/browse/LUCENE-1232
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.3.1, 2.3, 2.2, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.4


Right now the segments file stores generation, a long starting with 0
that increments by 1 with each commit, and version, a long starting
with System.currentTimeMillis() that also increments by 1 with each
commit.

I think they are redundant so we can replace all methods/uses of
version with generation instead.

Spinoff from LUCENE-1228.
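
The redundancy claim can be illustrated with a small, self-contained sketch. It assumes, as described above, that both counters are incremented together on every commit; the class and method names are hypothetical and are not the SegmentInfos API. Under that assumption, version minus its starting value always equals generation.

```java
/** Hypothetical sketch of the two counters; not Lucene's SegmentInfos. */
final class SegmentCountersSketch {

    private final long initialVersion; // assumed to be a timestamp taken at creation
    private long version;
    private long generation;           // starts at 0

    SegmentCountersSketch(long startMillis) {
        this.initialVersion = startMillis;
        this.version = startMillis;
    }

    /** Assumption of this sketch: every commit bumps both counters by exactly 1. */
    void commit() {
        generation++;
        version++;
    }

    long getGeneration() { return generation; }

    long getVersion() { return version; }

    /** How far version has moved from its starting value. */
    long versionGap() { return version - initialVersion; }

    public static void main(String[] args) {
        SegmentCountersSketch counters = new SegmentCountersSketch(System.currentTimeMillis());
        for (int i = 0; i < 3; i++) {
            counters.commit();
        }
        System.out.println(counters.versionGap() == counters.getGeneration());
    }
}
```

If any code path updates one counter without the other, the equality breaks, which is exactly the kind of case a check of this shape would surface.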




[jira] Commented: (LUCENE-1187) Things to be done now that Filter is independent from BitSet

2008-03-14 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578656#action_12578656
 ] 

Eks Dev commented on LUCENE-1187:
-

Michael, 
I do not think we need to add a Factory (for this particular reason). The DocIdSet 
type should not be assumed, as we could come up with smart ways to select the 
optimal Filter representation depending on doc-id distribution, size... 

The only problem we have is that the contrib classes ChainedFilter and 
BooleanFilter assume BitSet. 
The solution for this would be to add just a few methods to DocIdSet 
that are able to do AND/OR/NOT on DocIdSet[] using DocIdSetIterator,
e.g. 
DocIdSet or(DocIdSet[], int minimumShouldMatch);
DocIdSet or(DocIdSet[]);


Optimized code for these basic operations *already exists*; it can be copied from 
the Conjunction/Disjunction/ReqOpt/ReqExcl Scorer classes by simply 
stripping off the scoring part.

With these utility methods in DocIdSet, rewriting ChainedFilter/BooleanFilter 
to work with DocIdSet (so that it works on all implementations of 
Filter/DocIdSet) is a 10-minute job... Then, if needed, this implementation can 
be optimized to cover type-specific cases. Imo, BooleanFilter is the better bet; we 
do not need both of them.  

Unfortunately I do not have time to play with it in the next 3-4 weeks, but it should be 
no more than 2 days of work (remember, we have the difficult part already done in 
the Scorers). Having so much code duplication is not really good, but we 
can then later "merge" these somehow.
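
As a rough illustration of iterator-style boolean logic without scoring, here is a self-contained sketch. It is not taken from the Lucene scorers: sorted int arrays stand in for DocIdSets, the class name and the and() method are hypothetical, and skipping is done by plain linear advance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: sorted, duplicate-free int[] stands in for a DocIdSet. */
final class DocIdSetOpsSketch {

    /** AND: doc ids present in every input list. */
    static int[] and(int[][] sets) {
        if (sets.length == 0) {
            return new int[0];
        }
        int[] pos = new int[sets.length];
        List<Integer> out = new ArrayList<>();
        outer:
        for (int candidate : sets[0]) {
            for (int i = 1; i < sets.length; i++) {
                // Linear "skipTo": advance list i to the first doc >= candidate.
                while (pos[i] < sets[i].length && sets[i][pos[i]] < candidate) {
                    pos[i]++;
                }
                if (pos[i] == sets[i].length) {
                    break outer; // list i is exhausted, no further matches possible
                }
                if (sets[i][pos[i]] != candidate) {
                    continue outer; // candidate is missing from list i
                }
            }
            out.add(candidate);
        }
        return toArray(out);
    }

    /** OR: doc ids present in at least minimumShouldMatch input lists. */
    static int[] or(int[][] sets, int minimumShouldMatch) {
        int[] pos = new int[sets.length];
        List<Integer> out = new ArrayList<>();
        while (true) {
            boolean anyLeft = false;
            int min = Integer.MAX_VALUE;
            for (int i = 0; i < sets.length; i++) {
                if (pos[i] < sets[i].length) {
                    anyLeft = true;
                    min = Math.min(min, sets[i][pos[i]]);
                }
            }
            if (!anyLeft) {
                break;
            }
            int matches = 0;
            for (int i = 0; i < sets.length; i++) {
                if (pos[i] < sets[i].length && sets[i][pos[i]] == min) {
                    matches++;
                    pos[i]++; // consume the smallest doc from every list that has it
                }
            }
            if (matches >= minimumShouldMatch) {
                out.add(min);
            }
        }
        return toArray(out);
    }

    static int[] or(int[][] sets) {
        return or(sets, 1);
    }

    private static int[] toArray(List<Integer> list) {
        int[] result = new int[list.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = list.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] sets = { {1, 3, 5, 7}, {3, 4, 5, 8}, {0, 3, 5} };
        System.out.println(Arrays.toString(and(sets)));
        System.out.println(Arrays.toString(or(sets, 2)));
    }
}
```

Nothing here depends on how the doc ids are stored, which is the point of the proposal: the same logic would apply to any representation that can be iterated in doc-id order.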





Re: [jira] Created: (LUCENE-1229) NGramTokenFilter optimization in query phase

2008-03-14 Thread Mathieu Lecarme

Hiroaki Kawai (JIRA) wrote:

NGramTokenFilter optimization in query phase


 Key: LUCENE-1229
 URL: https://issues.apache.org/jira/browse/LUCENE-1229
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Reporter: Hiroaki Kawai


I found that NGramTokenFilter-ed token stream could be optimized in query.

A standard 1,2 NGramTokenFilter will generate a token stream from "abcde" as 
follows:
a ab b bc c cd d de e

When we index "abcde", we'll use all of the tokens.

But when we query, we only need:
ab cd de
  

I don't understand why you index something that you will not query.
Why don't you use a bigram?

M.




Re: [jira] Created: (LUCENE-1229) NGramTokenFilter optimization in query phase

2008-03-14 Thread Hiroaki Kawai
Thanks for your reply.

> > I found that NGramTokenFilter-ed token stream could be optimized in query.
> >
> > A standard 1,2 NGramTokenFilter will generate a token stream from "abcde" 
> > as follows:
> > a ab b bc c cd d de e
> >
> > When we index "abcde", we'll use all of the tokens.
> >
> > But when we query, we only need:
> > ab cd de
> >   
> I don't understand why you index something that you will not query?
> Why don't you use a bigram?

Good point. :-) Consider the following case:
1. We stored(indexed) "abcde"

2. We query with "a", and want "abcde" to be hit.

3. We query with "ab", and want "abcde" to be hit.

4. We query with "de", and want "abcde" to be hit.

5. Of course, we query with "abcde", and want "abcde" to be hit.


I mean, whether a given gram is really necessary at query time 
depends on the query string. The required tokens might differ between the 
index phase and the query phase.

Of course, you CAN query "abcde" with ALL of the tokens
(a ab b bc c cd d de e). But it is not necessary.
We can omit some tokens and still find documents that include
"abcde".

A bigram might work as well, but it can't hit a 3-gram token in one 
index search, so we want to index 3-gram tokens as well, for example.
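
The difference between the two phases can be sketched in a few lines. This is only an illustration that reproduces the "abcde" example; it does not use NGramTokenFilter, the class and method names are hypothetical, and the selection rule (longest grams, with the last one right-aligned) is one possible rule consistent with the example rather than the rule in the proposed patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; does not use Lucene's NGramTokenFilter. */
final class NGramQuerySketch {

    /** Index phase: every 1-gram and 2-gram, ordered by start offset. */
    static List<String> indexTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int n = 1; n <= 2 && start + n <= text.length(); n++) {
                tokens.add(text.substring(start, start + n));
            }
        }
        return tokens;
    }

    /**
     * Query phase: a smaller set of grams that still covers every character.
     * Strings no longer than maxGram are used whole; longer strings are cut
     * into maxGram-sized pieces, with the final piece right-aligned.
     */
    static List<String> queryTokens(String text, int maxGram) {
        List<String> tokens = new ArrayList<>();
        if (text.isEmpty()) {
            return tokens;
        }
        if (text.length() <= maxGram) {
            tokens.add(text);
            return tokens;
        }
        int start = 0;
        while (start + maxGram < text.length()) {
            tokens.add(text.substring(start, start + maxGram));
            start += maxGram;
        }
        tokens.add(text.substring(text.length() - maxGram));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(indexTokens("abcde"));    // prints [a, ab, b, bc, c, cd, d, de, e]
        System.out.println(queryTokens("abcde", 2)); // prints [ab, cd, de]
        System.out.println(queryTokens("a", 2));     // prints [a]
    }
}
```

The query "a" still needs the unigram, which is why the index has to keep tokens that a longer query would never ask for.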







Re: [jira] Created: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2008-03-14 Thread eks dev
"A better way to do this is using payloads. By creating a "special" posting list
that has one posting with payload for each document you can "simulate" a column-
stride field. The performance is significantly better compared to stored fields,
however still not optimal. The reason is that for each document the freq value,
which is in this particular case always 1, has to be decoded, also one position
value, which is always 0, has to be loaded."

If we put this approach into the context of 
http://wiki.apache.org/jakarta-lucene/FlexibleIndexing, then one 
special case of it would remove the performance obstacles you have mentioned. 
Would it be easier to tackle these issues and have both problems fixed?
I am not very familiar with Lucene file formats, so please take this with a 
pinch of salt.

- Original Message 
From: Michael Busch (JIRA) <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, 14 March, 2008 7:57:24 AM
Subject: [jira] Created: (LUCENE-1231) Column-stride fields (aka per-document 
Payloads)

Column-stride fields (aka per-document Payloads)


 Key: LUCENE-1231
 URL: https://issues.apache.org/jira/browse/LUCENE-1231
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4


This new feature has been proposed and discussed here:
http://markmail.org/search/?q=per-document+payloads#query:per-document%20payloads+page:1+mid:jq4g5myhlvidw3oc+state:results

Currently it is possible in Lucene to store data as stored fields or as 
payloads.
Stored fields provide good performance if you want to load all fields for one
document, because this is a sequential I/O operation.

If you however want to load the data from one field for a large number of 
documents, then stored fields perform quite badly, because lots of I/O seeks 
might have to be performed. 

A better way to do this is using payloads. By creating a "special" posting list
that has one posting with payload for each document you can "simulate" a column-
stride field. The performance is significantly better compared to stored fields,
however still not optimal. The reason is that for each document the freq value,
which is in this particular case always 1, has to be decoded, also one position
value, which is always 0, has to be loaded.

As a solution we want to add real column-stride fields to Lucene. A possible
format for the new data structure could look like this (CSD stands for
column-stride data; once we decide on a final name for this feature we can
change this):

CSDList --> FixedLengthList |  
FixedLengthList --> ^SegSize 
VariableLengthList -->  
Payload --> Byte^PayloadLength 
PayloadLength --> VInt 
SkipList --> see frq.file

We distinguish here between the fixed length and the variable length cases. To
allow flexibility, Lucene could automatically pick the "right" data structure. 
This could work like this: When the DocumentsWriter writes a segment it checks 
whether all values of a field have the same length. If yes, it stores them as 
FixedLengthList, if not, then as VariableLengthList. When the SegmentMerger 
merges two or more segments it checks if all segments have a FixedLengthList 
with the same length for a column-stride field. If not, it writes a 
VariableLengthList to the new segment. 
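
The selection rule described above can be written down as a short sketch. All names are hypothetical (there is no such class in Lucene), a segment is reduced to the lengths of its values, and treating a segment with zero or one value as fixed length is an assumption of this sketch.

```java
/** Hypothetical sketch of the fixed-vs-variable choice; not Lucene code. */
final class ColumnStrideLayoutSketch {

    enum Layout { FIXED_LENGTH, VARIABLE_LENGTH }

    /** Flush time: fixed only if every value in the segment has the same length. */
    static Layout chooseForSegment(byte[][] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i].length != values[0].length) {
                return Layout.VARIABLE_LENGTH;
            }
        }
        // Also reached for 0 or 1 values (an assumption of this sketch).
        return Layout.FIXED_LENGTH;
    }

    /**
     * Merge time: each entry is a segment's fixed value length, or -1 if that
     * segment already uses the variable-length layout. The merged segment is
     * fixed only if all inputs are fixed with the same length.
     */
    static Layout chooseForMerge(int[] segmentValueLengths) {
        for (int length : segmentValueLengths) {
            if (length < 0 || length != segmentValueLengths[0]) {
                return Layout.VARIABLE_LENGTH;
            }
        }
        return Layout.FIXED_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(chooseForSegment(new byte[][] { {1, 2}, {3, 4} }));
        System.out.println(chooseForMerge(new int[] {4, 8}));
    }
}
```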

Once this feature is implemented, we should think about making the column-
stride fields updateable, similar to the norms. This will be a very powerful
feature that can for example be used for low-latency tagging of documents.

Other use cases:
- replace norms
- allow to store boost values separately from norms
- as input for the FieldCache, thus providing significantly improved loading
performance (see LUCENE-831)

Things that need to be done here:
- decide on a name for this feature :) - I think "column-stride fields" was
liked better than "per-document payloads"
- Design an API for this feature. We should keep in mind here that these 
fields are supposed to be updateable.
- Define datastructures.

I would like to get this feature into 2.4. Feedback about the open questions
is very welcome so that we can finalize the design soon and start 
implementing.




Re: Should Document.getFieldables really return null

2008-03-14 Thread Michael McCandless


I agree, this makes sense.  I'll commit it.  Thanks Stefan!

Except, the last one you list (getBinaryValue) I think should still  
return null if no field by that name exists?


Mike

Stefan Trcek wrote:


Hello

The 'Document.getFieldables(String name)' is documented to return 'null'
in some cases (and really does, see the code below). However this imposes
a penalty on the client, as code like this

Document doc = hits.doc(i);
for (Fieldable f: doc.getFieldables("somefield")) {
    System.out.println(f.stringValue());
}

is wrong (no check on 'null'). For the client code it would be better
if 'Document.getFieldables(String)' returned 'new Fieldable[0]'
instead (no NullPointerException).

If you needn't distinguish between null-ed arrays and arrays of zero
length (do you?), I suggest to never return 'null', but return an array
of size zero. If you don't trust the just-in-time compiler (concerning
performance), you may even define

private final static Fieldable[] EMPTY = new Fieldable[0];

and return 'EMPTY' at the (*) line. Same with

   public final Field[] getFields(String name) {
   public final String[] getValues(String name) {
   public final byte[][] getBinaryValues(String name) {
   public final byte[] getBinaryValue(String name) {

and maybe others.

Stefan

--- org.apache.lucene.document.Document.java -
  public Fieldable[] getFieldables(String name) {
    List result = new ArrayList();
    for (int i = 0; i < fields.size(); i++) {
      Fieldable field = (Fieldable)fields.get(i);
      if (field.name().equals(name)) {
        result.add(field);
      }
    }

    if (result.size() == 0)
(*)   return null;

    return (Fieldable[])result.toArray(new Fieldable[result.size()]);
  }





[jira] Created: (LUCENE-1233) Fix Document.getFieldables and others to never return null

2008-03-14 Thread Michael McCandless (JIRA)
Fix Document.getFieldables and others to never return null
--

 Key: LUCENE-1233
 URL: https://issues.apache.org/jira/browse/LUCENE-1233
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.3.1, 2.3, 2.2, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.4


Document.getFieldables (and other similar methods) returns null if there are no 
fields matching the name.  We can avoid NPE in consumers of this API if instead 
we return an empty array.

Spinoff from http://markmail.org/message/g2nzstmce4cnf3zj
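
A minimal, self-contained sketch of the "empty array instead of null" pattern follows. It does not use the Lucene Document class; the class name, the field storage, and the NO_STRINGS constant are hypothetical and only illustrate why callers no longer need a null check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a document holding (name, value) pairs. */
final class EmptyArraySketch {

    /** Shared constant, so a lookup that finds nothing allocates nothing. */
    private static final String[] NO_STRINGS = new String[0];

    private final List<String[]> fields = new ArrayList<>(); // each entry is {name, value}

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    /** Never returns null: an unknown name yields a zero-length array. */
    String[] getValues(String name) {
        List<String> result = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                result.add(field[1]);
            }
        }
        if (result.isEmpty()) {
            return NO_STRINGS;
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        EmptyArraySketch doc = new EmptyArraySketch();
        doc.add("title", "hello");
        // No null check needed: the loop body simply does not run.
        for (String value : doc.getValues("missing")) {
            System.out.println(value);
        }
        System.out.println(doc.getValues("title").length);
    }
}
```

Sharing one zero-length array is safe because an empty array has no elements that a caller could modify.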




How to avoid byte[] allocation in Document.getBinaryValue(String name)

2008-03-14 Thread eks dev
I am looking for ideas on how I could pass my byte[] to 
Document.getBinaryValue(String name) in order to avoid allocation of a new byte[] 
for each Field retrieved.

The first idea I had was to add something like this in Document:
  
public final byte[] getBinaryValue(String name, byte[] myBuffer) {
  for (int i = 0; i < fields.size(); i++) {
    Fieldable field = (Fieldable) fields.get(i);
    if (field.name().equals(name) && field.isBinary()) {
      if (field instanceof LazyField) { // HERE
        return ((LazyField) field).binaryValue(myBuffer);
      }
      return field.binaryValue();
    }
  }
  return null;
}

In that case binaryValue(myBuffer) would optionally reallocate my buffer. Here 
I need the length of the byte segment actually written (which is available in 
LazyField). So this approach went wrong... With some tweaking it could work 
fine, but it is totally dirty imo, as it delegates far too many Field internals to 
the Document class.

Probably a much better way would be to use tricks from TokenStream:

Field Document.getField(String fieldName, Field myField); 

In this scenario, myField can be null, in which case a new Field is allocated; 
if it is not null, it is reused ... (we have a similar case in TokenStream now)

Can this work somehow? Or does someone see a better way to reduce byte[] 
allocations when fetching binary fields?

This all assumes we have LUCENE-1219 in place, as we need the length (and offset) of 
the byte[].
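
The reuse idea can be sketched independently of Lucene. In this illustration the BytesHolder class and the copyInto method are hypothetical; they only show the pattern of passing in a reusable object, growing its buffer when needed, and reporting offset and length separately from the array size.

```java
/** Hypothetical sketch of the "pass in a reusable holder" pattern; not Lucene code. */
final class ReusableBinarySketch {

    /** Buffer plus the valid region inside it, since the array may be larger than the value. */
    static final class BytesHolder {
        byte[] bytes = new byte[0];
        int offset;
        int length;
    }

    /**
     * Copies the stored value into the holder. A null holder is allocated;
     * a non-null holder is reused, and its buffer grows only if it is too small.
     */
    static BytesHolder copyInto(byte[] stored, BytesHolder reuse) {
        BytesHolder holder = (reuse != null) ? reuse : new BytesHolder();
        if (holder.bytes.length < stored.length) {
            holder.bytes = new byte[stored.length];
        }
        System.arraycopy(stored, 0, holder.bytes, 0, stored.length);
        holder.offset = 0;
        holder.length = stored.length;
        return holder;
    }

    public static void main(String[] args) {
        BytesHolder holder = copyInto(new byte[] {1, 2, 3, 4}, null);
        byte[] firstBuffer = holder.bytes;
        holder = copyInto(new byte[] {9, 8}, holder);
        // Same array, shorter valid region: no allocation for the second value.
        System.out.println(holder.bytes == firstBuffer);
        System.out.println(holder.length);
    }
}
```

Callers must read only the bytes between offset and offset + length, because bytes left over from an earlier, longer value may still sit in the rest of the array.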








[jira] Updated: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1232:
---

Attachment: LUCENE-1232.patch

Attached patch.  I plan to commit in a day or two.




[jira] Resolved: (LUCENE-1228) IndexWriter.commit() does not update the index version

2008-03-14 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-1228.
-

   Resolution: Fixed
Lucene Fields: [Patch Available]  (was: [Patch Available, New])

Committed.




[jira] Resolved: (LUCENE-1233) Fix Document.getFieldables and others to never return null

2008-03-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1233.


Resolution: Fixed

Thanks Stefan!




Re: Should Document.getFieldables really return null

2008-03-14 Thread Stefan Trcek
On Friday 14 March 2008 11:46:42 Michael McCandless wrote:
> I agree, this makes sense.  I'll commit it.  Thanks Stefan!
>
> Except, the last one you list (getBinaryValue) I think should still
> return null if no field by that name exists?

Yes, you are right. Looking at the array notion made me somewhat sloppy.
This is just a single element represented as byte array and you opted to 
return null in that case in other methods, too.

Stefan




[jira] Updated: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-1232:


Attachment: check.version.vs.gen.diff

To test this I added to SegmentInfos a comparison of *current_generation* to 
*versionGap*, where the latter is defined as *current_version - 
first_valid_version*. 
The values are compared when writing the segmentInfos, which is where both are 
updated.
Note: this patch is applied on svn head - i.e. before/without the patch that 
gets rid of version.
All core tests pass except one: 
{noformat:title=TestIndexWriter.testAddIndexOnDiskFull()}
[junit] - Standard Output ---
[junit] At write (segments_4) ver=1205513263760 verGap=3 gen=4   WRONG !!
[junit] -  ---
[junit] Testcase: testAddIndexOnDiskFull(org.apache.lucene.index.TestIndexWriter):  Caused an ERROR
[junit] At write (segments_4) ver=1205513263760 verGap=3 gen=4   WRONG !!
[junit] java.lang.RuntimeException: At write (segments_4) ver=1205513263760 verGap=3 gen=4   WRONG !!
[junit] at org.apache.lucene.index.SegmentInfos.compareVersionAndGeneration(SegmentInfos.java:360)
[junit] at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:348)
[junit] at org.apache.lucene.index.SegmentInfos.commit(SegmentInfos.java:767)
[junit] at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:4194)
[junit] at org.apache.lucene.index.IndexWriter.commitTransaction(IndexWriter.java:2530)
[junit] at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3000)
[junit] at org.apache.lucene.index.TestIndexWriter.testAddIndexOnDiskFull(TestIndexWriter.java:294)
[junit]
[junit]
[junit] Test org.apache.lucene.index.TestIndexWriter FAILED
{noformat}
Mike, since all tests passed for you when disabling version I assume this 
failure is just related 
to deliberately hitting that full disk error, but I thought perhaps you'd like 
to take a look at that to make sure.
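
For illustration only, the kind of check described in this comment could be sketched as below. This is not the code from check.version.vs.gen.diff: the class VersionGenCheck and its members are hypothetical, and the sketch assumes that generation starts at 0, that version starts at a millisecond seed, and that both advance by exactly 1 on every successful commit.

```java
// Hypothetical sketch; not the code from check.version.vs.gen.diff.
final class VersionGenCheck {
    private final long firstValidVersion; // the seed, e.g. System.currentTimeMillis()
    private long version;                 // advances by 1 per commit
    private long generation;              // starts at 0, advances by 1 per commit

    VersionGenCheck(long seed) {
        this.firstValidVersion = seed;
        this.version = seed;
    }

    /** Simulates one successful commit and then verifies the assumed invariant. */
    void commit() {
        generation++;
        version++;
        check();
    }

    long versionGap() { return version - firstValidVersion; }

    long generation() { return generation; }

    /** Fails if the version gap and the generation have drifted apart. */
    private void check() {
        if (versionGap() != generation) {
            throw new IllegalStateException(
                "ver=" + version + " verGap=" + versionGap() + " gen=" + generation);
        }
    }

    public static void main(String[] args) {
        VersionGenCheck c = new VersionGenCheck(System.currentTimeMillis());
        for (int i = 0; i < 4; i++) {
            c.commit();
        }
        System.out.println("verGap=" + c.versionGap() + " gen=" + c.generation()); // verGap=4 gen=4
    }
}
```

Under these assumptions the two counters never drift apart, which is the redundancy the issue is about.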

> Use segments generation instead of version
> --
>
> Key: LUCENE-1232
> URL: https://issues.apache.org/jira/browse/LUCENE-1232
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.1, 2.2, 2.3, 2.3.1
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.4
>
> Attachments: check.version.vs.gen.diff, LUCENE-1232.patch
>
>
> Right now the segments file stores generation, a long starting with 0
> that increments by 1 with each commit, and version, a long starting
> with System.currentTimeMillis() that also increments by 1 with each
> commit.
> I think they are redundant so we can replace all methods/uses of
> version with generation instead.
> Spinoff from LUCENE-1228.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578866#action_12578866
 ] 

Michael McCandless commented on LUCENE-1232:


I think this is OK: it looks like that test hit a disk full after
generation was incremented but before version was incremented, in
SegmentInfos.write.
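
A toy model of that ordering, as a sketch rather than the actual SegmentInfos.write code: the class DiskFullOrdering is hypothetical, and the starting values are arbitrary, chosen only so that the outcome resembles the numbers in the log above.

```java
import java.io.IOException;

// Hypothetical model of the ordering described above; not Lucene's SegmentInfos.
final class DiskFullOrdering {
    long generation = 3;
    long versionGap = 3;

    /** Simulated write: bump generation, possibly fail, and only then bump version. */
    void write(boolean diskFull) throws IOException {
        generation++;
        if (diskFull) {
            throw new IOException("simulated disk full");
        }
        versionGap++;
    }

    public static void main(String[] args) {
        DiskFullOrdering s = new DiskFullOrdering();
        try {
            s.write(true);
        } catch (IOException e) {
            // The failed write leaves the two counters out of step.
        }
        System.out.println("gen=" + s.generation + " verGap=" + s.versionGap); // gen=4 verGap=3
    }
}
```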




[jira] Commented: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578870#action_12578870
 ] 

Doron Cohen commented on LUCENE-1232:
-

Yes you're right, thanks.




[jira] Commented: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578871#action_12578871
 ] 

Hoss Man commented on LUCENE-1232:
--

if i'm understanding this correctly: we're removing the concept of "version" 
from SegmentInfos and leaving the concept of "generation" but in IndexReader we 
are preserving the term "version" but making it an alias for the current 
generation.

unless i'm missing something, a side effect of this will be that after 
upgrading, modifying an existing index could result in reader.getVersion() and 
IndexReader.getCurrentVersion(Directory) returning lower numbers than before -- 
but the contract for getCurrentVersion has always suggested that version numbers 
will only ever increase

this doesn't really hurt me personally in any way, but i can imagine some 
situations where this could screw people over (ie: code that tests if one 
version of an index is "newer" than another by having a higher version#)

i don't fully understand the relationship between "version" and "generation" in 
SegmentInfos (if this patch is making SegmentInfos.getVersion() return 
getGeneration then shouldn't it also remove/modify the reading/writing of 
"version" from the segments file?) but perhaps one way to prevent potential 
problems like the one mentioned above is if we define IndexReader "version" as 
the sum of the SegmentInfos "version" and SegmentInfos "generation" 
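
To make the concern concrete, here is a small sketch of the "higher number means newer" comparison and of the sum idea. The class VersionMonotonicity and the helper isNewer are hypothetical; the large value is the ver= number from the test log earlier in this thread, and the generation value is made up.

```java
// Hypothetical illustration; isNewer is not a Lucene method.
final class VersionMonotonicity {
    /** The kind of check an application might do: a higher number means a newer index. */
    static boolean isNewer(long candidate, long previouslySeen) {
        return candidate > previouslySeen;
    }

    public static void main(String[] args) {
        long storedVersion = 1205513263760L; // millis-seeded version seen before the upgrade
        long generationAfterCommit = 5L;     // made-up generation after the next commit

        // If "version" becomes an alias for generation, the index appears to go backwards.
        System.out.println(isNewer(generationAfterCommit, storedVersion)); // false

        // With the sum of the old version and the generation, the number still increases.
        long summed = storedVersion + generationAfterCommit;
        System.out.println(isNewer(summed, storedVersion)); // true
    }
}
```

The sum only helps if the old version value is still kept in the segments file, which is an assumption of this sketch.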





[jira] Resolved: (LUCENE-1230) Source release files missing the *.pom.template files

2008-03-14 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch resolved LUCENE-1230.
---

Resolution: Fixed

Committed to trunk & 2.3 branch.

> Source release files missing the *.pom.template files
> -
>
> Key: LUCENE-1230
> URL: https://issues.apache.org/jira/browse/LUCENE-1230
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2, 2.3, 2.3.1
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 2.3.2, 2.4
>
> Attachments: lucene-1230.patch
>
>
> The source release files should contain the *.pom.template files, otherwise 
> it is not possible to build the maven artifacts using "ant 
> generate-maven-artifacts" from official release files.




[jira] Commented: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578886#action_12578886
 ] 

Michael McCandless commented on LUCENE-1232:



Hoss, you're right: if an app has stored away the previous result from
IndexReader.getVersion() of their index, does an upgrade to 2.4, and
then uses that previously stored version to compare with
IndexReader.getVersion(), the version would have gone backwards.

I think I can modify the patch such that only a newly created segments
file would switch to using generation as version, but a previously
opened and then newly committed segments file would retain the old
version.  Though ... I don't really like that approach because I don't
think we could ever remove that back-compatible code (a single index
can stay alive indefinitely).

Maybe it's best not to make any change here and live with the
[minor] redundancy?






[jira] Updated: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-1232:


Attachment: LUCENE-1232.dc.patch

I otoh liked this side effect of this change, i.e. that index version values 
will no longer be very large numbers, millis, but rather very readable 
numbers starting from 1.  But I didn't think of the problem Hoss pointed out.

Anyhow I think that SegmentInfos can be made a bit simpler by getting rid
of more 'version' related code - attached LUCENE-1232.dc.patch does this.
To prevent backwards compatibility issues it still writes and reads 0 (zero) 
for that version.
All tests pass.

> Use segments generation instead of version
> --
>
> Key: LUCENE-1232
> URL: https://issues.apache.org/jira/browse/LUCENE-1232
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.1, 2.2, 2.3, 2.3.1
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.4
>
> Attachments: check.version.vs.gen.diff, LUCENE-1232.dc.patch, 
> LUCENE-1232.patch
>
>
> Right now the segments file stores generation, a long starting with 0
> that increments by 1 with each commit, and version, a long starting
> with System.currentTimeMillis() that also increments by 1 with each
> commit.
> I think they are redundant so we can replace all methods/uses of
> version with generation instead.
> Spinoff from LUCENE-1228.




[jira] Commented: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578894#action_12578894
 ] 

Doron Cohen commented on LUCENE-1232:
-

{quote}
Anyhow I think that SegmentInfos can be made a bit simpler by getting rid of 
more 'version' related code
{quote}

I mean assuming the redundancy is removed.




[jira] Commented: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578897#action_12578897
 ] 

Michael McCandless commented on LUCENE-1232:


I like your new patch Doron.

But we still have the "backwards compatibility when someone saves the index 
version across an upgrade to 2.4" problem.  In theory someone could call 
getVersion(), store this into a database (say), upgrade, restart their app, 
pull the old version from the database, and compare it to the new getVersion() 
result.

Though, I think this would happen extremely rarely in practice?  I would expect 
the version is almost always only saved within the JVM, and not persisted, and 
is used to decide when to reopen a reader.  And since you'd have to shut down 
all readers & your writer in order to do an upgrade to 2.4, you will have 
re-opened the readers already.
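
The in-JVM usage pattern mentioned here might look roughly like the following sketch. ReopenCheck is a hypothetical class, and the LongSupplier stands in for whatever call reports the index's current version (for example IndexReader.getCurrentVersion(Directory)); no Lucene classes are used, so the sketch is self-contained.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of "remember the version in memory, reopen when it changes".
final class ReopenCheck {
    private final LongSupplier currentVersion; // stand-in for a current-version lookup
    private long openedVersion;                // version seen when the reader was opened

    ReopenCheck(LongSupplier currentVersion) {
        this.currentVersion = currentVersion;
        this.openedVersion = currentVersion.getAsLong();
    }

    /** Returns true if the index changed since the last check and records the new version. */
    boolean reopenIfChanged() {
        long now = currentVersion.getAsLong();
        if (now == openedVersion) {
            return false;
        }
        openedVersion = now; // a real application would reopen its reader here
        return true;
    }

    public static void main(String[] args) {
        long[] indexVersion = {100L};
        ReopenCheck check = new ReopenCheck(() -> indexVersion[0]);
        System.out.println(check.reopenIfChanged()); // false
        indexVersion[0]++;                           // simulate a commit
        System.out.println(check.reopenIfChanged()); // true
    }
}
```

Because the remembered value never outlives the process, a check like this would not be affected by the version numbering changing across an upgrade.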




[jira] Commented: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578904#action_12578904
 ] 

Yonik Seeley commented on LUCENE-1232:
--

A nice thing about version is that it has relatively high uniqueness.
If we use generation for version, one could blow away an index, rebuild it with 
slight changes, and get the same version number so version becomes less 
useful.
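
A small sketch of the uniqueness argument, under the description's assumption that version starts at the creation time in milliseconds and both counters advance by 1 per commit. The class VersionUniqueness, its helpers, and the one-minute gap between the two builds are hypothetical.

```java
// Hypothetical illustration of why a millis-seeded version is more unique than a generation.
final class VersionUniqueness {
    /** Version after the given number of commits when seeded with the creation time. */
    static long seededVersion(long createdAtMillis, int commits) {
        return createdAtMillis + commits;
    }

    /** Version after the given number of commits if it were simply the generation. */
    static long generationVersion(int commits) {
        return commits;
    }

    public static void main(String[] args) {
        long firstBuild = 1205513263760L;
        long rebuild = firstBuild + 60_000L; // index blown away and rebuilt a minute later

        // Same number of commits in both builds: the generation cannot tell them apart.
        System.out.println(generationVersion(3) == generationVersion(3)); // true

        // The seeded versions differ because the seeds differ.
        System.out.println(seededVersion(firstBuild, 3) == seededVersion(rebuild, 3)); // false
    }
}
```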




[jira] Resolved: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1232.


Resolution: Won't Fix




[jira] Commented: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578912#action_12578912
 ] 

Michael McCandless commented on LUCENE-1232:


That's a good point; because version seeds with System.currentTimeMillis() it 
increases even when you create a new index.

OK I think we should just leave version as is for now.




Re: [jira] Commented: (LUCENE-1232) Use segments generation instead of version

2008-03-14 Thread Doron Cohen
ok...

On Fri, Mar 14, 2008 at 10:00 PM, Michael McCandless (JIRA) <[EMAIL PROTECTED]>
wrote:

>
>[
> https://issues.apache.org/jira/browse/LUCENE-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578912#action_12578912]
>
> Michael McCandless commented on LUCENE-1232:
> 
>
> That's a good point; because version seeds with System.currentTimeMillis()
> it increases even when you create a new index.
>
> OK I think we should just leave version as is for now.


[jira] Created: (LUCENE-1234) BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access

2008-03-14 Thread Andi Vajda (JIRA)
BoostingTermQuery's BoostingSpanScorer class should be protected instead of 
package access
--

 Key: LUCENE-1234
 URL: https://issues.apache.org/jira/browse/LUCENE-1234
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.3.1
Reporter: Andi Vajda
Priority: Trivial


Currently, BoostingTermScorer, an inner class of BoostingTermQuery, is not 
accessible from outside the search.payloads package,
making it difficult to write an extension of BoostingTermQuery. The other inner 
classes are protected already, as they should be.




[jira] Updated: (LUCENE-1234) BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access

2008-03-14 Thread Andi Vajda (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andi Vajda updated LUCENE-1234:
---

Attachment: patches-lucene-2.3.1

patch against lucene-2.3.1 sources




[jira] Commented: (LUCENE-1234) BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access

2008-03-14 Thread Andi Vajda (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578976#action_12578976
 ] 

Andi Vajda commented on LUCENE-1234:


The inaccessible class is called BoostingSpanScorer.
The method I'd like to override there is the score() method.




[jira] Assigned: (LUCENE-1234) BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access

2008-03-14 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned LUCENE-1234:
---

Assignee: Grant Ingersoll




[jira] Resolved: (LUCENE-1234) BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access

2008-03-14 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved LUCENE-1234.
-

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])




[jira] Commented: (LUCENE-1202) Clover setup currently has some problems

2008-03-14 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578992#action_12578992
 ] 

Grant Ingersoll commented on LUCENE-1202:
-

You expect me to remember something said that long ago?

I  _believe_ it has to do with where clover and other libraries are now 
located.  Before they were in ant/lib, now they are elsewhere.  When you commit 
these, I can look into that piece.

> Clover setup currently has some problems
> 
>
> Key: LUCENE-1202
> URL: https://issues.apache.org/jira/browse/LUCENE-1202
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: LUCENE-1202.db-contrib-instrumentation.patch, 
> LUCENE-1202.patch
>
>
> (tracking as a bug before it gets lost in email...
>   
> http://www.nabble.com/Clover-reports-missing-from-hudson--to15510616.html#a15510616
> )
> The clover setup for Lucene currently has some problems, 3 i think...
> 1) instrumentation fails on contrib/db/ because it contains java packages the 
> ASF Clover license doesn't allow instrumentation of.  i have a patch for this.
> 2) running instrumented contrib tests for other contribs produce strange 
> errors...
> {{monospaced}}
> [junit] Testsuite: org.apache.lucene.analysis.el.GreekAnalyzerTest
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.126 sec
> [junit]
> [junit] - Standard Error -
> [junit] [CLOVER] FATAL ERROR: Clover could not be initialised. Are you 
> sure you have Clover
> in the runtime classpath? (class
> java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
> [junit] -  ---
> [junit] Testcase: 
> testAnalyzer(org.apache.lucene.analysis.el.GreekAnalyzerTest):Caused
> an ERROR
> [junit] com_cenqua_clover/g
> [junit] java.lang.NoClassDefFoundError: com_cenqua_clover/g
> [junit] at 
> org.apache.lucene.analysis.el.GreekAnalyzer.(GreekAnalyzer.java:157)
> [junit] at
> org.apache.lucene.analysis.el.GreekAnalyzerTest.testAnalyzer(GreekAnalyzerTest.java:60)
> [junit]
> [junit]
> [junit] Test org.apache.lucene.analysis.el.GreekAnalyzerTest FAILED
> {{monospaced}}
> ...i'm not sure what's going on here.  the error seems to happen both when
> trying to run clover on just a single contrib, or when doing the full
> build ... i suspect there is an issue with the way the batchtests fork
> off, but I can't see why it would only happen to contribs (the regular
> tests fork as well)
> 3) according to Grant...
> {{quote}}
> ...There is also a bit of a change on Hudson during the migration to the new 
> servers that needs to be ironed  out. 
> {{quote}}




[jira] Assigned: (LUCENE-1202) Clover setup currently has some problems

2008-03-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned LUCENE-1202:


Assignee: Grant Ingersoll


I was hoping seeing it again would jog your memory  : )

i committed the changes to the build files, if the hudson problem was related 
to the classpath for clover this may magically solve that problem -- if not, 
just make sure whatever directory clover is in gets added to the CLASSPATH 
before running ant.

Committed revision 637344.

assigning to you to track the hudson config fiddling


> Clover setup currently has some problems
> 
>
> Key: LUCENE-1202
> URL: https://issues.apache.org/jira/browse/LUCENE-1202
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Grant Ingersoll
> Attachments: LUCENE-1202.db-contrib-instrumentation.patch, 
> LUCENE-1202.patch
>
>
> (tracking as a bug before it get lost in email...
>   
> http://www.nabble.com/Clover-reports-missing-from-hudson--to15510616.html#a15510616
> )
> The clover setup for Lucene currently has some problems, 3 i think...
> 1) instrumentation fails on contrib/db/ because it contains java packages the 
> ASF Clover lscence doesn't allow instrumentation of.  i have a patch for this.
> 2) running instrumented contrib tests for other contribs produce strange 
> errors...
> {{monospaced}}
> [junit] Testsuite: org.apache.lucene.analysis.el.GreekAnalyzerTest
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.126 sec
> [junit]
> [junit] - Standard Error -
> [junit] [CLOVER] FATAL ERROR: Clover could not be initialised. Are you sure you have Clover in the runtime classpath? (class java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
> [junit] -  ---
> [junit] Testcase: testAnalyzer(org.apache.lucene.analysis.el.GreekAnalyzerTest):Caused an ERROR
> [junit] com_cenqua_clover/g
> [junit] java.lang.NoClassDefFoundError: com_cenqua_clover/g
> [junit] at org.apache.lucene.analysis.el.GreekAnalyzer.(GreekAnalyzer.java:157)
> [junit] at org.apache.lucene.analysis.el.GreekAnalyzerTest.testAnalyzer(GreekAnalyzerTest.java:60)
> [junit]
> [junit]
> [junit] Test org.apache.lucene.analysis.el.GreekAnalyzerTest FAILED
> {{monospaced}}
> ...i'm not sure what's going on here.  the error seems to happen both when
> trying to run clover on just a single contrib, or when doing the full
> build ... i suspect there is an issue with the way the batchtests fork
> off, but I can't see why it would only happen to contribs (the regular
> tests fork as well)
> 3) according to Grant...
> {{quote}}
> ...There is also a bit of a change on Hudson during the migration to the new 
> servers that needs to be ironed  out. 
> {{quote}}




[jira] Created: (LUCENE-1235) NGramTokenizer optimization in query phase

2008-03-14 Thread Hiroaki Kawai (JIRA)
NGramTokenizer optimization in query phase
--

 Key: LUCENE-1235
 URL: https://issues.apache.org/jira/browse/LUCENE-1235
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Reporter: Hiroaki Kawai


As I described in LUCENE-1229, we can optimize the token stream in the query phase.




[jira] Updated: (LUCENE-1235) NGramTokenizer optimization in query phase

2008-03-14 Thread Hiroaki Kawai (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroaki Kawai updated LUCENE-1235:
--

Attachment: NGramTokenizer.patch

NGramTokenizer.patch includes LUCENE-1227 and LUCENE-1225.

> NGramTokenizer optimization in query phase
> --
>
> Key: LUCENE-1235
> URL: https://issues.apache.org/jira/browse/LUCENE-1235
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Hiroaki Kawai
> Attachments: NGramTokenizer.patch
>
>
> As I described in LUCENE-1229, we can optimize the token stream in the query phase.




Summer of Code idea for lucene

2008-03-14 Thread Ian Holsman

If no one objects (I don't think it's too late)

would you mind a GSOC project to implement BM25 relevancy/scoring?





[jira] Created: (LUCENE-1236) EdgeNGram* documentation improvement

2008-03-14 Thread Hiroaki Kawai (JIRA)
EdgeNGram* documentation improvement


 Key: LUCENE-1236
 URL: https://issues.apache.org/jira/browse/LUCENE-1236
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Reporter: Hiroaki Kawai
Priority: Trivial
 Attachments: EdgeNGram.patch

To clarify what "edge" means, I added some description: "edge" means the 
beginning edge or the ending edge of a term.
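
As a rough illustration of the two edges, the sketch below builds the n-grams anchored at the front or at the back of a term. It is plain Java written for this explanation only and is not the contrib EdgeNGram* code; the class name `EdgeNGramSketch`, the method `edgeNGrams` and its parameters are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not the contrib EdgeNGramTokenizer implementation. */
final class EdgeNGramSketch {

    /**
     * Returns the n-grams of sizes minGram..maxGram anchored at one edge of the term.
     * fromFront == true anchors at the beginning edge, false at the ending edge.
     */
    static List<String> edgeNGrams(String term, int minGram, int maxGram, boolean fromFront) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Never ask for a gram longer than the term itself.
        int longest = Math.min(maxGram, term.length());
        for (int size = minGram; size <= longest; size++) {
            grams.add(fromFront
                    ? term.substring(0, size)                // beginning edge
                    : term.substring(term.length() - size)); // ending edge
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("lucene", 1, 3, true));  // [l, lu, luc]
        System.out.println(edgeNGrams("lucene", 1, 3, false)); // [e, ne, ene]
    }
}
```

Every gram touches one end of the term, which is what distinguishes an edge n-gram from a general n-gram taken at any offset.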




[jira] Updated: (LUCENE-1236) EdgeNGram* documentation improvement

2008-03-14 Thread Hiroaki Kawai (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroaki Kawai updated LUCENE-1236:
--

Attachment: EdgeNGram.patch

> EdgeNGram* documentation improvement
> 
>
> Key: LUCENE-1236
> URL: https://issues.apache.org/jira/browse/LUCENE-1236
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Hiroaki Kawai
>Priority: Trivial
> Attachments: EdgeNGram.patch
>
>
> To clarify what "edge" means, I added some description: "edge" means the 
> beginning edge or the ending edge of a term.




[jira] Commented: (LUCENE-1236) EdgeNGram* documentation improvement

2008-03-14 Thread [EMAIL PROTECTED] (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579015#action_12579015
 ] 

[EMAIL PROTECTED] commented on LUCENE-1236:
-

Dear sender,

This domain is no longer in use. Please resend your email to the new 
@meltwater.com address.

If this is a support request, please resend your email to [EMAIL PROTECTED]

Best regards,
Meltwater Group


> EdgeNGram* documentation improvement
> 
>
> Key: LUCENE-1236
> URL: https://issues.apache.org/jira/browse/LUCENE-1236
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Hiroaki Kawai
>Priority: Trivial
> Attachments: EdgeNGram.patch
>
>
> To clarify what "edge" means, I added some description: "edge" means the 
> beginning edge or the ending edge of a term.
