Re: [Lucene.Net] [jira] [Commented] (LUCENENET-412) Replacing ArrayLists, Hashtables etc. with appropriate Generics.

2011-05-18 Thread Alexander Bauer


Can I use this version with an existing index based on lucene.Java 3.0.3?

Alex


On 19.05.2011 00:20, Digy (JIRA) wrote:

 [ 
https://issues.apache.org/jira/browse/LUCENENET-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035795#comment-13035795
 ]

Digy commented on LUCENENET-412:


Hi All,

Lucene.Net 2.9.4g is almost ready for testing & feedbacks.

While injecting generics & making some cleanup in the code, I tried to stay as 
close to lucene 3.0.3 as possible.
Therefore its position is somewhere between lucene.Java 2.9.4 & 3.0.3.

DIGY


PS: For those who might want to try this version:
It probably won't be a drop-in replacement, since there are a few API changes 
like
- StopAnalyzer(List<string> stopWords)
- Query.ExtractTerms(ICollection<string>)
- TopDocs.*TotalHits*, TopDocs.*ScoreDocs*
and some removed methods/classes like
- Filter.Bits
- JustCompileSearch
- Contrib/Similarity.Net





Replacing ArrayLists, Hashtables etc. with appropriate Generics.


 Key: LUCENENET-412
 URL: https://issues.apache.org/jira/browse/LUCENENET-412
 Project: Lucene.Net
  Issue Type: Improvement
Affects Versions: Lucene.Net 2.9.4
Reporter: Digy
Priority: Minor
 Fix For: Lucene.Net 2.9.4

 Attachments: IEquatable for QuerySubclasses.patch, 
LUCENENET-412.patch, lucene_2.9.4g_exceptions_fix


This will move Lucene.Net.2.9.4 closer to lucene.3.0.3 and allow some 
performance gains.






[jira] [Commented] (LUCENE-3102) Few issues with CachingCollector

2011-05-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035223#comment-13035223
 ] 

Shai Erera commented on LUCENE-3102:


There are two things left to do:

(1) Use a bit set instead of an int[] for docIDs. If we do this, then it means the 
Collector cannot support out-of-order collection (which is not a big deal 
IMO). It also means that, for large indexes, we might consume more RAM than with an int[].
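
For a rough sense of that trade-off, a tiny back-of-the-envelope sketch (the numbers 
and class are mine, purely illustrative): a bit set over the whole index costs about 
one bit per document, while the int[] costs four bytes per cached hit, so the bit set 
only pays off once more than roughly maxDoc/32 docs are collected.

{code}
// Illustrative only: memory cost of caching doc IDs as int[] vs. a bit set over maxDoc.
public class DocIdCacheSizeSketch {
  public static void main(String[] args) {
    long maxDoc = 10000000L;  // hypothetical index size
    long numHits = 200000L;   // hypothetical number of cached doc IDs

    long intArrayBytes = 4L * numHits; // one 4-byte int per cached doc ID
    long bitSetBytes = maxDoc / 8;     // one bit per document in the index

    // Break-even: the bit set wins once numHits > maxDoc / 32.
    System.out.println("int[]: " + intArrayBytes + " bytes, bit set: " + bitSetBytes + " bytes");
  }
}
{code}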

(2) Allow this Collector to stand on its own, w/o necessarily wrapping another 
Collector. There are several ways we can achieve that:
* Take a 'null' Collector and check other != null. Adds an 'if' but not a big 
deal IMO. Also, acceptDocsOutOfOrder will have to either return false (or 
true), or we take that as a parameter.
* Take a 'null' Collector and set this.other to a private static instance of a 
NoOpCollector. We'll still be delegating calls to it, but hopefully it won't be 
expensive. Same issue w/ out-of-order
* Create two specialized variants of CachingCollector.

Personally I'm not too much in favor of the last option - too much code dup for 
not much gain.

The option I like the most is the 2nd (introducing a NoOpCollector). We can 
even introduce it as a public static member of CachingCollector and let users 
decide if they want to use it or not. For ease of use, we can still allow 
'null' to be passed to create().

What do you think?
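
To make the NoOpCollector option concrete, a rough sketch of my own (not a patch) 
against the 3.x Collector API; a create() overload could simply fall back to the 
shared instance when it is handed null:

{code}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Sketch only: a shared, do-nothing Collector that CachingCollector could delegate
// to when the caller does not want to wrap a "real" Collector.
final class NoOpCollector extends Collector {
  static final Collector INSTANCE = new NoOpCollector();

  @Override public void setScorer(Scorer scorer) throws IOException {}
  @Override public void collect(int doc) throws IOException {}
  @Override public void setNextReader(IndexReader reader, int docBase) throws IOException {}
  @Override public boolean acceptsDocsOutOfOrder() {
    return true; // nothing is forwarded, so collection order does not matter
  }
}

// Inside a create(Collector other, ...) factory, null could simply map to the no-op:
//   Collector delegate = (other == null) ? NoOpCollector.INSTANCE : other;
{code}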

 Few issues with CachingCollector
 

 Key: LUCENE-3102
 URL: https://issues.apache.org/jira/browse/LUCENE-3102
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3102-factory.patch, LUCENE-3102.patch, 
 LUCENE-3102.patch


 CachingCollector (introduced in LUCENE-1421) has few issues:
 # Since the wrapped Collector may support out-of-order collection, the 
 document IDs cached may be out-of-order (depends on the Query) and thus 
 replay(Collector) will forward document IDs out-of-order to a Collector that 
 may not support it.
 # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
 # I think that instead of comparing curScores to null, in order to determine 
 if scores are requested, we should have a specific boolean - for clarity
 # This check if (base + nextLength > maxDocsToCache) (line 168) can be 
 relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
 maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
 to try and cache them?
 Also:
 * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
 need that if CachingCollector ctor already takes a boolean cacheScores? I 
 think it's better defined explicitly than implicitly?
 * Let's introduce a factory method for creating a specialized version if 
 scoring is requested / not (i.e., impl the TODO in line 189)
 * I think it's a useful collector, which stands on its own and not specific 
 to grouping. Can we move it to core?
 * How about using OpenBitSet instead of int[] for doc IDs?
 ** If the number of hits is big, we'd gain some RAM back, and be able to 
 cache more entries
 ** NOTE: OpenBitSet can only be used for in-order collection only. So we can 
 use that if the wrapped Collector does not support out-of-order
 * Do you think we can modify this Collector to not necessarily wrap another 
 Collector? We have such Collector which stores (in-memory) all matching doc 
 IDs + scores (if required). Those are later fed into several processes that 
 operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
 can make CachingCollector *optionally* wrap another Collector and then 
 someone can reuse it by setting RAM limit to unlimited (we should have a 
 constant for that) in order to simply collect all matching docs + scores.
 * I think a set of dedicated unit tests for this class alone would be good.
 That's it so far. Perhaps, if we do all of the above, more things will pop up.




[jira] [Updated] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos

2011-05-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3084:
--

Attachment: LUCENE-3084-trunk-only.patch

The only addition in this patch is a cache of the unmodifiable collections (like 
Java's core collections do). This prevents creating a new instance on each 
asList() call.
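
The caching Uwe describes follows the usual pattern of reusing a read-through 
unmodifiable view; a minimal sketch (class and field names are mine, not from the patch):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: cache the unmodifiable view so asList() does not allocate every time.
class SegmentListSketch<SI> {
  private final List<SI> segments = new ArrayList<SI>();
  private List<SI> unmodifiableView; // created lazily, then reused

  public List<SI> asList() {
    if (unmodifiableView == null) {
      // The view reads through to the live list, so it stays current and
      // can safely be handed out again and again.
      unmodifiableView = Collections.unmodifiableList(segments);
    }
    return unmodifiableView;
  }

  public void add(SI si) {
    segments.add(si); // no need to invalidate: the cached view sees the change
  }
}
{code}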

Mike: do you have any further comments? Otherwise I will commit in a day or two 
(before leaving for Lucene Rev).

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.




[jira] [Updated] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more refactoring

2011-05-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3084:
--

Description: 
SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
purposes these fields are unused.

We should cutover to List<SI> instead.

Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
collections be hidden inside the class. We can add unmodifiable views on it 
(asList(), asSet()).

  was:
SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
purposes these fields are unused.

We should cutover to List<SI> instead.

Summary: MergePolicy.OneMerge.segments should be List<SegmentInfo> not 
SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more 
refactoring  (was: MergePolicy.OneMerge.segments should be List<SegmentInfo> 
not SegmentInfos)

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, 
 Remove Vector<SI> subclassing from SegmentInfos & more refactoring
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.
 Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
 collections be hidden inside the class. We can add unmodifiable views on it 
 (asList(), asSet()).




[jira] [Commented] (LUCENE-3108) Land DocValues on trunk

2011-05-18 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035234#comment-13035234
 ] 

Simon Willnauer commented on LUCENE-3108:
-

Mike thanks for the review!

bq. Phew been a long time since I looked at this branch!

its been changing :) 

bq. We have some stale jdocs that reference .setIntValue methods (they
are now .setInt)
True - thanks I will fix.

bq. Hmm do we have byte ordering problems? Ie, if I write index on
machine with little-endian but then try to load values on
big-endian...? I think we're OK (we seem to always use
IndexOutput.writeInt, and we convert float-to-raw-int-bits using
java's APIs)?

We are ok here since we write big-endian (enforced by DataOutput) and read it 
back in as plain bytes. The created ByteBuffer will always use BIG_ENDIAN as 
the default order. I added a comment for this.
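
A tiny illustration of that point (not from the branch): an int written high byte 
first, the way Lucene's DataOutput.writeInt does, reads back correctly through a 
ByteBuffer on any machine, because ByteBuffer defaults to BIG_ENDIAN regardless of 
native order.

{code}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderSketch {
  public static void main(String[] args) {
    int value = 0x12345678;

    // Big-endian encoding: most significant byte first.
    byte[] bytes = {
      (byte) (value >>> 24), (byte) (value >>> 16),
      (byte) (value >>> 8),  (byte) value
    };

    // ByteBuffer's default order is BIG_ENDIAN on every platform, so the raw
    // bytes decode to the original value independent of the machine's endianness.
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    System.out.println(buf.order() == ByteOrder.BIG_ENDIAN); // true
    System.out.println(Integer.toHexString(buf.getInt()));   // 12345678
  }
}
{code}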

bq. How come codecID changed from String to int on the branch?
Due to DocValues I need to compare the ID to certain fields to see for which 
field I stored DocValues and need to open them. I always had to parse the given 
string, which is kind of odd. I think it's more natural to have the same datatype 
on FieldInfo, SegmentCodecs and eventually in the Codec#files() method. Making 
a string out of it is way simpler / less risky than parsing IMO.

bq. What are oal.util.Pair and ParallelArray for?
legacy I will remove

bq. FloatsRef should state in the jdocs that it's really slicing a
double[]?

yep done!

bq. Can SortField somehow detect whether the needed field was stored
in FC vs DV and pick the right comparator accordingly...? Kind of
like how NumericField can detect whether the ints are encoded as
plain text or as NF? We can open a new issue for this,
post-landing...

This is tricky though. You can have a DV field that is indexed too so its hard 
to tell if we can reliably do it. If we can't make it reliable I think we 
should not do it at all.


bq. It looks like we can sort by int/long/float/double pulled from DV,
but not by terms? This is fine for landing... but I think we
should open a post-landing issue to also make FieldComparators for
the Terms cases?

Yeah true. I didn't add a FieldComparator for bytes yet. I think this is post 
landing!

bq. Should we rename oal.index.values.Type -> .ValueType? Just
because... it looks so generic when its imported & used as Type
somewhere?

agreed. I also think we should rename Source but I don't have a good name yet. 
Any idea?

bq. Since we dynamically reserve a value to mean unset, does that
mean there are some datasets we cannot index? Or... do we tap
into the unused bit of a long, ie the sentinel value can be
negative? But if the data set spans Long.MIN_VALUE to
Long.MAX_VALUE, what do we do...?

This is tricky though. The quick answer is yes, but we can't do that anyway 
since I have not normalized the range to be 0-based, since PackedInts doesn't 
allow negative values. So the range we can store is (2^63)-1. So essentially 
with the current impl we can store (2^63)-2 and the max value is 
Long#MAX_VALUE-1. Currently there is no assert for this, which is needed I think, 
but to get around this we need to have a different impl I think, or do I miss 
something? 
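
Spelling out the arithmetic as I read it (illustration only, not code from the 
branch): values are stored non-negative and one value is set aside as the "unset" 
sentinel, so exactly one value at the top of the range is lost.

{code}
// Illustration of the reserved-sentinel trade-off described above.
public class SentinelRangeSketch {
  public static void main(String[] args) {
    // Non-negative storage covers [0, Long.MAX_VALUE], i.e. 2^63 values.
    // Reserving one of them to mean "unset" leaves [0, Long.MAX_VALUE - 1],
    // so the largest indexable value is 2^63 - 2.
    long largestIndexable = Long.MAX_VALUE - 1;
    System.out.println("largest indexable value: " + largestIndexable);
  }
}
{code}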

I will make the changes once SVN is writeable again.



 Land DocValues on trunk
 ---

 Key: LUCENE-3108
 URL: https://issues.apache.org/jira/browse/LUCENE-3108
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index, core/search, core/store
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0


 Its time to move another feature from branch to trunk. I want to start this 
 process now while still a couple of issues remain on the branch. Currently I 
 am down to a single nocommit (javadocs on DocValues.java) and a couple of 
 testing TODOs (explicit multithreaded tests and unoptimized with deletions) 
 but I think those are not worth separate issues so we can resolve them as we 
 go. 
 The already created issues (LUCENE-3075 and LUCENE-3074) should not block 
 this process here IMO, we can fix them once we are on trunk. 
 Here is a quick feature overview of what has been implemented:
  * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, 
 Bytes (fixed / variable size each in sorted, straight and deref variations)
  * Integration into Flex-API, Codec provides a 
 PerDocConsumer -> DocValuesConsumer (write) / PerDocValues -> DocValues (read) 
  * Enabled by default in all codecs except PreFlex
  * Follows other flex-API patterns, like non-segment readers throwing UOE, forcing 
 MultiPerDocValues if on DirReader, etc.
  * Integration into IndexWriter, FieldInfos etc.
  * Random-testing enabled via RandomIW - injecting random DocValues into 
 documents
  * Basic checks in CheckIndex (which 

[jira] [Issue Comment Edited] (LUCENE-3108) Land DocValues on trunk

2011-05-18 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035234#comment-13035234
 ] 

Simon Willnauer edited comment on LUCENE-3108 at 5/18/11 8:20 AM:
--

Mike thanks for the review!

bq. Phew been a long time since I looked at this branch!

its been changing :) 

{quote} We have some stale jdocs that reference .setIntValue methods (they
are now .setInt){quote}
True - thanks I will fix.

{quote} Hmm do we have byte ordering problems? Ie, if I write index on
machine with little-endian but then try to load values on
big-endian...? I think we're OK (we seem to always use
IndexOutput.writeInt, and we convert float-to-raw-int-bits using
java's APIs)?{quote}

We are ok here since we write big-endian (enforced by DataOutput) and read it 
back in as plain bytes. The created ByteBuffer will always use BIG_ENDIAN as 
the default order. I added a comment for this.

{quote}How come codecID changed from String to int on the branch?{quote}
due to DocValues I need to compare the ID to certain fields to see for what 
field I stored and need to open docValues. I always had to parse the given 
string which is kind of odd. I think its more natural to have the same datatype 
on FieldInfo, SegmentCodecs and eventually in the Codec#files() method. Making 
a string out of it is way simpler / less risky than parsing IMO.

{quote} What are oal.util.Pair and ParallelArray for?{quote}
legacy I will remove

{quote} FloatsRef should state in the jdocs that it's really slicing a
double[]?{quote}

yep done!

{quote} Can SortField somehow detect whether the needed field was stored
in FC vs DV and pick the right comparator accordingly...? Kind of
like how NumericField can detect whether the ints are encoded as
plain text or as NF? We can open a new issue for this,
post-landing...{quote}

This is tricky though. You can have a DV field that is indexed too so its hard 
to tell if we can reliably do it. If we can't make it reliable I think we 
should not do it at all.


{quote}It looks like we can sort by int/long/float/double pulled from DV,
but not by terms? This is fine for landing... but I think we
should open a post-landing issue to also make FieldComparators for
the Terms cases?{quote}

Yeah true. I didn't add a FieldComparator for bytes yet. I think this is post 
landing!

{quote} Should we rename oal.index.values.Type -> .ValueType? Just
because... it looks so generic when its imported & used as Type
somewhere?{quote}

agreed. I also think we should rename Source but I don't have a good name yet. 
Any idea?

{quote} Since we dynamically reserve a value to mean unset, does that
mean there are some datasets we cannot index? Or... do we tap
into the unused bit of a long, ie the sentinel value can be
negative? But if the data set spans Long.MIN_VALUE to
Long.MAX_VALUE, what do we do...?{quote}

Again, tricky! The quick answer is yes, but we can't do that anyway since I 
have not normalized the range to be 0-based, since PackedInts doesn't allow 
negative values. So the range we can store is (2^63)-1. So essentially with 
the current impl we can store (2^63)-2 and the max value is Long#MAX_VALUE-1. 
Currently there is no assert for this, which is needed I think, but to get around 
this we need to have a different impl I think, or do I miss something? 

I will make the changes once SVN is writeable again.



  was (Author: simonw):
Mike thanks for the review!

bq. Phew been a long time since I looked at this branch!

its been changing :) 

bq. We have some stale jdocs that reference .setIntValue methods (they
are now .setInt)
True - thanks I will fix.

bq. Hmm do we have byte ordering problems? Ie, if I write index on
machine with little-endian but then try to load values on
big-endian...? I think we're OK (we seem to always use
IndexOutput.writeInt, and we convert float-to-raw-int-bits using
java's APIs)?

We are ok here since we write big-endian (enforced by DataOutput) and read it 
back in as plain bytes. The created ByteBuffer will always use BIG_ENDIAN as 
the default order. I added a comment for this.

bq. How come codecID changed from String to int on the branch?
due to DocValues I need to compare the ID to certain fields to see for what 
field I stored and need to open docValues. I always had to parse the given 
string which is kind of odd. I think its more natural to have the same datatype 
on FieldInfo, SegmentCodecs and eventually in the Codec#files() method. Making 
a string out of it is way simpler / less risky than parsing IMO.

bq. What are oal.util.Pair and ParallelArray for?
legacy I will remove

bq. FloatsRef should state in the jdocs that it's really slicing a
double[]?

yep done!

bq. Can SortField somehow detect whether the needed field was stored
in FC vs DV and pick the right comparator accordingly...? Kind of
like how NumericField can detect whether the 

Solr ByteUtils

2011-05-18 Thread Simon Willnauer
Hey there,

I just ran into org.apache.solr.util.ByteUtils which seems pretty much
like a duplication of UnicodeUtils in Lucene. I think we should get
rid of it and merge what needs to be merged into UnicodeUtils. This
utils class is really just doing unicode stuff.

simon




Re: Solr ByteUtils

2011-05-18 Thread Michael McCandless
+1

Mike

http://blog.mikemccandless.com

On Wed, May 18, 2011 at 4:34 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 Hey there,

 I just ran into org.apache.solr.util.ByteUtils which seems pretty much
 like a duplication of UnicodeUtils in Lucene. I think we should get
 rid of it and merge what needs to be merged into UnicodeUtils. This
 utils class is really just doing unicode stuff.

 simon







[jira] [Commented] (LUCENE-1888) Provide Option to Store Payloads on the Term Vector

2011-05-18 Thread Michal Fapso (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035277#comment-13035277
 ] 

Michal Fapso commented on LUCENE-1888:
--

Hi Peter,

I work on the same thing. You can get my code from here: 
http://speech.fit.vutbr.cz/en/software/speech-search (Lucene extension for bin 
sequences), there are also some testing data.

That code runs behind this website: http://www.superlectures.com/odyssey/

It is a few months old, so if you are interested, I can send you our current 
version.

Best regards,
Michal Fapso

 Provide Option to Store Payloads on the Term Vector
 ---

 Key: LUCENE-1888
 URL: https://issues.apache.org/jira/browse/LUCENE-1888
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Fix For: 4.0


 Would be nice to have the option to access the payloads in a document-centric 
 way by adding them to the Term Vectors.  Naturally, this makes the Term 
 Vectors bigger, but it may be just what one needs.




[jira] [Issue Comment Edited] (LUCENE-1888) Provide Option to Store Payloads on the Term Vector

2011-05-18 Thread Michal Fapso (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035277#comment-13035277
 ] 

Michal Fapso edited comment on LUCENE-1888 at 5/18/11 10:40 AM:


Hi Peter,

I work on the same thing. You can get my code from here: 
http://speech.fit.vutbr.cz/en/software/speech-search (Lucene extension for bin 
sequences), there are also some testing data. Actually it indexes word 
confusion networks with scores of hypotheses, but of course it will work also 
for 1-best string transcripts.

That code runs behind this website: http://www.superlectures.com/odyssey/

It is a few months old, so if you are interested, I can send you our current 
version.

Best regards,
Michal Fapso

  was (Author: michalfapso):
Hi Peter,

I work on the same thing. You can get my code from here: 
http://speech.fit.vutbr.cz/en/software/speech-search (Lucene extension for bin 
sequences), there are also some testing data.

That code runs behind this website: http://www.superlectures.com/odyssey/

It is few months old, so if you are interested, I can send you our current 
version.

Best regards,
Michal Fapso
  
 Provide Option to Store Payloads on the Term Vector
 ---

 Key: LUCENE-1888
 URL: https://issues.apache.org/jira/browse/LUCENE-1888
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Fix For: 4.0


 Would be nice to have the option to access the payloads in a document-centric 
 way by adding them to the Term Vectors.  Naturally, this makes the Term 
 Vectors bigger, but it may be just what one needs.




[jira] [Updated] (LUCENE-3102) Few issues with CachingCollector

2011-05-18 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-3102:
---

Attachment: LUCENE-3102-nowrap.patch

Patch against 3x:
* Adds a create() to CachingCollector which does not take a Collector to wrap. 
Internally, it creates a no-op collector, which ignores everything (see the usage 
sketch below).
* Javadocs for create().
* Matching test.
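
A usage sketch of the wrap-free path (my illustration; the exact factory signature 
and import path are assumed from the discussion in this issue, so treat them as 
indicative only):

{code}
import java.io.IOException;

import org.apache.lucene.search.CachingCollector; // assumes the collector lives in core
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class CachingCollectorUsageSketch {
  // Collect once without wrapping another Collector, then replay the cached hits.
  static void collectAndReplay(IndexSearcher searcher, Query query,
                               Collector firstPass, Collector secondPass) throws IOException {
    CachingCollector cache = CachingCollector.create(
        true /* out-of-order ok */, true /* cacheScores */, 64.0 /* RAM limit, MB */);
    searcher.search(query, cache);

    if (cache.isCached()) {        // false if the RAM limit was exceeded
      cache.replay(firstPass);
      cache.replay(secondPass);    // the cache can be replayed more than once
    }
  }
}
{code}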

 Few issues with CachingCollector
 

 Key: LUCENE-3102
 URL: https://issues.apache.org/jira/browse/LUCENE-3102
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3102-factory.patch, LUCENE-3102-nowrap.patch, 
 LUCENE-3102.patch, LUCENE-3102.patch


 CachingCollector (introduced in LUCENE-1421) has few issues:
 # Since the wrapped Collector may support out-of-order collection, the 
 document IDs cached may be out-of-order (depends on the Query) and thus 
 replay(Collector) will forward document IDs out-of-order to a Collector that 
 may not support it.
 # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
 # I think that instead of comparing curScores to null, in order to determine 
 if scores are requested, we should have a specific boolean - for clarity
 # This check if (base + nextLength > maxDocsToCache) (line 168) can be 
 relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
 maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
 to try and cache them?
 Also:
 * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
 need that if CachingCollector ctor already takes a boolean cacheScores? I 
 think it's better defined explicitly than implicitly?
 * Let's introduce a factory method for creating a specialized version if 
 scoring is requested / not (i.e., impl the TODO in line 189)
 * I think it's a useful collector, which stands on its own and not specific 
 to grouping. Can we move it to core?
 * How about using OpenBitSet instead of int[] for doc IDs?
 ** If the number of hits is big, we'd gain some RAM back, and be able to 
 cache more entries
 ** NOTE: OpenBitSet can only be used for in-order collection only. So we can 
 use that if the wrapped Collector does not support out-of-order
 * Do you think we can modify this Collector to not necessarily wrap another 
 Collector? We have such Collector which stores (in-memory) all matching doc 
 IDs + scores (if required). Those are later fed into several processes that 
 operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
 can make CachingCollector *optionally* wrap another Collector and then 
 someone can reuse it by setting RAM limit to unlimited (we should have a 
 constant for that) in order to simply collect all matching docs + scores.
 * I think a set of dedicated unit tests for this class alone would be good.
 That's it so far. Perhaps, if we do all of the above, more things will pop up.




[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more refactoring

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035282#comment-13035282
 ] 

Michael McCandless commented on LUCENE-3084:


Looks awesome Uwe!  +1 to commit.  Some small variable naming
suggestions:

  * Rename cloneChilds -> cloneChildren (sis.createBackupSIS)

  * Maybe call it (and invert it) mapIndexesValid instead of
mapIndexesInvalid (in SIS.java)?  I generally prefer not putting
"not" into boolean variables when possible, for sanity...


 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, 
 Remove Vector<SI> subclassing from SegmentInfos & more refactoring
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.
 Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
 collections be hidden inside the class. We can add unmodifiable views on it 
 (asList(), asSet()).




[jira] [Commented] (LUCENE-3102) Few issues with CachingCollector

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035283#comment-13035283
 ] 

Michael McCandless commented on LUCENE-3102:


The committed CHANGES entry has a typo ("reply" should be "replay").

 Few issues with CachingCollector
 

 Key: LUCENE-3102
 URL: https://issues.apache.org/jira/browse/LUCENE-3102
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3102-factory.patch, LUCENE-3102-nowrap.patch, 
 LUCENE-3102.patch, LUCENE-3102.patch


 CachingCollector (introduced in LUCENE-1421) has few issues:
 # Since the wrapped Collector may support out-of-order collection, the 
 document IDs cached may be out-of-order (depends on the Query) and thus 
 replay(Collector) will forward document IDs out-of-order to a Collector that 
 may not support it.
 # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
 # I think that instead of comparing curScores to null, in order to determine 
 if scores are requested, we should have a specific boolean - for clarity
 # This check if (base + nextLength > maxDocsToCache) (line 168) can be 
 relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
 maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
 to try and cache them?
 Also:
 * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
 need that if CachingCollector ctor already takes a boolean cacheScores? I 
 think it's better defined explicitly than implicitly?
 * Let's introduce a factory method for creating a specialized version if 
 scoring is requested / not (i.e., impl the TODO in line 189)
 * I think it's a useful collector, which stands on its own and not specific 
 to grouping. Can we move it to core?
 * How about using OpenBitSet instead of int[] for doc IDs?
 ** If the number of hits is big, we'd gain some RAM back, and be able to 
 cache more entries
 ** NOTE: OpenBitSet can only be used for in-order collection only. So we can 
 use that if the wrapped Collector does not support out-of-order
 * Do you think we can modify this Collector to not necessarily wrap another 
 Collector? We have such Collector which stores (in-memory) all matching doc 
 IDs + scores (if required). Those are later fed into several processes that 
 operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
 can make CachingCollector *optionally* wrap another Collector and then 
 someone can reuse it by setting RAM limit to unlimited (we should have a 
 constant for that) in order to simply collect all matching docs + scores.
 * I think a set of dedicated unit tests for this class alone would be good.
 That's it so far. Perhaps, if we do all of the above, more things will pop up.




[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more refactoring

2011-05-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035284#comment-13035284
 ] 

Uwe Schindler commented on LUCENE-3084:
---

OK! Thanks Mike

bq. mapIndexesInvalid

I will remove the map again and replace it by a simple Set. Using a map that maps 
to indexes is too complicated and does not bring us anything. contains() works 
without it, and indexOf() needs to rebuild the map whenever an insert or remove is 
done. Especially on remove(SI) it will rebuild the map two times in the worst 
case.

A linear scan for indexOf is in my opinion fine. We can only optimize by doing 
a contains on the set first.
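
A small sketch of that structure (names invented here, not from a patch): keep a 
HashSet next to the list so contains() is constant time, and let indexOf() do a set 
pre-check followed by a linear scan.

{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: list + set instead of a map from element to index.
class SegmentInfosSketch<SI> {
  private final List<SI> segments = new ArrayList<SI>();
  private final Set<SI> segmentSet = new HashSet<SI>();

  public void add(SI si) {
    segments.add(si);
    segmentSet.add(si);     // nothing to rebuild on insert...
  }

  public void remove(SI si) {
    segments.remove(si);    // ...or on remove, unlike an element-to-index map
    segmentSet.remove(si);
  }

  public boolean contains(SI si) {
    return segmentSet.contains(si);            // O(1), the common case
  }

  public int indexOf(SI si) {
    // Cheap set pre-check, then a linear scan; fine for typical segment counts.
    return segmentSet.contains(si) ? segments.indexOf(si) : -1;
  }
}
{code}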

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, 
 Remove Vector<SI> subclassing from SegmentInfos & more refactoring
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.
 Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
 collections be hidden inside the class. We can add unmodifiable views on it 
 (asList(), asSet()).




[jira] [Commented] (LUCENE-3102) Few issues with CachingCollector

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035285#comment-13035285
 ] 

Michael McCandless commented on LUCENE-3102:


Patch to allow no wrapped collector looks good!  I wonder/hope hotspot can 
realize those method calls are no-ops...

Maybe change TestGrouping to randomly use this ctor?  Ie, randomly, you can use 
caching collector (not wrapped), then call its replay method twice (once 
against 1st pass, then against 2nd pass, collectors), and then assert results 
like normal.  This is also a good verification that replay works twice...

On the OBS, it makes me nervous to just always do this; I'd rather have it 
cut over at some point?  Or perhaps it's an expert optional arg to create, 
whether it should be backed w/ OBS vs int[]?

Or, ideally... we make a bit set impl that does this all under the hood (uses 
int[] when there are few docs, and upgrades to OBS once there are enough to 
justify it...), then we can just use that bit set here.

 Few issues with CachingCollector
 

 Key: LUCENE-3102
 URL: https://issues.apache.org/jira/browse/LUCENE-3102
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3102-factory.patch, LUCENE-3102-nowrap.patch, 
 LUCENE-3102.patch, LUCENE-3102.patch


 CachingCollector (introduced in LUCENE-1421) has few issues:
 # Since the wrapped Collector may support out-of-order collection, the 
 document IDs cached may be out-of-order (depends on the Query) and thus 
 replay(Collector) will forward document IDs out-of-order to a Collector that 
 may not support it.
 # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
 # I think that instead of comparing curScores to null, in order to determine 
 if scores are requested, we should have a specific boolean - for clarity
 # This check if (base + nextLength > maxDocsToCache) (line 168) can be 
 relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
 maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
 to try and cache them?
 Also:
 * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
 need that if CachingCollector ctor already takes a boolean cacheScores? I 
 think it's better defined explicitly than implicitly?
 * Let's introduce a factory method for creating a specialized version if 
 scoring is requested / not (i.e., impl the TODO in line 189)
 * I think it's a useful collector, which stands on its own and not specific 
 to grouping. Can we move it to core?
 * How about using OpenBitSet instead of int[] for doc IDs?
 ** If the number of hits is big, we'd gain some RAM back, and be able to 
 cache more entries
 ** NOTE: OpenBitSet can only be used for in-order collection only. So we can 
 use that if the wrapped Collector does not support out-of-order
 * Do you think we can modify this Collector to not necessarily wrap another 
 Collector? We have such Collector which stores (in-memory) all matching doc 
 IDs + scores (if required). Those are later fed into several processes that 
 operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
 can make CachingCollector *optionally* wrap another Collector and then 
 someone can reuse it by setting RAM limit to unlimited (we should have a 
 constant for that) in order to simply collect all matching docs + scores.
 * I think a set of dedicated unit tests for this class alone would be good.
 That's it so far. Perhaps, if we do all of the above, more things will pop up.




[jira] [Commented] (SOLR-2524) Adding grouping to Solr 3x

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035287#comment-13035287
 ] 

Michael McCandless commented on SOLR-2524:
--

+1 this would be awesome Martijn!!

In general we should try hard to backport features we build on trunk, to 3.x, 
when feasible.

 Adding grouping to Solr 3x
 --

 Key: SOLR-2524
 URL: https://issues.apache.org/jira/browse/SOLR-2524
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.2
Reporter: Martijn van Groningen

 Grouping was recently added to Lucene 3x. See LUCENE-1421 for more 
 information.
 I think it would be nice if we expose this functionality also to the Solr 
 users that are bound to a 3.x version.
 The grouping feature added to Lucene is currently a subset of the 
 functionality that Solr 4.0-trunk offers. Mainly it doesn't support grouping 
 by function / query.
 The work involved getting the grouping contrib to work on Solr 3x is 
 acceptable. I have it more or less running here. It supports the response 
 format and request parameters (except group.query and group.func) described 
 in the FieldCollapse page on the Solr wiki.
 I think it would be great if this is included in the Solr 3.2 release. Many 
 people are using grouping as patch now and this would help them a lot. Any 
 thoughts?




[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync

2011-05-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/126/

No tests ran.

Build Log (for compile errors):
[...truncated 7931 lines...]






[jira] [Commented] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-05-18 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035331#comment-13035331
 ] 

Chris Male commented on LUCENE-2883:


Script that needs to be run to apply the patch:

{code}
svn mkdir --parents modules/queries/src/java/org/apache/lucene/queries/function
svn move solr/src/java/org/apache/solr/search/function/FunctionQuery.java 
modules/queries/src/java/org/apache/lucene/queries/function/FunctionQuery.java
svn move solr/src/java/org/apache/solr/search/function/ValueSource.java 
modules/queries/src/java/org/apache/lucene/queries/function/ValueSource.java
svn move solr/src/java/org/apache/solr/search/function/DocValues.java 
modules/queries/src/java/org/apache/lucene/queries/function/DocValues.java
svn move solr/src/java/org/apache/solr/search/MutableValue.java 
modules/queries/src/java/org/apache/lucene/queries/function/MutableValue.java
svn move solr/src/java/org/apache/solr/search/MutableValueFloat.java 
modules/queries/src/java/org/apache/lucene/queries/function/MutableValueFloat.java
{code}



 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: core/search
Affects Versions: 4.0
Reporter: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2883.patch


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  




[jira] [Updated] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-05-18 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-2883:
---

Attachment: LUCENE-2883.patch

Patch that factors out the core FunctionQuery stuff into a queries module.  
There are a lot of issues here but it does compile.  

The following issues need to be addressed:

- MutableValue & MutableFloatValue are used in the FunctionQuery code so I've 
pulled them into the module too.  Should all the other Mutable*Value classes 
come too? Should they go into some other module?

- What to return in ValueSource#getSortField which currently returns a 
SortField which implements SolrSortField.  This is currently commented out so 
we can determine what best to do.  Having this commented out breaks the Solr 
tests.

- Many of the ValueSources and DocValues in Solr could be moved to the module, 
but not all of them.  Some have dependencies on Solr dependencies / Solr core 
code.

- Module isn't fully integrated into the build.xmls and dev-tools.

- Lucene core's FunctionQuery stuff needs to be removed.

I'll add a script that needs to be run before adding this patch shortly.

 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: core/search
Affects Versions: 4.0
Reporter: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2883.patch


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  




[jira] [Created] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)
Escaping stars and question marks do not work.
--

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev


The string "I have search by st*rs" is indexed. Search by the query *\** doesn't 
return a matching result. This query returns all non-empty values.




[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Description: The string I have search by st*rs is indexed. Search by 
query \*\\*\* doesn't return matching result. This query returns all not 
empty values.  (was: The string I have search by st*rs is indexed. Search by 
query *\** doesn't return matching result. This query returns all not empty 
values.)

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query \*\\*\* 
 doesn't return matching result. This query returns all not empty values.




[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Description: The string I have search by st*rs is indexed. Search by 
query \*\\*\* doesn't return matching result. This query returns all not 
empty values.  (was: The string I have search by st*rs is indexed. Search by 
query \\**\\* doesn't return matching result. This query returns all not 
empty values.)

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query \*\\*\* 
 doesn't return matching result. This query returns all not empty values.




[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Description: The string I have search by st*rs is indexed. Search by 
query \*\\\*\* doesn't return matching result. This query returns all not 
empty values.  (was: The string I have search by st*rs is indexed. Search by 
query \*\\*\* doesn't return matching result. This query returns all not 
empty values.)

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query \*\\\*\* 
 doesn't return matching result. This query returns all not empty values.




[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Description: The string I have search by st*rs is indexed. Search by 
query \\**\\* doesn't return matching result. This query returns all not 
empty values.  (was: The string I have search by st*rs is indexed. Search by 
query \*\\*\* doesn't return matching result. This query returns all not 
empty values.)

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query \\**\\* 
 doesn't return matching result. This query returns all not empty values.




[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Description: The string I have search by st*rs is indexed. Search by 
query \*#92\*\* doesn't return matching result. This query returns all not 
empty values.  (was: The string I have search by st*rs is indexed. Search by 
query \*\\\*\* doesn't return matching result. This query returns all not 
empty values.)

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query \*#92\*\* 
 doesn't return matching result. This query returns all not empty values.




[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Description: The string I have search by st*rs is indexed. Search by 
query \*\\ \*\* doesn't return matching result. This query returns all not 
empty values.  (was: The string I have search by st*rs is indexed. Search by 
query \*#92\*\* doesn't return matching result. This query returns all not 
empty values.)

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query \*\\ \*\* 
 doesn't return matching result. This query returns all not empty values.




[jira] [Commented] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035334#comment-13035334
 ] 

Erick Erickson commented on LUCENE-3115:


Please raise this on the user's list first. Offhand I suspect Lucene is working 
as expected, but you haven't provided nearly enough information to decide 
whether this is a bug or not. A self-contained junit test would be ideal here.
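
For what it's worth, a self-contained test along those lines might look roughly 
like this (sketch only, against the 3.1 API; it merely encodes the reporter's 
expectation that an escaped '*' is treated literally, and whether it passes is 
exactly what this issue needs to establish):

{code}
import junit.framework.TestCase;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class TestEscapedWildcard extends TestCase {
  public void testEscapedStarMatchesLiteral() throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter w = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_31, new WhitespaceAnalyzer(Version.LUCENE_31)));
    Document doc = new Document();
    // Index the literal token "st*rs" without analysis.
    doc.add(new Field("f", "st*rs", Field.Store.NO, Field.Index.NOT_ANALYZED));
    w.addDocument(doc);
    w.close();

    IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
    QueryParser qp = new QueryParser(Version.LUCENE_31, "f",
        new WhitespaceAnalyzer(Version.LUCENE_31));
    // The escaped '*' should be parsed as a literal character, not a wildcard.
    Query q = qp.parse("st\\*rs");
    assertEquals(1, searcher.search(q, 10).totalHits);
    searcher.close();
  }
}
{code}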

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query \*\\\ \*\* 
 doesn't return matching result. This query returns all not empty values.




[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Description: The string I have search by st*rs is indexed. Search by 
query \*\\\ \*\* doesn't return matching result. This query returns all not 
empty values.  (was: The string I have search by st*rs is indexed. Search by 
query \*\\ \*\* doesn't return matching result. This query returns all not 
empty values.)

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query \*\\\ \*\* 
 doesn't return matching result. This query returns all not empty values.




Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Steven A Rowe
This build failed because of Javadoc warnings:

  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_16-p9
  [javadoc] Building tree for all the packages and classes...
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: 
warning - Tag @link: reference not found: SentinelIntSet
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: 
warning - Tag @link: reference not found: SentinelIntSet
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:62: 
warning - Tag @link: reference not found: SentinelIntSet
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:76: 
warning - Tag @link: reference not found: SentinelIntSet

 -Original Message-
 From: Apache Jenkins Server [mailto:hud...@hudson.apache.org]
 Sent: Wednesday, May 18, 2011 7:36 AM
 To: dev@lucene.apache.org
 Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync
 
 Build: https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/126/
 
 No tests ran.
 
 Build Log (for compile errors):
 [...truncated 7931 lines...]
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Michael McCandless
Ugh my bad, sorry.  I'll fix!

Mike

http://blog.mikemccandless.com

On Wed, May 18, 2011 at 8:22 AM, Steven A Rowe sar...@syr.edu wrote:
 This build failed because of Javadoc warnings:

  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_16-p9
  [javadoc] Building tree for all the packages and classes...
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: 
 warning - Tag @link: reference not found: SentinelIntSet
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: 
 warning - Tag @link: reference not found: SentinelIntSet
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:62: 
 warning - Tag @link: reference not found: SentinelIntSet
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:76: 
 warning - Tag @link: reference not found: SentinelIntSet

 -Original Message-
 From: Apache Jenkins Server [mailto:hud...@hudson.apache.org]
 Sent: Wednesday, May 18, 2011 7:36 AM
 To: dev@lucene.apache.org
 Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync

 Build: https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/126/

 No tests ran.

 Build Log (for compile errors):
 [...truncated 7931 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Description: The string I have search by st*rs is indexed. Search by 
query {code}index.query(*\**);{code} doesn't return matching result. This 
query returns all not empty values.  (was: The string I have search by st*rs 
is indexed. Search by query \*\\\ \*\* doesn't return matching result. This 
query returns all not empty values.)

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query 
 {code}index.query(*\**);{code} doesn't return matching result. This query 
 returns all not empty values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Description: The string I have search by st*rs is indexed. Search by 
query {code}index.query(key, *\**);{code} doesn't return matching result. 
This query returns all not empty values.  (was: The string I have search by 
st*rs is indexed. Search by query {code}index.query(*\**);{code} doesn't 
return matching result. This query returns all not empty values.)

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.1
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query 
 {code}index.query(key, *\**);{code} doesn't return matching result. This 
 query returns all not empty values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Michael McCandless
Hmm, I know what's wrong here (SentinelIntSet is package private), and
I'll fix... but when I run ant javadocs I don't see these warnings
(but the build clearly does)...

Mike

http://blog.mikemccandless.com

On Wed, May 18, 2011 at 8:22 AM, Steven A Rowe sar...@syr.edu wrote:
 This build failed because of Javadoc warnings:

  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_16-p9
  [javadoc] Building tree for all the packages and classes...
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: 
 warning - Tag @link: reference not found: SentinelIntSet
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: 
 warning - Tag @link: reference not found: SentinelIntSet
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:62: 
 warning - Tag @link: reference not found: SentinelIntSet
  [javadoc] .../org/apache/lucene/search/grouping/AllGroupsCollector.java:76: 
 warning - Tag @link: reference not found: SentinelIntSet

 -Original Message-
 From: Apache Jenkins Server [mailto:hud...@hudson.apache.org]
 Sent: Wednesday, May 18, 2011 7:36 AM
 To: dev@lucene.apache.org
 Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync

 Build: https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/126/

 No tests ran.

 Build Log (for compile errors):
 [...truncated 7931 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3102) Few issues with CachingCollector

2011-05-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035344#comment-13035344
 ] 

Shai Erera commented on LUCENE-3102:


bq. The committed CHANGES has typo (reply should be replay).

Thanks, will include it in the next commit.

bq. I'd rather have it cutover at some point

This can only be done if out-of-order collection wasn't done so far, because 
otherwise, cutting to OBS will take cached doc IDs and scores out of sync.

bq. we make a bit set impl that does this all under the hood (uses int[] when 
there are few docs, and upgrades to OBS once there are enough to justify 
it...)

That's a good idea. I think we should leave the OBS stuff for another issue. 
See first how this performs and optimize only if needed.
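
Just to illustrate the idea (not the patch): a buffer that holds doc IDs in an int[] and upgrades to OBS once the array would cost more RAM than a bit set could look roughly like the sketch below. The class name, growth policy and threshold are invented here, and in-order collection is assumed.

{code}
import java.util.Arrays;
import org.apache.lucene.util.OpenBitSet;

// Invented name; a sketch only, assuming in-order collection.
class HybridDocIdBuffer {
  private final int maxDoc;
  private int[] docs = new int[32];
  private int size = 0;
  private OpenBitSet bits; // non-null once we've upgraded

  HybridDocIdBuffer(int maxDoc) {
    this.maxDoc = maxDoc;
  }

  void add(int doc) {
    if (bits != null) {
      bits.fastSet(doc);
      return;
    }
    if (size == docs.length) {
      // upgrade if doubling the int[] (64 bits per doc) would exceed a bit set over maxDoc
      if ((long) docs.length * 64 > maxDoc) {
        bits = new OpenBitSet(maxDoc);
        for (int i = 0; i < size; i++) {
          bits.fastSet(docs[i]);
        }
        docs = null;
        bits.fastSet(doc);
        return;
      }
      docs = Arrays.copyOf(docs, docs.length * 2);
    }
    docs[size++] = doc;
  }
}
{code}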

I'll take a look at TestGrouping.

 Few issues with CachingCollector
 

 Key: LUCENE-3102
 URL: https://issues.apache.org/jira/browse/LUCENE-3102
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3102-factory.patch, LUCENE-3102-nowrap.patch, 
 LUCENE-3102.patch, LUCENE-3102.patch


 CachingCollector (introduced in LUCENE-1421) has few issues:
 # Since the wrapped Collector may support out-of-order collection, the 
 document IDs cached may be out-of-order (depends on the Query) and thus 
 replay(Collector) will forward document IDs out-of-order to a Collector that 
 may not support it.
 # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
 # I think that instead of comparing curScores to null, in order to determine 
 if scores are requested, we should have a specific boolean - for clarity
 # This check if (base + nextLength > maxDocsToCache) (line 168) can be 
 relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
 maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
 to try and cache them?
 Also:
 * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
 need that if CachingCollector ctor already takes a boolean cacheScores? I 
 think it's better defined explicitly than implicitly?
 * Let's introduce a factory method for creating a specialized version if 
 scoring is requested / not (i.e., impl the TODO in line 189)
 * I think it's a useful collector, which stands on its own and not specific 
 to grouping. Can we move it to core?
 * How about using OpenBitSet instead of int[] for doc IDs?
 ** If the number of hits is big, we'd gain some RAM back, and be able to 
 cache more entries
 ** NOTE: OpenBitSet can only be used for in-order collection only. So we can 
 use that if the wrapped Collector does not support out-of-order
 * Do you think we can modify this Collector to not necessarily wrap another 
 Collector? We have such Collector which stores (in-memory) all matching doc 
 IDs + scores (if required). Those are later fed into several processes that 
 operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
 can make CachingCollector *optionally* wrap another Collector and then 
 someone can reuse it by setting RAM limit to unlimited (we should have a 
 constant for that) in order to simply collect all matching docs + scores.
 * I think a set of dedicated unit tests for this class alone would be good.
 That's it so far. Perhaps, if we do all of the above, more things will pop up.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Steven A Rowe
Mike, I can repro on my Win7 box using Oracle JDK 1.5.0_22.  Maybe you're using 
1.6.0_X? - Steve

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, May 18, 2011 8:33 AM
 To: dev@lucene.apache.org
 Subject: Re: Javadoc warnings failing the branch_3x build [was: RE:
 [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]
 
 Hmm, I know what's wrong here (SentinelIntSet is package private), and
 I'll fix... but when I run ant javadocs I don't see these warnings
 (but the build clearly does)...
 
 Mike
 
 http://blog.mikemccandless.com
 
 On Wed, May 18, 2011 at 8:22 AM, Steven A Rowe sar...@syr.edu wrote:
  This build failed because of Javadoc warnings:
 
   [javadoc] Constructing Javadoc information...
   [javadoc] Standard Doclet version 1.5.0_16-p9
   [javadoc] Building tree for all the packages and classes...
   [javadoc]
 .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: warning
 - Tag @link: reference not found: SentinelIntSet
   [javadoc]
 .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: warning
 - Tag @link: reference not found: SentinelIntSet
   [javadoc]
 .../org/apache/lucene/search/grouping/AllGroupsCollector.java:62: warning
 - Tag @link: reference not found: SentinelIntSet
   [javadoc]
 .../org/apache/lucene/search/grouping/AllGroupsCollector.java:76: warning
 - Tag @link: reference not found: SentinelIntSet
 
  -Original Message-
  From: Apache Jenkins Server [mailto:hud...@hudson.apache.org]
  Sent: Wednesday, May 18, 2011 7:36 AM
  To: dev@lucene.apache.org
  Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync
 
  Build: https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/126/
 
  No tests ran.
 
  Build Log (for compile errors):
  [...truncated 7931 lines...]
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Michael McCandless
Ahh yes I'm using 1.6, OK.  I'll switch to 1.5...

Mike

http://blog.mikemccandless.com

On Wed, May 18, 2011 at 8:40 AM, Steven A Rowe sar...@syr.edu wrote:
 Mike, I can repro on my Win7 box using Oracle JDK 1.5.0_22.  Maybe you're 
 using 1.6.0_X? - Steve

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, May 18, 2011 8:33 AM
 To: dev@lucene.apache.org
 Subject: Re: Javadoc warnings failing the branch_3x build [was: RE:
 [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

 Hmm, I know what's wrong here (SentinelIntSet is package private), and
 I'll fix... but when I run ant javadocs I don't see these warnings
 (but the build clearly does)...

 Mike

 http://blog.mikemccandless.com

 On Wed, May 18, 2011 at 8:22 AM, Steven A Rowe sar...@syr.edu wrote:
  This build failed because of Javadoc warnings:
 
   [javadoc] Constructing Javadoc information...
   [javadoc] Standard Doclet version 1.5.0_16-p9
   [javadoc] Building tree for all the packages and classes...
   [javadoc]
 .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: warning
 - Tag @link: reference not found: SentinelIntSet
   [javadoc]
 .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45: warning
 - Tag @link: reference not found: SentinelIntSet
   [javadoc]
 .../org/apache/lucene/search/grouping/AllGroupsCollector.java:62: warning
 - Tag @link: reference not found: SentinelIntSet
   [javadoc]
 .../org/apache/lucene/search/grouping/AllGroupsCollector.java:76: warning
 - Tag @link: reference not found: SentinelIntSet
 
  -Original Message-
  From: Apache Jenkins Server [mailto:hud...@hudson.apache.org]
  Sent: Wednesday, May 18, 2011 7:36 AM
  To: dev@lucene.apache.org
  Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync
 
  Build: https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/126/
 
  No tests ran.
 
  Build Log (for compile errors):
  [...truncated 7931 lines...]
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Vladimir Kornev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Kornev updated LUCENE-3115:


Affects Version/s: (was: 3.1)
   3.0

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.0
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query 
 {code}index.query(key, *\**);{code} doesn't return matching result. This 
 query returns all not empty values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more refactoring

2011-05-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3084:
--

Attachment: LUCENE-3084-trunk-only.patch

New patch with the renaming and removal of Map in favour of a simple Set.

Again ready to commit.
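
For readers skimming the issue, the "hide the collection, expose unmodifiable views" part of the description boils down to something like the sketch below; the names and structure here are illustrative, not the committed patch.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.lucene.index.SegmentInfo;

// Illustrative only: the internal list is private, callers get read-only views.
class SegmentInfoList {
  private final List<SegmentInfo> segments = new ArrayList<SegmentInfo>();

  void add(SegmentInfo si) {
    segments.add(si);
  }

  List<SegmentInfo> asList() {
    return Collections.unmodifiableList(segments);
  }

  Set<SegmentInfo> asSet() {
    return Collections.unmodifiableSet(new HashSet<SegmentInfo>(segments));
  }
}
{code}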

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, 
 Remove Vector<SI> subclassing from SegmentInfos & more refactoring
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.
 Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
 collections be hidden inside the class. We can add unmodifiable views on it 
 (asList(), asSet()).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more refactoring

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035349#comment-13035349
 ] 

Michael McCandless commented on LUCENE-3084:


Patch looks great Uwe!  +1 to commit.  Thanks!

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, 
 Remove Vector<SI> subclassing from SegmentInfos & more refactoring
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.
 Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
 collections be hidden inside the class. We can add unmodifiable views on it 
 (asList(), asSet()).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more refactoring

2011-05-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-3084:
-

Assignee: Uwe Schindler  (was: Michael McCandless)

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, 
 Remove Vector<SI> subclassing from SegmentInfos & more refactoring
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.
 Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
 collections be hidden inside the class. We can add unmodifiable views on it 
 (asList(), asSet()).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Uwe Schindler
Hi Mike,

Same problem in trunk!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, May 18, 2011 2:27 PM
 To: dev@lucene.apache.org
 Subject: Re: Javadoc warnings failing the branch_3x build [was: RE:
[JENKINS-
 MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]
 
 Ugh my bad, sorry.  I'll fix!
 
 Mike
 
 http://blog.mikemccandless.com
 
 On Wed, May 18, 2011 at 8:22 AM, Steven A Rowe sar...@syr.edu
 wrote:
  This build failed because of Javadoc warnings:
 
   [javadoc] Constructing Javadoc information...
   [javadoc] Standard Doclet version 1.5.0_16-p9
   [javadoc] Building tree for all the packages and classes...
   [javadoc]
  .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45:
  warning - Tag @link: reference not found: SentinelIntSet
   [javadoc]
  .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45:
  warning - Tag @link: reference not found: SentinelIntSet
   [javadoc]
  .../org/apache/lucene/search/grouping/AllGroupsCollector.java:62:
  warning - Tag @link: reference not found: SentinelIntSet
   [javadoc]
  .../org/apache/lucene/search/grouping/AllGroupsCollector.java:76:
  warning - Tag @link: reference not found: SentinelIntSet
 
  -Original Message-
  From: Apache Jenkins Server [mailto:hud...@hudson.apache.org]
  Sent: Wednesday, May 18, 2011 7:36 AM
  To: dev@lucene.apache.org
  Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of
 sync
 
  Build:
  https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/126/
 
  No tests ran.
 
  Build Log (for compile errors):
  [...truncated 7931 lines...]
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3113) fix analyzer bugs found by MockTokenizer

2011-05-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3113.
-

Resolution: Fixed

Committed revision 1104519, 1124242 (branch_3x)

 fix analyzer bugs found by MockTokenizer
 

 Key: LUCENE-3113
 URL: https://issues.apache.org/jira/browse/LUCENE-3113
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Reporter: Robert Muir
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3113.patch, LUCENE-3113.patch


 In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
 over the analysis tests to use MockTokenizer for better coverage.
 However, this found a few bugs (one of which is LUCENE-3106):
 * incrementToken() called after it returns false in CommonGramsQueryFilter, 
 HyphenatedWordsFilter, ShingleFilter, SynonymFilter
 * missing end() implementation for PrefixAwareTokenFilter
 * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
 * missing correctOffset()s in MockTokenizer itself.
 I think it would be nice to just fix all the bugs on one issue... I've fixed 
 everything except Shingle and Synonym
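
As a generic illustration of the contract those assertions enforce (this is not one of the filters listed above), a well-behaved TokenFilter stops pulling from its input once incrementToken() has returned false, delegates end(), and resets its own state in reset():

{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Generic example, not one of the filters fixed on this issue.
final class WellBehavedFilter extends TokenFilter {
  private boolean exhausted = false;

  WellBehavedFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (exhausted) {
      return false; // never call input.incrementToken() after it returned false
    }
    if (input.incrementToken()) {
      return true;  // pass the token through unchanged
    }
    exhausted = true;
    return false;
  }

  @Override
  public void end() throws IOException {
    super.end();    // propagate the final offset state from the input
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    exhausted = false;
  }
}
{code}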

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir

2011-05-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035357#comment-13035357
 ] 

Robert Muir commented on LUCENE-3092:
-

{quote}
Tests? It's nice to have a test use a RAMDirectory for speed, but still follow 
the same code path as FSDirectory for debugging + orthogonality.
{quote}

FWIW currently the lucene tests use a RAMDirectory 90% of the time (and 
something else the other 10%).
We could adjust this... at the time I set it, it seemed to not slow the tests 
down that much but
still give us a little more coverage.

 NRTCachingDirectory, to buffer small segments in a RAMDir
 -

 Key: LUCENE-3092
 URL: https://issues.apache.org/jira/browse/LUCENE-3092
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch, 
 LUCENE-3092.patch, LUCENE-3092.patch, LUCENE-3092.patch


 I created this simple Directory impl, whose goal is to reduce IO
 contention in a frequent reopen NRT use case.
 The idea is, when reopening quickly, but not indexing that much
 content, you wind up with many small files created with time, that can
 possibly stress the IO system eg if merges, searching are also
 fighting for IO.
 So, NRTCachingDirectory puts these newly created files into a RAMDir,
 and only when they are merged into a too-large segment, does it then
 write-through to the real (delegate) directory.
 This lets you spend some RAM to reduce IO.
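
A hedged usage sketch of the idea (the constructor arguments here, a delegate directory plus two RAM thresholds in MB, are an assumption based on this discussion, not a finalized API):

{code}
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

public class NrtCachingExample {
  public static void main(String[] args) throws IOException {
    Directory fsDir = FSDirectory.open(new File("/path/to/index"));
    // assumed knobs: cache newly flushed segments under ~5 MB, up to ~60 MB total
    Directory dir = new NRTCachingDirectory(fsDir, 5.0, 60.0);
    // hand 'dir' to IndexWriter and the NRT reader as with any other Directory
  }
}
{code}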

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3014) comparator API for segment versions

2011-05-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035359#comment-13035359
 ] 

Robert Muir commented on LUCENE-3014:
-

any Objections? Uwe you still want to take this or should I?

I want to get LUCENE-3012 wrapped up.

 comparator API for segment versions
 ---

 Key: LUCENE-3014
 URL: https://issues.apache.org/jira/browse/LUCENE-3014
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Critical
 Fix For: 3.2

 Attachments: LUCENE-3014.patch


 See LUCENE-3012 for an example.
 Things get ugly if you want to use SegmentInfo.getVersion()
 For example, what if we committed my patch, release 3.2, but later released 
 3.1.1 (will "3.1.1" be what's written and returned by this function?)
 Then suddenly we broke the index format because we are using Strings here 
 without a reasonable comparator API.
 In this case one should be able to compute if the version is < 3.2 safely.
 If we don't do this, and we rely upon this version information internally in 
 lucene, I think we are going to break something.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3014) comparator API for segment versions

2011-05-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035361#comment-13035361
 ] 

Uwe Schindler commented on LUCENE-3014:
---

It's fine, commit it!

We may look for usage of the version field in SegmentInfos, and use this 
comparator there (especially e.g. my new one for upgrades or the standard 
IndexTooOldException stuff).

But I think that should be a new issue.

 comparator API for segment versions
 ---

 Key: LUCENE-3014
 URL: https://issues.apache.org/jira/browse/LUCENE-3014
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Critical
 Fix For: 3.2

 Attachments: LUCENE-3014.patch


 See LUCENE-3012 for an example.
 Things get ugly if you want to use SegmentInfo.getVersion()
 For example, what if we committed my patch, release 3.2, but later released 
 3.1.1 (will "3.1.1" be what's written and returned by this function?)
 Then suddenly we broke the index format because we are using Strings here 
 without a reasonable comparator API.
 In this case one should be able to compute if the version is < 3.2 safely.
 If we don't do this, and we rely upon this version information internally in 
 lucene, I think we are going to break something.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3115) Escaping stars and question marks do not work.

2011-05-18 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved LUCENE-3115.
-

Resolution: Invalid

Hi Vladimir,

I'm resolving this issue as Invalid.  When you have a problem with Lucene, 
please post to the Lucene Java User mailing list first - see 
[http://lucene.apache.org/java/docs/mailinglists.html].

Also, the next time you use JIRA, please make use of the Preview button below 
the text entry box, rather than re-editing lots of times.  Every time you 
submit an edit, a message is sent to the Lucene/Solr developer mailing list.  I 
have 11 different versions of your issue description clogging up my mailbox...

Steve

 Escaping stars and question marks do not work.
 --

 Key: LUCENE-3115
 URL: https://issues.apache.org/jira/browse/LUCENE-3115
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.0
Reporter: Vladimir Kornev

 The string I have search by st*rs is indexed. Search by query 
 {code}index.query(key, *\**);{code} doesn't return matching result. This 
 query returns all not empty values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3014) comparator API for segment versions

2011-05-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035370#comment-13035370
 ] 

Shai Erera commented on LUCENE-3014:


Hey guys, does this affect LUCENE-2921 (or vice versa)?

Basically, I thought that we should stop writing version header in files and 
just use the release version as a header.

Robert, I don't think we are allowed to change index format versions on bug-fix 
releases and even if we do, that same bug fix would go into the 3.x release so 
it would still know how to read 3.1.1? Perhaps that was your point and I missed 
it ...

 comparator API for segment versions
 ---

 Key: LUCENE-3014
 URL: https://issues.apache.org/jira/browse/LUCENE-3014
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Critical
 Fix For: 3.2

 Attachments: LUCENE-3014.patch


 See LUCENE-3012 for an example.
 Things get ugly if you want to use SegmentInfo.getVersion()
 For example, what if we committed my patch, release 3.2, but later released 
 3.1.1 (will "3.1.1" be what's written and returned by this function?)
 Then suddenly we broke the index format because we are using Strings here 
 without a reasonable comparator API.
 In this case one should be able to compute if the version is < 3.2 safely.
 If we don't do this, and we rely upon this version information internally in 
 lucene, I think we are going to break something.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3014) comparator API for segment versions

2011-05-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035376#comment-13035376
 ] 

Uwe Schindler commented on LUCENE-3014:
---

Shai: we should not change index format, but it still feels bad not to have a 
correct version comparison API. With this patch you can even compare "3.0" 
against only "3" or "3.0.0.0.0" and they will be equal. And once we are at version 
10, a simple string compare is a bad idea :-)

That's why Robert and I are against pure string comparisons.
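
To make that concrete, a purely illustrative comparator (not the LUCENE-3014 patch itself) that compares dot-separated numeric segments and treats missing trailing segments as 0 behaves as described: "3" equals "3.0.0.0.0", and "3.10" sorts after "3.2".

{code}
import java.util.Comparator;

// Illustrative only; not the committed comparator.
final class VersionStringComparator implements Comparator<String> {
  public int compare(String a, String b) {
    String[] pa = a.split("\\.");
    String[] pb = b.split("\\.");
    int len = Math.max(pa.length, pb.length);
    for (int i = 0; i < len; i++) {
      int va = i < pa.length ? Integer.parseInt(pa[i]) : 0; // missing part counts as 0
      int vb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
      if (va != vb) {
        return va < vb ? -1 : 1;
      }
    }
    return 0;
  }
}
{code}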

 comparator API for segment versions
 ---

 Key: LUCENE-3014
 URL: https://issues.apache.org/jira/browse/LUCENE-3014
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Critical
 Fix For: 3.2

 Attachments: LUCENE-3014.patch


 See LUCENE-3012 for an example.
 Things get ugly if you want to use SegmentInfo.getVersion()
 For example, what if we committed my patch, release 3.2, but later released 
 3.1.1 (will "3.1.1" be what's written and returned by this function?)
 Then suddenly we broke the index format because we are using Strings here 
 without a reasonable comparator API.
 In this case one should be able to compute if the version is < 3.2 safely.
 If we don't do this, and we rely upon this version information internally in 
 lucene, I think we are going to break something.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3014) comparator API for segment versions

2011-05-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035378#comment-13035378
 ] 

Robert Muir commented on LUCENE-3014:
-

{quote}
Hey guys, does this affect LUCENE-2921 (or vice versa)?
{quote}

Hi Shai, I think this helps LUCENE-2921. This is a comparator to use, when you 
want to examine the release version that created the segment (the one you added 
in LUCENE-2720). It's guaranteed to compare correctly if, say, we released 
3.10, and also if the number of trailing zeros etc. are different.

In other words, if you implement LUCENE-2921 I think the idea is typically you 
will want to use this comparator when examining the version string.

{quote}
Robert, I don't think we are allowed to change index format versions on bug-fix 
releases and even if we do, that same bug fix would go into the 3.x release so 
it would still know how to read 3.1.1? Perhaps that was your point and I missed 
it ...
{quote}

On LUCENE-3012, I've proposed a fix-for version for Lucene 3.2. But we can 
discuss on that issue.


 comparator API for segment versions
 ---

 Key: LUCENE-3014
 URL: https://issues.apache.org/jira/browse/LUCENE-3014
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Critical
 Fix For: 3.2

 Attachments: LUCENE-3014.patch


 See LUCENE-3012 for an example.
 Things get ugly if you want to use SegmentInfo.getVersion()
 For example, what if we committed my patch, release 3.2, but later released 
 3.1.1 (will "3.1.1" be what's written and returned by this function?)
 Then suddenly we broke the index format because we are using Strings here 
 without a reasonable comparator API.
 In this case one should be able to compute if the version is < 3.2 safely.
 If we don't do this, and we rely upon this version information internally in 
 lucene, I think we are going to break something.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3014) comparator API for segment versions

2011-05-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035379#comment-13035379
 ] 

Shai Erera commented on LUCENE-3014:


Yes, that makes sense. I can use that API in LUCENE-2921. Thanks a lot for 
saving me some effort :).

 comparator API for segment versions
 ---

 Key: LUCENE-3014
 URL: https://issues.apache.org/jira/browse/LUCENE-3014
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Critical
 Fix For: 3.2

 Attachments: LUCENE-3014.patch


 See LUCENE-3012 for an example.
 Things get ugly if you want to use SegmentInfo.getVersion()
 For example, what if we committed my patch, release 3.2, but later released 
 3.1.1 (will "3.1.1" be what's written and returned by this function?)
 Then suddenly we broke the index format because we are using Strings here 
 without a reasonable comparator API.
 In this case one should be able to compute if the version is < 3.2 safely.
 If we don't do this, and we rely upon this version information internally in 
 lucene, I think we are going to break something.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3014) comparator API for segment versions

2011-05-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3014.
-

   Resolution: Fixed
Fix Version/s: 4.0

Committed revision 1124266, 1124269 (branch3x)

 comparator API for segment versions
 ---

 Key: LUCENE-3014
 URL: https://issues.apache.org/jira/browse/LUCENE-3014
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Uwe Schindler
Priority: Critical
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3014.patch


 See LUCENE-3012 for an example.
 Things get ugly if you want to use SegmentInfo.getVersion()
 For example, what if we committed my patch, release 3.2, but later released 
 3.1.1 (will "3.1.1" be what's written and returned by this function?)
 Then suddenly we broke the index format because we are using Strings here 
 without a reasonable comparator API.
 In this case one should be able to compute if the version is < 3.2 safely.
 If we don't do this, and we rely upon this version information internally in 
 lucene, I think we are going to break something.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3108) Land DocValues on trunk

2011-05-18 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035386#comment-13035386
 ] 

Simon Willnauer commented on LUCENE-3108:
-

FYI, I ran indexing benchmarks trunk vs. branch and they are super close 
together: it's like a 3 sec difference, with the branch faster, so it's in the 
noise. I also indexed one docvalues field (floats), which was also about the 
same, 2 sec slower including merges etc. So we are on the safe side that this 
feature does not influence indexing performance. I didn't expect anything else 
really since the only difference is a single condition in DocFieldProcessor.

 Land DocValues on trunk
 ---

 Key: LUCENE-3108
 URL: https://issues.apache.org/jira/browse/LUCENE-3108
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index, core/search, core/store
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0


 It's time to move another feature from branch to trunk. I want to start this 
 process now while still a couple of issues remain on the branch. Currently I 
 am down to a single nocommit (javadocs on DocValues.java) and a couple of 
 testing TODOs (explicit multithreaded tests and unoptimized with deletions) 
 but I think those are not worth separate issues so we can resolve them as we 
 go. 
 The already created issues (LUCENE-3075 and LUCENE-3074) should not block 
 this process here IMO, we can fix them once we are on trunk. 
 Here is a quick feature overview of what has been implemented:
  * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, 
 Bytes (fixed / variable size each in sorted, straight and deref variations)
  * Integration into Flex-API, Codec provides a 
 PerDocConsumer-DocValuesConsumer (write) / PerDocValues-DocValues (read) 
  * By default enabled in all codecs except PreFlex
  * Follows other flex-API patterns, like non-segment readers throwing UOE, 
 forcing MultiPerDocValues if on DirReader etc.
  * Integration into IndexWriter, FieldInfos etc.
  * Random-testing enabled via RandomIW - injecting random DocValues into 
 documents
  * Basic checks in CheckIndex (which runs after each test)
  * FieldComparator for int and float variants (Sorting, currently directly 
 integrated into SortField, this might go into a separate DocValuesSortField 
 eventually)
  * Extended TestSort for DocValues
  * RAM-Resident random access API plus on-disk DocValuesEnum (currently only 
 sequential access) - Source.java / DocValuesEnum.java
  * Extensible Cache implementation for RAM-Resident DocValues (by-default 
 loaded into RAM only once and freed once IR is closed) - SourceCache.java
  
 PS: Currently the RAM resident API is named Source (Source.java) which seems 
 too generic. I think we should rename it into RamDocValues or something like 
 that, suggestion welcome!   
 Any comments, questions (rants :)) are very much appreciated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3012) if you use setNorm, lucene writes a headerless separate norms file

2011-05-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3012:


Attachment: LUCENE-3012_3x.patch

Updated patch (against branch_3x for simplicity) that uses the LUCENE-3014 
comparator API.

Because separate norms files are independent of the version that created the 
segment (e.g. one can call setNorm with 3.6 for a 3.1 segment), I think its 
really important that we fix this in 3.2 to write the header.

If there are no objections, I'd like to commit, and then regenerate the 
tentative 3.2 indexes for trunk's TestBackwardsCompatibility.

There's no need to change the fileformats.html documentation, as what we are 
doing now is actually inconsistent with it, thus the bug.


 if you use setNorm, lucene writes a headerless separate norms file
 --

 Key: LUCENE-3012
 URL: https://issues.apache.org/jira/browse/LUCENE-3012
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.2

 Attachments: LUCENE-3012.patch, LUCENE-3012_3x.patch


 In this case SR.reWrite just writes the bytes with no header...
 we should write it always.
 we can detect in these cases (segment written = 3.1) with a 
 sketchy length == maxDoc check.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #124: POMs out of sync

2011-05-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-Maven-trunk/124/

No tests ran.

Build Log (for compile errors):
[...truncated 7394 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3103) create a simple test that indexes and searches byte[] terms

2011-05-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3103.
-

Resolution: Fixed

Committed revision 1124288.

 create a simple test that indexes and searches byte[] terms
 ---

 Key: LUCENE-3103
 URL: https://issues.apache.org/jira/browse/LUCENE-3103
 Project: Lucene - Java
  Issue Type: Test
  Components: general/test
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3103.patch


 Currently, the only good test that does this is Test2BTerms (disabled by 
 default)
 I think we should test this capability, and also have a simpler example for 
 how to do this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-18 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035422#comment-13035422
 ] 

Doron Cohen commented on LUCENE-3068:
-

fixed in trunk in r1124293.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3102) Few issues with CachingCollector

2011-05-18 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-3102:
---

Attachment: LUCENE-3102-nowrap.patch

Patch adds random to TestGrouping and fixes the CHANGES typo.

Mike, TestGrouping fails w/ this seed: 
-Dtests.seed=7295196064099074191:-1632255311098421589 (it picks a no wrapping 
collector).

I guess I didn't insert the random thing properly. It's the only place where 
the test creates a CachingCollector though. I noticed that it fails on the 
'doCache' but '!doAllGroups' case.

Can you please take a look? I'm not familiar with this test, and cannot debug 
it anymore today.

 Few issues with CachingCollector
 

 Key: LUCENE-3102
 URL: https://issues.apache.org/jira/browse/LUCENE-3102
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3102-factory.patch, LUCENE-3102-nowrap.patch, 
 LUCENE-3102-nowrap.patch, LUCENE-3102.patch, LUCENE-3102.patch


 CachingCollector (introduced in LUCENE-1421) has few issues:
 # Since the wrapped Collector may support out-of-order collection, the 
 document IDs cached may be out-of-order (depends on the Query) and thus 
 replay(Collector) will forward document IDs out-of-order to a Collector that 
 may not support it.
 # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
 # I think that instead of comparing curScores to null, in order to determine 
 if scores are requested, we should have a specific boolean - for clarity
 # This check if (base + nextLength > maxDocsToCache) (line 168) can be 
 relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
 maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
 to try and cache them?
 Also:
 * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
 need that if CachingCollector ctor already takes a boolean cacheScores? I 
 think it's better defined explicitly than implicitly?
 * Let's introduce a factory method for creating a specialized version if 
 scoring is requested / not (i.e., impl the TODO in line 189)
 * I think it's a useful collector, which stands on its own and not specific 
 to grouping. Can we move it to core?
 * How about using OpenBitSet instead of int[] for doc IDs?
 ** If the number of hits is big, we'd gain some RAM back, and be able to 
 cache more entries
 ** NOTE: OpenBitSet can only be used for in-order collection only. So we can 
 use that if the wrapped Collector does not support out-of-order
 * Do you think we can modify this Collector to not necessarily wrap another 
 Collector? We have such Collector which stores (in-memory) all matching doc 
 IDs + scores (if required). Those are later fed into several processes that 
 operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
 can make CachingCollector *optionally* wrap another Collector and then 
 someone can reuse it by setting RAM limit to unlimited (we should have a 
 constant for that) in order to simply collect all matching docs + scores.
 * I think a set of dedicated unit tests for this class alone would be good.
 That's it so far. Perhaps, if we do all of the above, more things will pop up.
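
For orientation, wrap-and-replay usage would look roughly like the sketch below, assuming the factory form discussed on this issue (create(other, cacheScores, maxRAMMB)) plus the existing replay(Collector); exact signatures may differ from the final patch.

{code}
import org.apache.lucene.search.CachingCollector;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Hedged sketch; method names follow the factory idea discussed above.
class CachingCollectorUsage {
  void searchTwice(IndexSearcher searcher, Query query,
                   Collector firstPass, Collector secondPass) throws Exception {
    // first pass: collect into firstPass while caching doc IDs + scores (~64 MB cap)
    CachingCollector cache = CachingCollector.create(firstPass, true, 64.0);
    searcher.search(query, cache);
    if (cache.isCached()) {
      cache.replay(secondPass); // feed the cached hits to another collector
    }
  }
}
{code}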

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Michael McCandless
I think it's resolved now?  Sorry!

But it's great we now catch jdoc errors before releasing... should we
fix our while(1) test to also catch this (not just nightly / maven)?

Mike

http://blog.mikemccandless.com

On Wed, May 18, 2011 at 9:01 AM, Uwe Schindler u...@thetaphi.de wrote:
 Hi Mike,

 Same problem in trunk!

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, May 18, 2011 2:27 PM
 To: dev@lucene.apache.org
 Subject: Re: Javadoc warnings failing the branch_3x build [was: RE:
 [JENKINS-
 MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

 Ugh my bad, sorry.  I'll fix!

 Mike

 http://blog.mikemccandless.com

 On Wed, May 18, 2011 at 8:22 AM, Steven A Rowe sar...@syr.edu
 wrote:
  This build failed because of Javadoc warnings:
 
   [javadoc] Constructing Javadoc information...
   [javadoc] Standard Doclet version 1.5.0_16-p9
   [javadoc] Building tree for all the packages and classes...
   [javadoc]
  .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45:
  warning - Tag @link: reference not found: SentinelIntSet
   [javadoc]
  .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45:
  warning - Tag @link: reference not found: SentinelIntSet
   [javadoc]
  .../org/apache/lucene/search/grouping/AllGroupsCollector.java:62:
  warning - Tag @link: reference not found: SentinelIntSet
   [javadoc]
  .../org/apache/lucene/search/grouping/AllGroupsCollector.java:76:
  warning - Tag @link: reference not found: SentinelIntSet
 
  -Original Message-
  From: Apache Jenkins Server [mailto:hud...@hudson.apache.org]
  Sent: Wednesday, May 18, 2011 7:36 AM
  To: dev@lucene.apache.org
  Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of
 sync
 
  Build:
  https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/126/
 
  No tests ran.
 
  Build Log (for compile errors):
  [...truncated 7931 lines...]
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Robert Muir
On Wed, May 18, 2011 at 11:31 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 I think it's resolved now?  Sorry!

 But it's great we now catch jdoc errors before releasing... should we
 fix our while(1) test to also catch this (not just nightly / maven)?


yeah couldn't we just fire off the 'javadocs-all' task after
compiling? this takes 10 seconds on my computer and it could catch
these things quicker.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-18 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3068.
-

Resolution: Fixed

fix merged to 3x in r1124302.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Uwe Schindler
All fine now!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Wednesday, May 18, 2011 5:31 PM
 To: dev@lucene.apache.org
 Subject: Re: Javadoc warnings failing the branch_3x build [was: RE:
[JENKINS-
 MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]
 
 I think it's resolved now?  Sorry!
 
 But it's great we now catch jdoc errors before releasing... should we fix
our
 while(1) test to also catch this (not just nightly / maven)?
 
 Mike
 
 http://blog.mikemccandless.com
 
 On Wed, May 18, 2011 at 9:01 AM, Uwe Schindler u...@thetaphi.de
 wrote:
  Hi Mike,
 
  Same problem in trunk!
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Michael McCandless [mailto:luc...@mikemccandless.com]
  Sent: Wednesday, May 18, 2011 2:27 PM
  To: dev@lucene.apache.org
  Subject: Re: Javadoc warnings failing the branch_3x build [was: RE:
  [JENKINS-
  MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]
 
  Ugh my bad, sorry.  I'll fix!
 
  Mike
 
  http://blog.mikemccandless.com
 
  On Wed, May 18, 2011 at 8:22 AM, Steven A Rowe sar...@syr.edu
  wrote:
   This build failed because of Javadoc warnings:
  
    [javadoc] Constructing Javadoc information...
    [javadoc] Standard Doclet version 1.5.0_16-p9
    [javadoc] Building tree for all the packages and classes...
    [javadoc]
   .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45:
   warning - Tag @link: reference not found: SentinelIntSet
    [javadoc]
   .../org/apache/lucene/search/grouping/AllGroupsCollector.java:45:
   warning - Tag @link: reference not found: SentinelIntSet
    [javadoc]
   .../org/apache/lucene/search/grouping/AllGroupsCollector.java:62:
   warning - Tag @link: reference not found: SentinelIntSet
    [javadoc]
   .../org/apache/lucene/search/grouping/AllGroupsCollector.java:76:
   warning - Tag @link: reference not found: SentinelIntSet
  
   -Original Message-
   From: Apache Jenkins Server [mailto:hud...@hudson.apache.org]
   Sent: Wednesday, May 18, 2011 7:36 AM
   To: dev@lucene.apache.org
   Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of
  sync
  
   Build:
   https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/126/
  
   No tests ran.
  
   Build Log (for compile errors):
   [...truncated 7931 lines...]
  
  
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
   additional commands, e-mail: dev-h...@lucene.apache.org
  
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Uwe Schindler
Hi,

Definitely not. Javadocs-all takes 2 minutes here, so please don’t bundle it 
with compile, or I will stop working on Lucene; that does not help during 
development (I don't use Eclipse to develop...).

We can trigger this for Hudson half-hourly.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Wednesday, May 18, 2011 5:35 PM
 To: dev@lucene.apache.org
 Subject: Re: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-
 MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]
 
 On Wed, May 18, 2011 at 11:31 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
  I think it's resolved now?  Sorry!
 
  But it's great we now catch jdoc errors before releasing... should we
  fix our while(1) test to also catch this (not just nightly / maven)?
 
 
 yeah couldn't we just fire off the 'javadocs-all' task after compiling? this 
 takes
 10 seconds on my computer and it could catch these things quicker.
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3102) Few issues with CachingCollector

2011-05-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3102:
---

Attachment: LUCENE-3102.patch

Patch.

I think I fixed TestGrouping to exercise the "no wrapped collector" and "replay 
twice" cases for CachingCollector.

 Few issues with CachingCollector
 

 Key: LUCENE-3102
 URL: https://issues.apache.org/jira/browse/LUCENE-3102
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3102-factory.patch, LUCENE-3102-nowrap.patch, 
 LUCENE-3102-nowrap.patch, LUCENE-3102.patch, LUCENE-3102.patch, 
 LUCENE-3102.patch


 CachingCollector (introduced in LUCENE-1421) has few issues:
 # Since the wrapped Collector may support out-of-order collection, the 
 document IDs cached may be out-of-order (depends on the Query) and thus 
 replay(Collector) will forward document IDs out-of-order to a Collector that 
 may not support it.
 # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
 # I think that instead of comparing curScores to null, in order to determine 
 if scores are requested, we should have a specific boolean - for clarity
 # This check if (base + nextLength > maxDocsToCache) (line 168) can be 
 relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
 maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
 to try and cache them?
 Also:
 * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
 need that if CachingCollector ctor already takes a boolean cacheScores? I 
 think it's better defined explicitly than implicitly?
 * Let's introduce a factory method for creating a specialized version if 
 scoring is requested / not (i.e., impl the TODO in line 189)
 * I think it's a useful collector, which stands on its own and not specific 
 to grouping. Can we move it to core?
 * How about using OpenBitSet instead of int[] for doc IDs?
 ** If the number of hits is big, we'd gain some RAM back, and be able to 
 cache more entries
 ** NOTE: OpenBitSet can only be used for in-order collection only. So we can 
 use that if the wrapped Collector does not support out-of-order
 * Do you think we can modify this Collector to not necessarily wrap another 
 Collector? We have such Collector which stores (in-memory) all matching doc 
 IDs + scores (if required). Those are later fed into several processes that 
 operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
 can make CachingCollector *optionally* wrap another Collector and then 
 someone can reuse it by setting RAM limit to unlimited (we should have a 
 constant for that) in order to simply collect all matching docs + scores.
 * I think a set of dedicated unit tests for this class alone would be good.
 That's it so far. Perhaps, if we do all of the above, more things will pop up.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
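
As an aside to the LUCENE-3102 messages above: below is a minimal, purely illustrative sketch (assuming the Lucene 3.x Collector API) of a collector that caches matching doc IDs and only optionally delegates to a wrapped Collector, which is the idea discussed in the issue description. The class and method names are made up for this note; this is not the committed CachingCollector.

{noformat}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Illustrative only: cache matching (global) doc IDs and optionally delegate
// to another Collector, so the hits can later be replayed.
public class SimpleCachingCollector extends Collector {
  private final Collector delegate;  // may be null: stand-alone caching
  private final List<Integer> cachedDocs = new ArrayList<Integer>();
  private int docBase;

  public SimpleCachingCollector(Collector delegate) {
    this.delegate = delegate;
  }

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    if (delegate != null) delegate.setScorer(scorer);
  }

  @Override
  public void collect(int doc) throws IOException {
    cachedDocs.add(Integer.valueOf(docBase + doc)); // remember the global doc ID
    if (delegate != null) delegate.collect(doc);
  }

  @Override
  public void setNextReader(IndexReader reader, int docBase) throws IOException {
    this.docBase = docBase;
    if (delegate != null) delegate.setNextReader(reader, docBase);
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    // The cache is replayed in collection order, so only accept out-of-order
    // hits if the wrapped Collector does.
    return delegate != null && delegate.acceptsDocsOutOfOrder();
  }

  /** Replays the cached global doc IDs into another Collector (simplified: docBase 0). */
  public void replay(Collector other) throws IOException {
    other.setNextReader(null, 0);
    for (Integer doc : cachedDocs) {
      other.collect(doc.intValue());
    }
  }
}
{noformat}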



[jira] [Commented] (LUCENE-3012) if you use setNorm, lucene writes a headerless separate norms file

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035459#comment-13035459
 ] 

Michael McCandless commented on LUCENE-3012:


I agree this is important to fix!

Patch looks good.


 if you use setNorm, lucene writes a headerless separate norms file
 --

 Key: LUCENE-3012
 URL: https://issues.apache.org/jira/browse/LUCENE-3012
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.2

 Attachments: LUCENE-3012.patch, LUCENE-3012_3x.patch


 In this case SR.reWrite just writes the bytes with no header...
 we should write it always.
 we can detect in these cases (segment written <= 3.1) with a 
 sketchy length == maxDoc check.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
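
A throwaway illustration of the "sketchy length == maxDoc check" mentioned in the description above: a headerless separate norms file holds exactly one norm byte per document, so its length equals maxDoc. The helper class below is invented for this note and is not Lucene's internal code.

{noformat}
import java.io.IOException;

import org.apache.lucene.store.Directory;

// Sketch only: a separate norms file written without a header is exactly
// one byte per document; a file carrying a header is longer than maxDoc.
final class SeparateNormsCheck {
  static boolean looksHeaderless(Directory dir, String normsFileName, int maxDoc)
      throws IOException {
    return dir.fileLength(normsFileName) == maxDoc;
  }
}
{noformat}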



[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more refactoring

2011-05-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035474#comment-13035474
 ] 

Uwe Schindler commented on LUCENE-3084:
---

Committed trunk revision: 1124307, 1124316 (copy-paste error)

Now backporting...

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, 
 Remove Vector<SI> subclassing from SegmentInfos & more refactoring
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.
 Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
 collections be hidden inside the class. We can add unmodifiable views on it 
 (asList(), asSet()).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
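
To illustrate the "hide the collections and add unmodifiable views (asList(), asSet())" proposal quoted above, here is a small placeholder sketch; the class and element type are generic stand-ins for this note, not the committed SegmentInfos code.

{noformat}
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Placeholder sketch: keep the backing list private instead of subclassing
// Vector, and expose read-only views as the issue proposes.
final class InfoList<T> {
  private final List<T> infos = new ArrayList<T>();

  void add(T info) {
    infos.add(info);
  }

  List<T> asList() {
    return Collections.unmodifiableList(infos);   // read-only List view
  }

  Set<T> asSet() {
    return Collections.unmodifiableSet(new LinkedHashSet<T>(infos)); // read-only Set view
  }
}
{noformat}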



Re: Javadoc warnings failing the branch_3x build [was: RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #126: POMs out of sync]

2011-05-18 Thread Robert Muir
On Wed, May 18, 2011 at 11:44 AM, Uwe Schindler u...@thetaphi.de wrote:
 Hi,

 Definitely not. Javadocs-all takes 2 minutes here, so please don’t bundle it 
 with compile, or I will stop working on Lucene; that does not help during 
 development (I don't use Eclipse to develop...).

 We can trigger this for Hudson half-hourly.

Hi uwe... this is what i was asking for, to run it after compile in
the half-hourly.

i'm sorry your computer has such a slow io system!

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2524) Adding grouping to Solr 3x

2011-05-18 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-2524:


Attachment: SOLR-2524.patch

Attached the initial patch.

* Patch is based on what is in the trunk. 
** Integrated the grouping contrib collectors
** Same response formats.
** All parameters except group.query and group.func are supported.
** Computed DocSet (for facetComponent and StatsComponent) is based on the 
ungrouped result.
* Also integrated the caching collector. For this I added the 
group.cache=true|false and group.cache.maxSize=[number] parameters.

Things still todo:
* Integrate AllGroupsCollector for total count based on groups.
* Create a Solr Test for grouping
* Cleanup / Refactor / java doc

 Adding grouping to Solr 3x
 --

 Key: SOLR-2524
 URL: https://issues.apache.org/jira/browse/SOLR-2524
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.2
Reporter: Martijn van Groningen
 Attachments: SOLR-2524.patch


 Grouping was recently added to Lucene 3x. See LUCENE-1421 for more 
 information.
 I think it would be nice if we expose this functionality also to the Solr 
 users that are bound to a 3.x version.
 The grouping feature added to Lucene is currently a subset of the 
 functionality that Solr 4.0-trunk offers. Mainly it doesn't support grouping 
 by function / query.
 The work involved getting the grouping contrib to work on Solr 3x is 
 acceptable. I have it more or less running here. It supports the response 
 format and request parameters (except: group.query and group.func) described 
 in the FieldCollapse page on the Solr wiki.
 I think it would be great if this is included in the Solr 3.2 release. Many 
 people are using grouping as patch now and this would help them a lot. Any 
 thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
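
For readers trying the patch above, a rough SolrJ sketch of how a grouped request with the proposed caching parameters might be built. The parameter names group.cache and group.cache.maxSize come from the patch description (a rename to maxSizeMB is suggested later in the thread) and the field name is hypothetical; none of this is a released Solr API.

{noformat}
import org.apache.solr.client.solrj.SolrQuery;

// Sketch: grouped request using the caching parameters proposed in SOLR-2524.
public class GroupingRequestExample {
  public static SolrQuery buildQuery() {
    SolrQuery query = new SolrQuery("*:*");
    query.set("group", "true");
    query.set("group.field", "category");     // hypothetical grouping field
    query.set("group.cache", "true");         // proposed in the patch
    query.set("group.cache.maxSize", "64");   // proposed; may become maxSizeMB
    return query;
  }
}
{noformat}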



[jira] [Updated] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more refactoring

2011-05-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3084:
--

Attachment: LUCENE-3084-3.x-only.patch

Merged patch. Will commit now.

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, 
 Remove Vector<SI> subclassing from SegmentInfos & more refactoring
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-3.x-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.
 Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
 collections be hidden inside the class. We can add unmodifiable views on it 
 (asList(), asSet()).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3084) MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, Remove Vector<SI> subclassing from SegmentInfos & more refactoring

2011-05-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3084.
---

Resolution: Fixed

Committed 3.x revision: 1124339

 MergePolicy.OneMerge.segments should be List<SegmentInfo> not SegmentInfos, 
 Remove Vector<SI> subclassing from SegmentInfos & more refactoring
 --

 Key: LUCENE-3084
 URL: https://issues.apache.org/jira/browse/LUCENE-3084
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3084-3.x-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
 LUCENE-3084-trunk-only.patch, LUCENE-3084.patch


 SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
 purposes these fields are unused.
 We should cutover to List<SI> instead.
 Also SegmentInfos subclasses Vector<SI>, this should be removed and the 
 collections be hidden inside the class. We can add unmodifiable views on it 
 (asList(), asSet()).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035508#comment-13035508
 ] 

Michael McCandless commented on LUCENE-2883:


Thanks Chris!  The patch applies cleanly for me (after running the svn
commands) and everything compiles.

I think the patch is a great start, ie, we will need the low level
infra used by FQs in the module.

bq. MutableValue & MutableFloatValue are used in the FunctionQuery code so I've 
pulled them into the module too. Should all the other Mutable*Value classes 
come too? Should they go into some other module?

I think we should move Mutable* over?  Grouping module will need all
of these, I think?  (Ie if we want to allow users to group by
arbitrary typed field).

bq. What to return in ValueSource#getSortField which currently returns a 
SortField which implements SolrSortField. This is currently commented out so we 
can determine what best to do. Having this commented out breaks the Solr tests.

Hmm good question.  This looks to be related to sorting by FQ
(SOLR-1297) because some FQs need to be weighted.  Not sure what to do
here yet... which FQs in particular require this?

bq. Many of the ValueSources and DocValues in Solr could be moved to the 
module, but not all of them. Some have dependencies on Solr dependencies / Solr 
core code.

I think apply 90/10 rule here?  Start with the easy-to-move queries?
We don't need initial go to be perfect... progress not perfection.

bq. Lucene core's FunctionQuery stuff needs to be removed.

Do you have a sense of whether Solr's FQs are a superset of Lucene's?
Ie, is there anything Lucene's FQs can do that Solr's can't?

Probably, as a separate issue, we should also move contrib/queries ->
modules/queries.  And I think the cool nested queries (LUCENE-2454)
would also go into this module...


 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: core/search
Affects Versions: 4.0
Reporter: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2883.patch


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3109) Rename FieldsConsumer to InvertedFieldsConsumer

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035510#comment-13035510
 ] 

Michael McCandless commented on LUCENE-3109:


+1

 Rename FieldsConsumer to InvertedFieldsConsumer
 ---

 Key: LUCENE-3109
 URL: https://issues.apache.org/jira/browse/LUCENE-3109
 Project: Lucene - Java
  Issue Type: Task
  Components: core/codecs
Affects Versions: 4.0
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 4.0


 The name FieldsConsumer is misleading; here it really is an 
 InvertedFieldsConsumer, and since we are extending codecs to consume 
 non-inverted Fields we should be clear here. Same applies to Fields.java as 
 well as FieldsProducer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3116) pendingCommit in IndexWriter is not thoroughly tested

2011-05-18 Thread Uwe Schindler (JIRA)
pendingCommit in IndexWriter is not thoroughly tested
-

 Key: LUCENE-3116
 URL: https://issues.apache.org/jira/browse/LUCENE-3116
 Project: Lucene - Java
  Issue Type: Test
  Components: core/index
Affects Versions: 3.2, 4.0
Reporter: Uwe Schindler


When working on LUCENE-3084, I had a copy-paste error in my patch (introduced in 
revision 1124307 and corrected in 1124316): I replaced pendingCommit by 
segmentInfos in IndexWriter, as corrected by the following patch:

{noformat}
--- lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
(original)
+++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
Wed May 18 16:16:29 2011
@@ -2552,7 +2552,7 @@ public class IndexWriter implements Clos
 lastCommitChangeCount = pendingCommitChangeCount;
 segmentInfos.updateGeneration(pendingCommit);
 segmentInfos.setUserData(pendingCommit.getUserData());
-rollbackSegments = segmentInfos.createBackupSegmentInfos(true);
+rollbackSegments = pendingCommit.createBackupSegmentInfos(true);
 deleter.checkpoint(pendingCommit, true);
   } finally {
 // Matches the incRef done in startCommit:
{noformat}

This did not cause any test failure.

On IRC, Mike said:

{quote}
[19:21] mikemccand: ThetaPh1: hmm
[19:21] mikemccand: well
[19:22] mikemccand: pendingCommit and sis only differ while commit() is running
[19:22] mikemccand: ie if a thread starts commit
[19:22] mikemccand: but fsync is taking a long time
[19:22] mikemccand: and another thread makes a change to sis
[19:22] ThetaPh1: ok so hard to find that bug
[19:22] mikemccand: we need our mock dir wrapper to sometimes take a long time 
syncing
{quote}

Maybe we need such a test, I feel bad when such stupid changes don't make any 
test fail.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
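
As a side note on the IRC excerpt above: a framework-free sketch of the "mock dir wrapper sometimes takes a long time syncing" idea. It deliberately avoids Lucene's MockDirectoryWrapper API and just delays an arbitrary sync callback; the class is invented for this note.

{noformat}
import java.util.Random;

// Sketch only: occasionally delay a sync so that commit() sits in its
// pendingCommit window long enough for another thread to change the
// in-memory SegmentInfos, which is what would expose the bug above.
final class OccasionallySlowSync {
  private final Random random = new Random();

  void sync(Runnable realSync) throws InterruptedException {
    if (random.nextInt(10) == 0) { // roughly one in ten syncs is slow
      Thread.sleep(200);           // exaggerate fsync latency
    }
    realSync.run();
  }
}
{noformat}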



[jira] [Updated] (LUCENE-3116) pendingCommit in IndexWriter is not thoroughly tested

2011-05-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3116:
---

Fix Version/s: 4.0
   3.2

 pendingCommit in IndexWriter is not thoroughly tested
 -

 Key: LUCENE-3116
 URL: https://issues.apache.org/jira/browse/LUCENE-3116
 Project: Lucene - Java
  Issue Type: Test
  Components: core/index
Affects Versions: 3.2, 4.0
Reporter: Uwe Schindler
 Fix For: 3.2, 4.0


 When working on LUCENE-3084, I had a copy-paste error in my patch (introduced in 
 revision 1124307 and corrected in 1124316): I replaced pendingCommit by 
 segmentInfos in IndexWriter, as corrected by the following patch:
 {noformat}
 --- lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
 (original)
 +++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
 Wed May 18 16:16:29 2011
 @@ -2552,7 +2552,7 @@ public class IndexWriter implements Clos
  lastCommitChangeCount = pendingCommitChangeCount;
  segmentInfos.updateGeneration(pendingCommit);
  segmentInfos.setUserData(pendingCommit.getUserData());
 -rollbackSegments = segmentInfos.createBackupSegmentInfos(true);
 +rollbackSegments = pendingCommit.createBackupSegmentInfos(true);
  deleter.checkpoint(pendingCommit, true);
} finally {
  // Matches the incRef done in startCommit:
 {noformat}
 This did not cause any test failure.
 On IRC, Mike said:
 {quote}
 [19:21]   mikemccand: ThetaPh1: hmm
 [19:21]   mikemccand: well
 [19:22]   mikemccand: pendingCommit and sis only differ while commit() is 
 running
 [19:22]   mikemccand: ie if a thread starts commit
 [19:22]   mikemccand: but fsync is taking a long time
 [19:22]   mikemccand: and another thread makes a change to sis
 [19:22]   ThetaPh1: ok so hard to find that bug
 [19:22]   mikemccand: we need our mock dir wrapper to sometimes take a 
 long time syncing
 {quote}
 Maybe we need such a test, I feel bad when such stupid changes don't make any 
 test fail.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-152) [PATCH] KStem for Lucene

2011-05-18 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035519#comment-13035519
 ] 

Steven Rowe commented on LUCENE-152:


bq. Code is fine too afaik: http://www.apache.org/legal/3party.html

My interpretation of this is that we can directly include the KStem source code 
in Lucene/Solr's source tree, and then modify it at will, since its license 
(BSD style) is in Category A (authorized licenses).

Thoughts?

 [PATCH] KStem for Lucene
 

 Key: LUCENE-152
 URL: https://issues.apache.org/jira/browse/LUCENE-152
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: unspecified
 Environment: Operating System: other
 Platform: Other
Reporter: Otis Gospodnetic
Priority: Minor

 September 10th 2003 contribution from Sergio Guzman-Lara 
 guz...@cs.umass.edu
 Original email:
 Hi all,
   I have ported the kstem stemmer to Java and incorporated it to 
 Lucene. You can get the source code (Kstem.jar) from the following website:
 http://ciir.cs.umass.edu/downloads/
   Just click on KStem Java Implementation (you will need to register 
 your e-mail, for free of course, with the CIIR --Center for Intelligent 
 Information Retrieval, UMass -- and get an access code).
 Content of Kstem.jar:
 java/org/apache/lucene/analysis/KStemData1.java
 java/org/apache/lucene/analysis/KStemData2.java
 java/org/apache/lucene/analysis/KStemData3.java
 java/org/apache/lucene/analysis/KStemData4.java
 java/org/apache/lucene/analysis/KStemData5.java
 java/org/apache/lucene/analysis/KStemData6.java
 java/org/apache/lucene/analysis/KStemData7.java
 java/org/apache/lucene/analysis/KStemData8.java
 java/org/apache/lucene/analysis/KStemFilter.java
 java/org/apache/lucene/analysis/KStemmer.java
 KStemData1.java, ..., KStemData8.java   Contain several lists of words 
 used by Kstem
 KStemmer.java  Implements the Kstem algorithm 
 KStemFilter.java Extends TokenFilter applying Kstem
 To compile
 unjar the file Kstem.jar to Lucene's src directory, and compile it 
 there. 
 What is Kstem?
   A stemmer designed by Bob Krovetz (for more information see 
 http://ciir.cs.umass.edu/pubfiles/ir-35.pdf). 
 Copyright issues
   This is open source. The actual license agreement is included at the 
 top of every source file.
  Any comments/questions/suggestions are welcome,
   Sergio Guzman-Lara
   Senior Research Fellow
   CIIR UMass

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3116) pendingCommit in IndexWriter is not thoroughly tested

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035520#comment-13035520
 ] 

Michael McCandless commented on LUCENE-3116:


It's great you caught this on backport Uwe!  And, yes, spooky no tests failed...

It'll be challenging to have a test catch this.  Fixing MockDirWrapper to 
sometimes take an unusually long time to do the fsync is a great start.  What 
this change would have caused is .rollback() would roll back to a wrong copy of 
the sis, ie not a commit point but rather a commit point plus some additional 
flushes.

 pendingCommit in IndexWriter is not thoroughly tested
 -

 Key: LUCENE-3116
 URL: https://issues.apache.org/jira/browse/LUCENE-3116
 Project: Lucene - Java
  Issue Type: Test
  Components: core/index
Affects Versions: 3.2, 4.0
Reporter: Uwe Schindler
 Fix For: 3.2, 4.0


 When working on LUCENE-3084, I had a copy-paste error in my patch (introduced in 
 revision 1124307 and corrected in 1124316): I replaced pendingCommit by 
 segmentInfos in IndexWriter, as corrected by the following patch:
 {noformat}
 --- lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
 (original)
 +++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
 Wed May 18 16:16:29 2011
 @@ -2552,7 +2552,7 @@ public class IndexWriter implements Clos
  lastCommitChangeCount = pendingCommitChangeCount;
  segmentInfos.updateGeneration(pendingCommit);
  segmentInfos.setUserData(pendingCommit.getUserData());
 -rollbackSegments = segmentInfos.createBackupSegmentInfos(true);
 +rollbackSegments = pendingCommit.createBackupSegmentInfos(true);
  deleter.checkpoint(pendingCommit, true);
} finally {
  // Matches the incRef done in startCommit:
 {noformat}
 This did not cause any test failure.
 On IRC, Mike said:
 {quote}
 [19:21]   mikemccand: ThetaPh1: hmm
 [19:21]   mikemccand: well
 [19:22]   mikemccand: pendingCommit and sis only differ while commit() is 
 running
 [19:22]   mikemccand: ie if a thread starts commit
 [19:22]   mikemccand: but fsync is taking a long time
 [19:22]   mikemccand: and another thread makes a change to sis
 [19:22]   ThetaPh1: ok so hard to find that bug
 [19:22]   mikemccand: we need our mock dir wrapper to sometimes take a 
 long time syncing
 {quote}
 Maybe we need such a test, I feel bad when such stupid changes don't make any 
 test fail.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2524) Adding grouping to Solr 3x

2011-05-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned SOLR-2524:


Assignee: Michael McCandless

 Adding grouping to Solr 3x
 --

 Key: SOLR-2524
 URL: https://issues.apache.org/jira/browse/SOLR-2524
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.2
Reporter: Martijn van Groningen
Assignee: Michael McCandless
 Attachments: SOLR-2524.patch


 Grouping was recently added to Lucene 3x. See LUCENE-1421 for more 
 information.
 I think it would be nice if we expose this functionality also to the Solr 
 users that are bound to a 3.x version.
 The grouping feature added to Lucene is currently a subset of the 
 functionality that Solr 4.0-trunk offers. Mainly it doesn't support grouping 
 by function / query.
 The work involved getting the grouping contrib to work on Solr 3x is 
 acceptable. I have it more or less running here. It supports the response 
 format and request parameters (except: group.query and group.func) described 
 in the FieldCollapse page on the Solr wiki.
 I think it would be great if this is included in the Solr 3.2 release. Many 
 people are using grouping as patch now and this would help them a lot. Any 
 thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2524) Adding grouping to Solr 3x

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035526#comment-13035526
 ] 

Michael McCandless commented on SOLR-2524:
--

Awesome, that was fast!

Maybe rename group.cache.maxSize -> .maxSizeMB?  (So it's clear what the units 
are).

Should we default group.cache to true?  (It's false now?).

When you get the top groups from collector2, should you pass in offset instead 
of 0?  (Hmm -- maybe groupOffset?  It seems like you're using offset for both 
the first & second phase collectors?  Maybe I'm confused...).

bq. Computed DocSet (for facetComponent and StatsComponent) is based on the 
ungrouped result.

This matches how Solr does grouping on trunk right?

 Adding grouping to Solr 3x
 --

 Key: SOLR-2524
 URL: https://issues.apache.org/jira/browse/SOLR-2524
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.2
Reporter: Martijn van Groningen
 Attachments: SOLR-2524.patch


 Grouping was recently added to Lucene 3x. See LUCENE-1421 for more 
 information.
 I think it would be nice if we expose this functionality also to the Solr 
 users that are bound to a 3.x version.
 The grouping feature added to Lucene is currently a subset of the 
 functionality that Solr 4.0-trunk offers. Mainly it doesn't support grouping 
 by function / query.
 The work involved getting the grouping contrib to work on Solr 3x is 
 acceptable. I have it more or less running here. It supports the response 
 format and request parameters (except: group.query and group.func) described 
 in the FieldCollapse page on the Solr wiki.
 I think it would be great if this is included in the Solr 3.2 release. Many 
 people are using grouping as patch now and this would help them a lot. Any 
 thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3012) if you use setNorm, lucene writes a headerless separate norms file

2011-05-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3012.
-

   Resolution: Fixed
Fix Version/s: 4.0

Committed revision 1124366, 1124369

 if you use setNorm, lucene writes a headerless separate norms file
 --

 Key: LUCENE-3012
 URL: https://issues.apache.org/jira/browse/LUCENE-3012
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3012.patch, LUCENE-3012_3x.patch


 In this case SR.reWrite just writes the bytes with no header...
 we should write it always.
 we can detect in these cases (segment written <= 3.1) with a 
 sketchy length == maxDoc check.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3102) Few issues with CachingCollector

2011-05-18 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-3102.


Resolution: Fixed

Thanks Mike. Seems that TestGrouping is indeed fixed.

Committed revision 1124378 (3x).
Committed revision 1124379 (trunk).

Resolving this. We can tackle OBS and other optimizations in subsequent issues 
if the need arises.

Thanks Mike !

 Few issues with CachingCollector
 

 Key: LUCENE-3102
 URL: https://issues.apache.org/jira/browse/LUCENE-3102
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3102-factory.patch, LUCENE-3102-nowrap.patch, 
 LUCENE-3102-nowrap.patch, LUCENE-3102.patch, LUCENE-3102.patch, 
 LUCENE-3102.patch


 CachingCollector (introduced in LUCENE-1421) has few issues:
 # Since the wrapped Collector may support out-of-order collection, the 
 document IDs cached may be out-of-order (depends on the Query) and thus 
 replay(Collector) will forward document IDs out-of-order to a Collector that 
 may not support it.
 # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
 # I think that instead of comparing curScores to null, in order to determine 
 if scores are requested, we should have a specific boolean - for clarity
 # This check if (base + nextLength > maxDocsToCache) (line 168) can be 
 relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
 maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
 to try and cache them?
 Also:
 * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
 need that if CachingCollector ctor already takes a boolean cacheScores? I 
 think it's better defined explicitly than implicitly?
 * Let's introduce a factory method for creating a specialized version if 
 scoring is requested / not (i.e., impl the TODO in line 189)
 * I think it's a useful collector, which stands on its own and not specific 
 to grouping. Can we move it to core?
 * How about using OpenBitSet instead of int[] for doc IDs?
 ** If the number of hits is big, we'd gain some RAM back, and be able to 
 cache more entries
 ** NOTE: OpenBitSet can only be used for in-order collection only. So we can 
 use that if the wrapped Collector does not support out-of-order
 * Do you think we can modify this Collector to not necessarily wrap another 
 Collector? We have such Collector which stores (in-memory) all matching doc 
 IDs + scores (if required). Those are later fed into several processes that 
 operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
 can make CachingCollector *optionally* wrap another Collector and then 
 someone can reuse it by setting RAM limit to unlimited (we should have a 
 constant for that) in order to simply collect all matching docs + scores.
 * I think a set of dedicated unit tests for this class alone would be good.
 That's it so far. Perhaps, if we do all of the above, more things will pop up.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
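
A back-of-the-envelope note on the int[] vs OpenBitSet point in the issue description above: an int[] costs 4 bytes per cached hit, while a bit set costs roughly maxDoc/8 bytes regardless of how many documents match. The tiny sketch below just prints both figures; the numbers are illustrative.

{noformat}
// Rough RAM estimate for caching doc IDs: int[] grows with the hit count,
// an OpenBitSet-style bit set is a fixed maxDoc/8 bytes.
public class DocIdCacheSizing {
  static long intArrayBytes(long numHits) {
    return 4L * numHits;
  }

  static long bitSetBytes(long maxDoc) {
    return (maxDoc + 7) / 8;
  }

  public static void main(String[] args) {
    long maxDoc = 10000000L; // e.g. a 10M-doc index
    long hits = 1000000L;    // e.g. 1M matching docs
    System.out.println("int[]   : " + intArrayBytes(hits) + " bytes");
    System.out.println("bit set : " + bitSetBytes(maxDoc) + " bytes");
    // The bit set wins once more than maxDoc/32 documents match.
  }
}
{noformat}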



[jira] [Commented] (SOLR-2524) Adding grouping to Solr 3x

2011-05-18 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035586#comment-13035586
 ] 

Martijn van Groningen commented on SOLR-2524:
-

bq. Maybe rename group.cache.maxSize -> .maxSizeMB? (So it's clear what the 
units are).
Yes that is a more descriptive name.

bq. Should we default group.cache to true? (It's false now?).
That makes sense. 

I think that if the cachedCollector.isCached() returns false we should put 
something in the response indicating that the cache wasn't used because it hit 
the cache.maxSizeMB limit. Otherwise nobody will know whether the cache was 
utilized.

When I was playing around with the cache options I noticed that searching 
without cache (~350 ms) was faster than with cache (~500 ms) on a 10M index 
with 1711 distinct group values. This is not what I'd expect.

bq. When you get the top groups from collector2, should you pass in offset 
instead of 0? (Hmm – maybe groupOffset? It seems like you're using offset for 
both the first & second phase collectors? Maybe I'm confused...).
I know that is confusing, but the DocSlice expects offset + len documents. So 
that was a quick way of doing that. I will clean that up.

bq. This matches how Solr does grouping on trunk right?
Yes it does. I'm already thinking about a new collector that collects the most 
relevant documents of all groups. This collector should produce something like 
an OpenBitSet. We can use the OpenBitSet to create a DocSet. I think this 
should be implemented in a different issue.

 Adding grouping to Solr 3x
 --

 Key: SOLR-2524
 URL: https://issues.apache.org/jira/browse/SOLR-2524
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.2
Reporter: Martijn van Groningen
Assignee: Michael McCandless
 Attachments: SOLR-2524.patch


 Grouping was recently added to Lucene 3x. See LUCENE-1421 for more 
 information.
 I think it would be nice if we expose this functionality also to the Solr 
 users that are bound to a 3.x version.
 The grouping feature added to Lucene is currently a subset of the 
 functionality that Solr 4.0-trunk offers. Mainly it doesn't support grouping 
 by function / query.
 The work involved getting the grouping contrib to work on Solr 3x is 
 acceptable. I have it more or less running here. It supports the response 
 format and request parameters (except: group.query and group.func) described 
 in the FieldCollapse page on the Solr wiki.
 I think it would be great if this is included in the Solr 3.2 release. Many 
 people are using grouping as patch now and this would help them a lot. Any 
 thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr JIRA

2011-05-18 Thread Simon Willnauer
On Tue, May 17, 2011 at 9:23 PM, Steven A Rowe sar...@syr.edu wrote:
 On 5/17/2011 at 3:02 PM, Chris Hostetter wrote:
 If we were starting from scratch, i'd agree with you that having a single
 Jira project makes more sense, but given where we are today, i think we
 should probably keep them distinct -- partly from a pain of migration
 standpoint on our end, but also from a user expectations standpoint -- i
 think the Solr users/community as a whole is used to the existence of the
 SOLR project in Jira, and used to the SOLR-* issue naming convention, and
 it would likely be more confusing for *them* to change now.

just a few words. I disagree here with you hoss IMO the suggestion to
merge JIRA would help to move us closer together and help close the
gap between Solr and Lucene. I think we need to start identifying ourselves
with what we work on. It feels like we don't do that today and we
should work hard to stop that and make hard breaks that might hurt but
I think it's the way to go. Drawn from what has happened in the last
weeks / months it would be good to start from scratch, at least in
JIRA. I'd go even further and nuke the name entirely and call
everything lucene - I know not many folks like the idea and it might
take a while to bake in but I think for us (PMC / Committers) and the
community it would be good. I am not calling a vote here just stating
my opinion.

here is my +1 to the JIRA suggestion

Simon

 +1




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr JIRA

2011-05-18 Thread Shai Erera
I didn't know that it was decided that top-level modules issues go under the
Lucene project. That indeed reduces some of the confusion (as long as users
will adhere to it, but I guess it's also up to us to enforce it).

So Lucene project becomes everything that is not precisely Solr, i.e. not
under solr/?

I think that one day we will need to merge. The more modules we'll have, the
fewer issues will be open under the Solr project (if it uses those modules). I
agree w/ what Simon wrote - users will get used to it, so that's not a good
reason IMO. Also, if we keep claiming the user base is different, then I
think we have a problem ... every Solr user is also a Lucene user
(eventually) -- true, some only interact w/ Solr REST API, and may not
know/care Lucene is run at the lower level. But for the community's sake, I
think merging JIRA will only help down the road.

Not very related to JIRA merge, but still ... if I look at Lucene project
today, I see ~30 issues marked for 3.2, so I think to myself well, 3.2 in
maybe a month seems reasonable. But then I look at Solr project and see
~230 marked for 3.2 and I think if we need to release both Lucene and Solr,
then we're definitely far from 3.2.

Now, I don't know if Solr's ~230 is just bad JIRA management, and most of
the issues just drift from version to version for several releases, or whether
Solr really has 230 issues that need to be addressed in 3.2, in which case we
have a serious manpower problem.

I'm not saying that if the two were merged under the same project we'd have
*less* 3.2 issues overall for sure, but I have a feeling that would happen,
because at least for me, it's hard to track two projects and I usually look
@ Lucene. I can imagine that if they were merged under the same project, and
I'd see a longer list of issues, I'd do something (radical, like
closing/cleaning them :)). But I'm only guessing. Maybe I should try to run
w/ Hoss's query for some time and see if it affects my itch to reduce the
number of issues.

At the end of the day, I don't think we can maintain two projects for much
longer, and I don't think it's the right thing to do at all. And if one day
we'll merge JIRA projects, then tomorrow is as good a day as any other day.
Our users, I'm sure, will get used to it very quickly. I doubt users care
that much about the prefix SOLR-* for Solr issues.

Shai

On Wed, May 18, 2011 at 10:10 PM, Simon Willnauer 
simon.willna...@googlemail.com wrote:

 On Tue, May 17, 2011 at 9:23 PM, Steven A Rowe sar...@syr.edu wrote:
  On 5/17/2011 at 3:02 PM, Chris Hostetter wrote:
  If we were starting from scratch, i'd agree with you that having a
 single
  Jira project makes more sense, but given where we are today, i think we
  should probably keep them distinct -- partly from a pain of migration
  standpoint on our end, but also from a user expectations standpoint -- i
  think the Solr users/community as a whole is used to the existence of the
  SOLR project in Jira, and used to the SOLR-* issue naming convention, and
  it would likely be more confusing for *them* to change now.

 just a few words. I disagree here with you hoss IMO the suggestion to
 merge JIRA would help to move us closer together and help close the
 gap between Solr and Lucene. I think we need to start identifying ourselves
 with what we work on. It feels like we don't do that today and we
 should work hard to stop that and make hard breaks that might hurt but
 I think it's the way to go. Drawn from what has happened in the last
 weeks / months it would be good to start from scratch, at least in
 JIRA. I'd go even further and nuke the name entirely and call
 everything lucene - I know not many folks like the idea and it might
 take a while to bake in but I think for us (PMC / Committers) and the
 community it would be good. I am not calling a vote here just stating
 my opinion.

 here is my +1 to the JIRA suggestion

 Simon
 
  +1
 
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (LUCENE-152) [PATCH] KStem for Lucene

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035616#comment-13035616
 ] 

Michael McCandless commented on LUCENE-152:
---

I think that's right.

 [PATCH] KStem for Lucene
 

 Key: LUCENE-152
 URL: https://issues.apache.org/jira/browse/LUCENE-152
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: unspecified
 Environment: Operating System: other
 Platform: Other
Reporter: Otis Gospodnetic
Priority: Minor

 September 10th 2003 contribution from Sergio Guzman-Lara 
 guz...@cs.umass.edu
 Original email:
 Hi all,
   I have ported the kstem stemmer to Java and incorporated it to 
 Lucene. You can get the source code (Kstem.jar) from the following website:
 http://ciir.cs.umass.edu/downloads/
   Just click on KStem Java Implementation (you will need to register 
 your e-mail, for free of course, with the CIIR --Center for Intelligent 
 Information Retrieval, UMass -- and get an access code).
 Content of Kstem.jar:
 java/org/apache/lucene/analysis/KStemData1.java
 java/org/apache/lucene/analysis/KStemData2.java
 java/org/apache/lucene/analysis/KStemData3.java
 java/org/apache/lucene/analysis/KStemData4.java
 java/org/apache/lucene/analysis/KStemData5.java
 java/org/apache/lucene/analysis/KStemData6.java
 java/org/apache/lucene/analysis/KStemData7.java
 java/org/apache/lucene/analysis/KStemData8.java
 java/org/apache/lucene/analysis/KStemFilter.java
 java/org/apache/lucene/analysis/KStemmer.java
 KStemData1.java, ..., KStemData8.java   Contain several lists of words 
 used by Kstem
 KStemmer.java  Implements the Kstem algorithm 
 KStemFilter.java Extends TokenFilter applying Kstem
 To compile
 unjar the file Kstem.jar to Lucene's src directory, and compile it 
 there. 
 What is Kstem?
   A stemmer designed by Bob Krovetz (for more information see 
 http://ciir.cs.umass.edu/pubfiles/ir-35.pdf). 
 Copyright issues
   This is open source. The actual license agreement is included at the 
 top of every source file.
  Any comments/questions/suggestions are welcome,
   Sergio Guzman-Lara
   Senior Research Fellow
   CIIR UMass

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr JIRA

2011-05-18 Thread Doron Cohen
On Tue, May 17, 2011 at 10:23 PM, Steven A Rowe sar...@syr.edu wrote:

 On 5/17/2011 at 3:02 PM, Chris Hostetter wrote:
  If we were starting from scratch, i'd agree with you that having a single
  Jira project makes more sense, but given where we are today, i think we
  should probably keep them distinct -- partly from a pain of migration
  standpoint on our end, but also from a user expectations standpoint -- i
  think the Solr users/community as a whole is used to the existence of the
  SOLR project in Jira, and used to the SOLR-* issue naming convention, and
  it would likely be more confusing for *them* to change now.

 +1


+1 for keeping separate user lists and separate JIRA projects, stabilize, no
rush, release a few, then, perhaps, reiterate on this.


[jira] [Created] (SOLR-2526) Grouping on multiple fields

2011-05-18 Thread Arian Karbasi (JIRA)
Grouping on multiple fields
---

 Key: SOLR-2526
 URL: https://issues.apache.org/jira/browse/SOLR-2526
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Arian Karbasi
Priority: Minor


Grouping on multiple fields and/or ranges should be an option, i.e. (X,Y) groupings.
 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2526) Grouping on multiple fields

2011-05-18 Thread Arian Karbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arian Karbasi updated SOLR-2526:


Component/s: search

 Grouping on multiple fields
 ---

 Key: SOLR-2526
 URL: https://issues.apache.org/jira/browse/SOLR-2526
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Arian Karbasi
Priority: Minor

 Grouping on multiple fields and/or ranges should be an option, i.e. (X,Y) 
 groupings.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2526) Grouping on multiple fields

2011-05-18 Thread Arian Karbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arian Karbasi updated SOLR-2526:


Component/s: (was: search)

 Grouping on multiple fields
 ---

 Key: SOLR-2526
 URL: https://issues.apache.org/jira/browse/SOLR-2526
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Arian Karbasi
Priority: Minor

 Grouping on multiple fields and/or ranges should be an option, i.e. (X,Y) 
 groupings.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2524) Adding grouping to Solr 3x

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035631#comment-13035631
 ] 

Michael McCandless commented on SOLR-2524:
--

bq. I think that if the cachedCollector.isCached() returns false we should put 
something in the response indicating that the cache wasn't used because it hit 
the cache.maxSizeMB limit. Otherwise nobody will know whether the cache was 
utilized.

+1, and maybe log a warning?  Or is that going to be too much logging?

bq. When I was playing around with the cache options I noticed that searching 
without cache (~350 ms) was faster than with cache (~500 ms) on a 10M index 
with 1711 distinct group values. This is not what I'd expect.

That is worrisome!!  Was this a simple TermQuery?  Is it somehow possible Solr 
is already caching the query's results itself...?

bq. I'm already thinking about a new collector that collects the most relevant 
documents of all groups. This collector should produce something like an 
OpenBitSet. We can use the OpenBitSet to create a DocSet. I think this should 
be implemented in a different issue.

Cool!

 Adding grouping to Solr 3x
 --

 Key: SOLR-2524
 URL: https://issues.apache.org/jira/browse/SOLR-2524
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.2
Reporter: Martijn van Groningen
Assignee: Michael McCandless
 Attachments: SOLR-2524.patch


 Grouping was recently added to Lucene 3x. See LUCENE-1421 for more 
 information.
 I think it would be nice if we expose this functionality also to the Solr 
 users that are bound to a 3.x version.
 The grouping feature added to Lucene is currently a subset of the 
 functionality that Solr 4.0-trunk offers. Mainly it doesn't support grouping 
 by function / query.
 The work involved getting the grouping contrib to work on Solr 3x is 
 acceptable. I have it more or less running here. It supports the response 
 format and request parameters (except: group.query and group.func) described 
 in the FieldCollapse page on the Solr wiki.
 I think it would be great if this is included in the Solr 3.2 release. Many 
 people are using grouping as patch now and this would help them a lot. Any 
 thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2526) Grouping on multiple fields

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035635#comment-13035635
 ] 

Michael McCandless commented on SOLR-2526:
--

I think LUCENE-3099 could make this possible, by allowing subclasses to define 
arbitrary group keys per document.  Today the grouping module is hardwired to 
use BytesRef (pulled from the FieldCache of a single-valued indexed field) as the 
group key, but really it should be possible to use any key.
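
To make the "any key" idea concrete, here is a rough sketch -- not the LUCENE-3099 
API; the class and method names are invented, and the Lucene 3.x Collector 
signatures are assumed -- of a grouping collector parameterized on an arbitrary 
group-key type, which would cover multi-field keys like the (X,Y) groupings 
requested here:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Hypothetical sketch: a collector whose group key can be any type, e.g. a
// pair of field values for multi-field grouping, instead of a single BytesRef.
public abstract class GenericGroupCountCollector<GROUP_KEY> extends Collector {
  private final Map<GROUP_KEY, Integer> counts = new HashMap<GROUP_KEY, Integer>();

  /** Subclasses derive the key for a segment-relative doc, e.g. from FieldCache. */
  protected abstract GROUP_KEY groupKey(int doc) throws IOException;

  @Override
  public void collect(int doc) throws IOException {
    GROUP_KEY key = groupKey(doc);
    Integer count = counts.get(key);
    counts.put(key, count == null ? 1 : count + 1);
  }

  @Override
  public void setScorer(Scorer scorer) throws IOException {}

  @Override
  public void setNextReader(IndexReader reader, int docBase) throws IOException {}

  @Override
  public boolean acceptsDocsOutOfOrder() { return true; }

  public Map<GROUP_KEY, Integer> getGroupCounts() { return counts; }
}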

 Grouping on multiple fields
 ---

 Key: SOLR-2526
 URL: https://issues.apache.org/jira/browse/SOLR-2526
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Arian Karbasi
Priority: Minor

 Grouping on multiple fields and/or ranges should be an option, i.e. (X,Y) 
 groupings.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3102) Few issues with CachingCollector

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035637#comment-13035637
 ] 

Michael McCandless commented on LUCENE-3102:


Thanks Shai -- this is awesome progress!

 Few issues with CachingCollector
 

 Key: LUCENE-3102
 URL: https://issues.apache.org/jira/browse/LUCENE-3102
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3102-factory.patch, LUCENE-3102-nowrap.patch, 
 LUCENE-3102-nowrap.patch, LUCENE-3102.patch, LUCENE-3102.patch, 
 LUCENE-3102.patch


 CachingCollector (introduced in LUCENE-1421) has few issues:
 # Since the wrapped Collector may support out-of-order collection, the 
 document IDs cached may be out-of-order (depends on the Query) and thus 
 replay(Collector) will forward document IDs out-of-order to a Collector that 
 may not support it.
 # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
 # I think that instead of comparing curScores to null, in order to determine 
 if scores are requested, we should have a specific boolean - for clarity
 # This check if (base + nextLength > maxDocsToCache) (line 168) can be 
 relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
 maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
 to try and cache them?
 Also:
 * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
 need that if CachingCollector ctor already takes a boolean cacheScores? I 
 think it's better defined explicitly than implicitly?
 * Let's introduce a factory method for creating a specialized version if 
 scoring is requested / not (i.e., impl the TODO in line 189)
 * I think it's a useful collector, which stands on its own and not specific 
 to grouping. Can we move it to core?
 * How about using OpenBitSet instead of int[] for doc IDs?
 ** If the number of hits is big, we'd gain some RAM back, and be able to 
 cache more entries
 ** NOTE: OpenBitSet can be used for in-order collection only, so we can 
 use it if the wrapped Collector does not support out-of-order collection
 * Do you think we can modify this Collector to not necessarily wrap another 
 Collector? We have such Collector which stores (in-memory) all matching doc 
 IDs + scores (if required). Those are later fed into several processes that 
 operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
 can make CachingCollector *optionally* wrap another Collector and then 
 someone can reuse it by setting RAM limit to unlimited (we should have a 
 constant for that) in order to simply collect all matching docs + scores.
 * I think a set of dedicated unit tests for this class alone would be good.
 That's it so far. Perhaps, if we do all of the above, more things will pop up.
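
As a concrete illustration of the factory-method and "optionally wrap another 
Collector" bullets above, one possible shape is a shared no-op delegate. This is 
a sketch only, not the committed LUCENE-3102 code, and it assumes the Lucene 3.x 
Collector API:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Hypothetical sketch: a do-nothing Collector that a caching collector could
// delegate to when the caller has no "other" Collector to wrap, so the hot
// collect() path needs no null checks.
public final class NoOpCollector extends Collector {
  public static final NoOpCollector INSTANCE = new NoOpCollector();

  private NoOpCollector() {}

  @Override
  public void setScorer(Scorer scorer) throws IOException {}

  @Override
  public void collect(int doc) throws IOException {}

  @Override
  public void setNextReader(IndexReader reader, int docBase) throws IOException {}

  @Override
  public boolean acceptsDocsOutOfOrder() { return true; }
}

A factory such as create(Collector other, boolean cacheScores, double maxRAMMB) 
could then substitute INSTANCE whenever other == null; whether it should report 
out-of-order support as true or false in that case is exactly the question raised 
in the description.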

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Solr Config XML DTD's

2011-05-18 Thread Mike Sokolov
I looked into inserting a formal validation step in o.a.solr.core.Config 
and ran some preliminary simple tests.  The code is fairly simple; just 
a few gotchas:


1) to use the RNC validation language (my preference), we would need to 
pull in a couple of new jars, one of which is over 600K.  Also, support 
for RNC in the XML world is not very widespread: it has gotten more 
interest from researchers than uptake in the broader world, so it might not 
be the best choice, even if, aesthetically, it is superior IMO.


2) The other alternatives are XML Schema and DTD.  I think DTD is a 
non-starter since it just can't allow things like arbitrary attributes 
on an element (you have to list them explicitly).  Schema is probably 
the best choice all things considered: support for it is built into the 
XML tools already in use, and it is widely adopted.  The drawback is 
that it's a baroque and unwieldy syntax designed by an indecisive 
committee that loaded it down with excessive featuritis, and someone 
will end up having to maintain this: every time you add a new 
configuration option to the schema (or solrconfig, etc), then the 
schema-schema (validation schema?) will have to be updated to reflect that.


3) Finally, to get good error reporting it's important to show file name 
and line number where an error occurred.  Although you can validate a 
constructed XML tree (a DOM), it's better to run validation on a Stream 
so the line numbers are available.  Therefore it will probably be 
necessary to run two passes (one to validate, and one to construct the 
DOM), which means buffering the config.  Doesn't seem like a big deal: 
these are small files that only get loaded once, but this is a cost of 
validation, I think.
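
For what it's worth, a sketch of the buffer-once / validate-then-parse approach 
(assuming XML Schema and a hypothetical solrconfig.xsd; only standard JAXP APIs 
are used, and the class name is a placeholder):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Sketch: buffer the config once, validate the stream first (so errors carry
// line/column numbers), then build the DOM from the same buffered bytes.
public final class ConfigValidationSketch {

  public static Document loadValidated(File configXml, File schemaXsd) throws Exception {
    byte[] buffered = readFully(configXml);

    Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
        .newSchema(new StreamSource(schemaXsd));
    Validator validator = schema.newValidator();
    validator.setErrorHandler(new ErrorHandler() {
      public void warning(SAXParseException e) { report("WARN", e); }
      public void error(SAXParseException e) throws SAXException { report("ERROR", e); throw e; }
      public void fatalError(SAXParseException e) throws SAXException { report("FATAL", e); throw e; }
      private void report(String level, SAXParseException e) {
        System.err.println(level + " at line " + e.getLineNumber()
            + ", col " + e.getColumnNumber() + ": " + e.getMessage());
      }
    });

    // Pass 1: validate against the schema, reporting file positions.
    validator.validate(new StreamSource(new ByteArrayInputStream(buffered)));

    // Pass 2: construct the DOM from the same buffered bytes.
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    return dbf.newDocumentBuilder().parse(new ByteArrayInputStream(buffered));
  }

  private static byte[] readFully(File f) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    FileInputStream in = new FileInputStream(f);
    try {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    } finally {
      in.close();
    }
    return out.toByteArray();
  }
}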


Of course the benefit is that users would actually get fast-failing, 
specific, and informative error messages covering a wide variety of 
misconfigurations: I would hope we could be restrictive enough to catch 
misspelled versions of known element and attribute names, or places 
where elements are out of order.


I'd be willing to work this up, develop a preliminary schema (of 
whichever sort we choose), and send in a patch, but other folks would 
probably end up having to maintain it from time to time if it's to have 
any value at all and not just get disabled, so I just want to make sure 
this is something you all think is worthwhile before going any further.


-Mike



On 05/17/2011 09:04 AM, Michael McCandless wrote:

https://issues.apache.org/jira/browse/SOLR-2119 is a good example
where we are failing to catch mis-configuration on startup.

Is there some way we can baby step here?  EG use one of these XML
validation packages, incrementally, on only sub-strings from the XML?
(Or simpler is to just do the checking ourselves w/ custom code).

Mike

http://blog.mikemccandless.com

On Wed, May 4, 2011 at 10:50 PM, Michael Sokolov soko...@ifactory.com wrote:
   

I'm not sure you will find anyone wanting to put in this effort now, but
another suggestion for a general approach might be:

1 very basic static analysis to catch what you can - this should be a pretty
minimal effort only given what can reasonably be achieved

2 throw runtime errors as Hoss says (probably already doing this well
enough, but maybe some incremental improvements are needed?)

3 an option to run a configtest like httpd provides that preloads all
declared handlers/plugins/modules etc, instantiates them and gives them an
opportunity to read their config and throw whatever errors they find.  This
way you can set a standard (error on unrecognized parameter, say) in some
core areas, and distribute the effort.  This is a hugely useful sanity check
to be able to run when you want to make config changes and not have your
server fall over when it starts (or worse - later).

-Mike "kibitzer" Sokolov

On 5/4/2011 6:55 PM, Chris Hostetter wrote:
 

As I said: any improvements to help catch the mistakes we can identify 
would be great, but we should maintain perspective of the effort/gain 
tradeoff given that there is likely nothing we can do about the basic 
problem of a string that won't be evaluated until runtime.

   


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

   


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3117) yank SegmentReader.norm out of SegmentReader.java

2011-05-18 Thread Robert Muir (JIRA)
yank SegmentReader.norm out of SegmentReader.java
-

 Key: LUCENE-3117
 URL: https://issues.apache.org/jira/browse/LUCENE-3117
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir


While working on the flex scoring branch and LUCENE-3012, I noticed it was 
difficult to navigate 
the norms handling in SegmentReader's code.

I think we should yank this inner class out into a separate file as a start.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3117) yank SegmentReader.norm out of SegmentReader.java

2011-05-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3117:


Attachment: LUCENE-3117.patch

 yank SegmentReader.norm out of SegmentReader.java
 -

 Key: LUCENE-3117
 URL: https://issues.apache.org/jira/browse/LUCENE-3117
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-3117.patch


 While working on the flex scoring branch and LUCENE-3012, I noticed it was 
 difficult to navigate 
 the norms handling in SegmentReader's code.
 I think we should yank this inner class out into a separate file as a start.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3117) yank SegmentReader.norm out of SegmentReader.java

2011-05-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035642#comment-13035642
 ] 

Michael McCandless commented on LUCENE-3117:


+1, this code is scary, and pulling it out is a great baby step.

 yank SegmentReader.norm out of SegmentReader.java
 -

 Key: LUCENE-3117
 URL: https://issues.apache.org/jira/browse/LUCENE-3117
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-3117.patch


 While working on the flex scoring branch and LUCENE-3012, I noticed it was 
 difficult to navigate 
 the norms handling in SegmentReader's code.
 I think we should yank this inner class out into a separate file as a start.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-18 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035643#comment-13035643
 ] 

Doron Cohen commented on LUCENE-3068:
-

I wonder if this should also be fixed in the 3.1 branch?
Probably only if we make a 3.1.1, but not needed if it's going to be a 3.2. 
What's the best practice then? Reopen until a decision is made?
Or rely on rescanning all 3.2 changes in case it's going to be 3.1.1?

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3117) yank SegmentReader.norm out of SegmentReader.java

2011-05-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3117:


Attachment: LUCENE-3117.patch

Oops, the last patch had an outdated hack (for calling the silly SR.cloneBytes).

 yank SegmentReader.norm out of SegmentReader.java
 -

 Key: LUCENE-3117
 URL: https://issues.apache.org/jira/browse/LUCENE-3117
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-3117.patch, LUCENE-3117.patch


 While working on the flex scoring branch and LUCENE-3012, I noticed it was 
 difficult to navigate 
 the norms handling in SegmentReader's code.
 I think we should yank this inner class out into a separate file as a start.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-152) [PATCH] KStem for Lucene

2011-05-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035647#comment-13035647
 ] 

Yonik Seeley commented on LUCENE-152:
-

heh - I had heard enough times that the license wouldn't permit it that I never 
looked into it myself.
http://markmail.org/message/zlett7y3dj76xa2f

Anyway, I did a bunch of optimizations for Lucid's version way back when.  It 
makes sense for those to be contributed back here... I'll see what I can do 
(but it might be delayed a week by everyone being busy at Lucene Revolution).

 [PATCH] KStem for Lucene
 

 Key: LUCENE-152
 URL: https://issues.apache.org/jira/browse/LUCENE-152
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: unspecified
 Environment: Operating System: other
 Platform: Other
Reporter: Otis Gospodnetic
Priority: Minor

 September 10th 2003 contribution from Sergio Guzman-Lara 
 guz...@cs.umass.edu
 Original email:
 Hi all,
   I have ported the kstem stemmer to Java and incorporated it into 
 Lucene. You can get the source code (Kstem.jar) from the following website:
 http://ciir.cs.umass.edu/downloads/
   Just click on KStem Java Implementation (you will need to register 
 your e-mail, for free of course, with the CIIR --Center for Intelligent 
 Information Retrieval, UMass -- and get an access code).
 Content of Kstem.jar:
 java/org/apache/lucene/analysis/KStemData1.java
 java/org/apache/lucene/analysis/KStemData2.java
 java/org/apache/lucene/analysis/KStemData3.java
 java/org/apache/lucene/analysis/KStemData4.java
 java/org/apache/lucene/analysis/KStemData5.java
 java/org/apache/lucene/analysis/KStemData6.java
 java/org/apache/lucene/analysis/KStemData7.java
 java/org/apache/lucene/analysis/KStemData8.java
 java/org/apache/lucene/analysis/KStemFilter.java
 java/org/apache/lucene/analysis/KStemmer.java
 KStemData1.java, ..., KStemData8.java   Contain several lists of words 
 used by Kstem
 KStemmer.java  Implements the Kstem algorithm 
 KStemFilter.java Extends TokenFilter applying Kstem
 To compile
 unjar the file Kstem.jar to Lucene's src directory, and compile it 
 there. 
 What is Kstem?
   A stemmer designed by Bob Krovetz (for more information see 
 http://ciir.cs.umass.edu/pubfiles/ir-35.pdf). 
 Copyright issues
   This is open source. The actual license agreement is included at the 
 top of every source file.
  Any comments/questions/suggestions are welcome,
   Sergio Guzman-Lara
   Senior Research Fellow
   CIIR UMass
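
For anyone wondering what using the filter looks like, a minimal usage sketch, 
assuming the contributed KStemFilter follows the usual TokenFilter convention of 
wrapping a TokenStream (its exact constructor may differ in the jar):

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KStemFilter;      // from the contributed Kstem.jar
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Sketch only: an Analyzer that lowercases and then KStems each token.
public class KStemAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new LowerCaseFilter(result);
    return new KStemFilter(result);
  }
}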

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr JIRA

2011-05-18 Thread Chris Hostetter
: I didn't know that it was decided that top-level modules issues go under the
: Lucene project. That indeed reduces some of the confusion (as long as users
: will adhere to it, but I guess it's also up to us to enforce it).

And as noted: moving a Jira issue from SOLR to LUCENE (or vice versa) is 
really simple with the current version of Jira ... almost as easy as 
changing the Component.

: I think that one day we will need to merge. The more modules we'll have, the
: less issues will be open under Solr project (if it uses those modules). I
: agree w/ what Simon wrote - users will get used to it, so that's not a good
: reason IMO. Also, if we keep claiming the user base is different, then I
: think we have a problem ... every Solr user is also a Lucene user
: (eventually) -- true, some only interact w/ Solr REST API, and may not
: know/care Lucene is run at the lower level. But for the community's sake, I
: think merging JIRA will only help down the road.

I just don't see it that way ... saying every Solr user is also a Lucene 
user is like saying every Solr user is a Java user, or every Solr user is a 
commons-io user ... we don't expect our users to even know that, let 
alone assume that Solr users should know what layer of the stack a bug/feature 
should be filed against. 

If a Solr user has an issue/improvement at the Solr level of the stack 
they file a SOLR issue -- if they are a savvy dev and know that the only 
real change is at the Lucene level they file a LUCENE issue; if we feel 
like we need to move a SOLR issue to a LUCENE issue we can do so; if we 
feel like there should be two issues to track the bug/change, one covering the 
Solr layer changes and one covering some dependent Lucene layer change, we 
can do that too -- just like we could if someone filed a bug against a 
module component and we decided there was a dependent, but fundamentally 
distinct, core change that we wanted to track as a distinct jira.

: At the end of the day, I don't think we can maintain two projects for much
: longer, and I don't think it's the right thing to do at all. And if one day
: we'll merge JIRA projects, then tomorrow is as good a day as any other day.

We are one Apache project, two Apache Products.  The fact that the 
Jira terminology is Project is an implementation detail.

If we get to the point where we decide that we want to release some 
module as distinct release artifacts, because we think it has a distinct 
user community who doesn't know/care about the Lucene Core as a whole, 
then I would totally argue in favor of that module having a distinct Jira 
Product/Project as well.


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3118) Tools for making explanations easier to consume/understand

2011-05-18 Thread Grant Ingersoll (JIRA)
Tools for making explanations easier to consume/understand
--

 Key: LUCENE-3118
 URL: https://issues.apache.org/jira/browse/LUCENE-3118
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


Oftentimes, reading Explanations (i.e. the breakdown of scores for a 
particular query and result, say via Solr's debugQuery) is a pretty cryptic 
and difficult undertaking.  I often say people suffer from explain blindness 
from staring at explanation results for too long.  We could add a layer of 
explanation helpers above the core Explain functionality that help people 
better understand what is going on.  The goal is to give a higher level of 
tooling to people who aren't necessarily well versed in all the underpinnings of 
Lucene's scoring mechanisms but still want information about why something 
didn't match.

For instance (brainstorming some things that might be doable):
* Explain Diff Tool -- given one or more explanations, quickly highlight the 
key things that differentiate the results (i.e. fieldNorm is higher, 
etc.)
* Given a query and any document, give a friendlier reason why it ranks 
lower than others without the need to parse through all the pieces of 
the score; for instance, could you simply say, programmatically, something 
like: this document scored lower compared to your top 10 b/c it had no 
values in the foo Field.
* Could maybe even return codes for these reasons, which could then be hooked 
into actual user messages.

I don't have anything concrete patch-wise here, but am putting this up as a way 
to capture the idea and potentially spur others to think about it.
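
One possible starting point for the helpers brainstormed above -- an illustrative 
sketch, not an existing Lucene API -- is to flatten an Explanation tree into 
labeled values that a diff tool or message mapper could then compare across 
documents:

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.search.Explanation;

// Sketch: flatten an Explanation tree into "path -> value" entries so that two
// results' explanations can be diffed, or mapped to friendlier messages.
public final class ExplanationFlattener {

  public static Map<String, Float> flatten(Explanation explanation) {
    Map<String, Float> out = new LinkedHashMap<String, Float>();
    walk(explanation, "", out);
    return out;
  }

  private static void walk(Explanation e, String path, Map<String, Float> out) {
    String here = path.length() == 0 ? e.getDescription() : path + " / " + e.getDescription();
    out.put(here, e.getValue());
    Explanation[] details = e.getDetails();   // may be null for leaf nodes
    if (details != null) {
      for (Explanation child : details) {
        walk(child, here, out);
      }
    }
  }
}

Diffing two such maps would make it straightforward to report, programmatically, 
that e.g. the fieldNorm component differs, or that one document contributed 
nothing for a given field.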


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr JIRA

2011-05-18 Thread Chris Hostetter

: just a few words. I disagree here with you hoss IMO the suggestion to
: merge JIRA would help to move us closer together and help close the
: gap between Solr and Lucene. I think we need to start identifying us
: with what we work on. It feels like we don't do that today and we
: should work hard to stop that and make hard breaks that might hurt but

I just don't see how you think that would help anything ... we still need 
to distinguish Jira issues to identify what part of the stack they affect.

If there is a divide among the developers because of the niches where 
they tend to work, will that divide magically go away because we partition 
all issues using the component feature instead of the Jira 
project feature?

I don't really see how that makes any sense.  

Even if we all thought it did, and even if the cost/effort of 
migrating/converting were totally free, the user bases (who interact with 
the Solr APIs vs directly using the Lucene-Core/Module APIs) are so 
distinct that I genuinely think sticking with distinct Jira Projects 
makes more sense for our users.

: JIRA. I'd go even further and nuke the name entirely and call
: everything lucene - I know not many folks like the idea and it might
: take a while to bake in but I think for us (PMC / Committers) and the

Everything already is called Lucene ... the Project is Apache Lucene, 
the community is Lucene ... the Lucene project currently releases 
several products, and one of them is called Apache Solr ... if you're 
suggesting that we should ultimately eliminate the name Solr then we'd 
still have to decide what we're going to call that end product, the 
artifact that we ship that provides the abstraction layer that Solr 
currently provides.  

Even if you mean to suggest that we should only have one unified product 
-- one singular release artifact -- that abstraction layer still needs a 
name.  The name we have now is Solr; it has brand awareness and a user 
base who understands what it means to say they are Installing Solr or 
that a new feature is available when Using Solr.

Eliminating that name doesn't seem like it would benefit the user 
community in any way.



-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


