[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-09-15 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189.patch

Added some javadocs, converted all nocommits to TODOs. I think it's ready for 
trunk. I'd like to handle FIS.gen next.

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates; however, the 
 amount of changes is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get-go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update if, e.g., we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet it requires many changes to core code which will also be useful 
 for updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have some working patch already which I'll upload next, explaining the 
 changes.
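Since the approach described above rewrites the whole value column for the updated field per segment (the way livedocs are rewritten), the mechanics can be sketched without any Lucene classes. This is a hypothetical illustration of the rewrite-a-generation idea, not the patch's actual code; all names are made up:

```java
import java.util.Arrays;

// Sketch: one numeric doc-values field in one segment, updated by rewriting
// the full per-segment value array to a new "generation", leaving the rest
// of the segment's files untouched.
class NumericDVUpdateSketch {
    long[] values; // current generation of values for the field

    NumericDVUpdateSketch(long[] initial) {
        this.values = initial;
    }

    void update(int docId, long newValue) {
        long[] nextGen = Arrays.copyOf(values, values.length); // new generation
        nextGen[docId] = newValue;
        values = nextGen; // re-opened readers see the newest generation
    }
}
```

The cost model this implies is the one the issue leans on: an update is cheap to apply but rewrites the whole column, which is why one contained data type is a sensible starting point.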

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5180) ShingleFilter should make shingles from trailing holes

2013-09-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767718#comment-13767718
 ] 

Steve Rowe commented on LUCENE-5180:


+1, patch looks good.

+1 to your suggestion about ShingleFilterTest.TestTokenStream:

bq. // TODO: merge w/ CannedTokenStream?


 ShingleFilter should make shingles from trailing holes
 --

 Key: LUCENE-5180
 URL: https://issues.apache.org/jira/browse/LUCENE-5180
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.6

 Attachments: LUCENE-5180.patch


 When ShingleFilter hits a hole, it uses "_" as the token; e.g., bigrams for 
 "the dog barked", if you have a StopFilter removing "the", would be: "_ dog", 
 "dog barked".
 But if the input ends with a stopword, e.g. "wizard of", ShingleFilter fails 
 to produce "wizard _" due to LUCENE-3849 ... once we fix that I think we 
 should fix ShingleFilter to make shingles for trailing holes too ...
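For illustration only, here is a standalone sketch (no Lucene classes) of the bigram behavior described above, with null standing in for a hole left by a removed stopword; a trailing null models the "wizard of" case this issue wants to cover:

```java
import java.util.ArrayList;
import java.util.List;

class ShingleSketch {
    // Bigrams over a token array where null marks a hole (removed stopword);
    // holes are emitted as the "_" filler token, including a trailing hole.
    static List<String> bigrams(String[] tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            String a = tokens[i] == null ? "_" : tokens[i];
            String b = tokens[i + 1] == null ? "_" : tokens[i + 1];
            out.add(a + " " + b);
        }
        return out;
    }
}
```

So "the dog barked" with "the" removed yields "_ dog" and "dog barked", and "wizard of" with "of" removed yields "wizard _", which is the shingle the real filter currently fails to produce.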




[jira] [Commented] (SOLR-4988) Upgrade svnkit to version compatible with svn 1.8

2013-09-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767721#comment-13767721
 ] 

Shai Erera commented on SOLR-4988:
--

I see they now have a beta version available for download: 
http://www.svnkit.com/download.php. Since we only use it for precommit, is 
there anything that should prevent us from using the beta version? I am 
currently unable to run precommit because it fails with this error:

{noformat}
BUILD FAILED
D:\dev\lucene\lucene-5189\build.xml:335: The following error occurred while 
executing this line:
D:\dev\lucene\lucene-5189\extra-targets.xml:66: The following error occurred 
while executing this line:
D:\dev\lucene\lucene-5189\extra-targets.xml:82: 
org.tmatesoft.svn.core.SVNException: svn: E155021: This client is too old to 
work with the working copy at
'D:\dev\lucene\lucene-5189' (format '31').
at 
org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:64)
at 
org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:51)
at 
org.tmatesoft.svn.core.internal.wc17.db.SVNWCDbRoot.init(SVNWCDbRoot.java:95)

{noformat}

 Upgrade svnkit to version compatible with svn 1.8
 -

 Key: SOLR-4988
 URL: https://issues.apache.org/jira/browse/SOLR-4988
 Project: Solr
  Issue Type: Task
Reporter: Alan Woodward
Assignee: Alan Woodward

 If you've got subversion 1.8 installed, ant precommit fails due to svn 
 version incompatibilities.  It looks as though there isn't an svnkit release 
 yet that supports 1.8.  Once one is available, we should upgrade our 
 dependencies.
 See http://subversion.1072662.n5.nabble.com/ETA-on-1-8-support-td181632.html




[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767732#comment-13767732
 ] 

ASF subversion and git services commented on LUCENE-5207:
-

Commit 1523419 from [~thetaphi] in branch 'dev/branches/lucene5207'
[ https://svn.apache.org/r1523419 ]

LUCENE-5207: Add a test which verifies that the classloader restrictions work 
correctly

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared at defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking 
 signals). So they are conceptually much more like ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than Solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, they can also refer to other 
 expressions, so they can be used as computed fields.
 * You can rank documents easily by multiple expressions (it's a SortField at 
 the end), e.g. sort by year descending, then by some function of score, 
 price, and time ascending.
 * The provided javascript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (it compiles to 
 .class files that work on doubles). Performance is similar to writing a 
 custom FieldComparator yourself, but much easier to do.
 * We have Solr integration to contribute in the future, but this is just the 
 standalone Lucene part as a start. Since Lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortFields or other expressions.
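The Bindings idea can be sketched without the actual module: variable names resolve to per-document double sources, and a "compiled" expression combines them directly on doubles, no dynamic typing involved. This is a hypothetical stand-in, not the module's API; the class name, method names, and the example expression "_score + ln(popularity)" are all illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Sketch of the Bindings concept: names map to per-document value sources;
// an expression evaluates them as plain doubles, as a compiled ranking
// function would.
class BindingsSketch {
    final Map<String, ToDoubleFunction<Integer>> sources = new HashMap<>();

    void add(String name, ToDoubleFunction<Integer> source) {
        sources.put(name, source);
    }

    // Stand-in for a compiled expression like "_score + ln(popularity)".
    double eval(int doc) {
        return sources.get("_score").applyAsDouble(doc)
             + Math.log(sources.get("popularity").applyAsDouble(doc));
    }
}
```

The point of the real module is that this evaluation is emitted as bytecode working on primitives, which is why it can approach a hand-written FieldComparator.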




[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767733#comment-13767733
 ] 

ASF subversion and git services commented on LUCENE-5207:
-

Commit 1523421 from [~thetaphi] in branch 'dev/branches/lucene5207'
[ https://svn.apache.org/r1523421 ]

LUCENE-5207: Actually test that it works with mixed classloaders

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared at defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking 
 signals). So they are conceptually much more like ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than Solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, they can also refer to other 
 expressions, so they can be used as computed fields.
 * You can rank documents easily by multiple expressions (it's a SortField at 
 the end), e.g. sort by year descending, then by some function of score, 
 price, and time ascending.
 * The provided javascript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (it compiles to 
 .class files that work on doubles). Performance is similar to writing a 
 custom FieldComparator yourself, but much easier to do.
 * We have Solr integration to contribute in the future, but this is just the 
 standalone Lucene part as a start. Since Lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortFields or other expressions.




[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767765#comment-13767765
 ] 

ASF subversion and git services commented on LUCENE-5207:
-

Commit 1523426 from [~thetaphi] in branch 'dev/branches/lucene5207'
[ https://svn.apache.org/r1523426 ]

LUCENE-5207: Better classloader test: it now uses a completely synthetic class, 
so it's 100% unreachable from the main classloader. Also remove the static 
Opcodes imports

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared at defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking 
 signals). So they are conceptually much more like ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than Solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, they can also refer to other 
 expressions, so they can be used as computed fields.
 * You can rank documents easily by multiple expressions (it's a SortField at 
 the end), e.g. sort by year descending, then by some function of score, 
 price, and time ascending.
 * The provided javascript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (it compiles to 
 .class files that work on doubles). Performance is similar to writing a 
 custom FieldComparator yourself, but much easier to do.
 * We have Solr integration to contribute in the future, but this is just the 
 standalone Lucene part as a start. Since Lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortFields or other expressions.




[jira] [Updated] (LUCENE-5180) ShingleFilter should make shingles from trailing holes

2013-09-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5180:
---

Attachment: LUCENE-5180.patch

Thanks Steve!

Here's a new patch w/ that TODO done ... I think it's ready.

 ShingleFilter should make shingles from trailing holes
 --

 Key: LUCENE-5180
 URL: https://issues.apache.org/jira/browse/LUCENE-5180
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.6

 Attachments: LUCENE-5180.patch, LUCENE-5180.patch


 When ShingleFilter hits a hole, it uses "_" as the token; e.g., bigrams for 
 "the dog barked", if you have a StopFilter removing "the", would be: "_ dog", 
 "dog barked".
 But if the input ends with a stopword, e.g. "wizard of", ShingleFilter fails 
 to produce "wizard _" due to LUCENE-3849 ... once we fix that I think we 
 should fix ShingleFilter to make shingles for trailing holes too ...




[jira] [Commented] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control

2013-09-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767770#comment-13767770
 ] 

Michael McCandless commented on LUCENE-3425:


OK, thanks for the explanation; now I understand AverageMergePolicy's purpose, 
and it makes sense.  It's ironic that a fully optimized index is the worst 
thing you could do when searching segments concurrently ...

But I still don't understand why AverageMergePolicy is not merging the little 
segments from NRTCachingDir.  Do you tell it to target a maximum number of 
segments in the index?  If so, once the index is large enough, it seems like 
that'd force the small segments to be merged.  Maybe you could also tell it a 
minimum size for the segments, so that it would merge away any segments still 
held in NRTCachingDir?

 NRT Caching Dir to allow for exact memory usage, better buffer allocation and 
 global cross indices control
 

 Key: LUCENE-3425
 URL: https://issues.apache.org/jira/browse/LUCENE-3425
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.4, 4.0-ALPHA
Reporter: Shay Banon
 Fix For: 5.0, 4.5


 A discussion on IRC raised several improvements that can be made to NRT 
 caching dir. Some of the problems it currently has are:
 1. Not explicitly controlling the memory usage, which can result in overusing 
 memory (for example, large new segments being committed because refreshing is 
 too far behind).
 2. Heap fragmentation because of constant allocation of (probably promoted to 
 old gen) byte buffers.
 3. Not being able to control the memory usage across indices for multi index 
 usage within a single JVM.
 A suggested solution (which still needs to be ironed out) is to have a 
 BufferAllocator that controls allocation of byte[] and allows returning 
 unused byte[] to it. It will have a cap on the amount of memory it allows to 
 be allocated.
 The NRT caching dir will use the allocator, which can either be provided (for 
 usage across several indices) or created internally. The caching dir will 
 also create a wrapped IndexOutput that will flush to the main dir if the 
 allocator can no longer provide byte[] (exhausted).
 When a file is flushed from the cache to the main directory, it will return 
 all the currently allocated byte[] to the BufferAllocator to be reused by 
 other files.
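A minimal sketch of the proposed BufferAllocator, assuming a fixed block size and a hard cap on total blocks; a null return from allocate() stands in for the "exhausted, flush to the main dir" signal. All names here are hypothetical, since the design above was still being ironed out:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: hands out fixed-size byte[] blocks up to a hard cap, recycles
// returned blocks to avoid heap fragmentation, and signals exhaustion by
// returning null so the caller can flush to the main directory instead.
class BufferAllocator {
    final int blockSize;
    final int maxBlocks;
    int liveBlocks = 0;                                // blocks ever created
    final Deque<byte[]> freeList = new ArrayDeque<>(); // returned, reusable

    BufferAllocator(int blockSize, int maxBlocks) {
        this.blockSize = blockSize;
        this.maxBlocks = maxBlocks;
    }

    byte[] allocate() {
        if (!freeList.isEmpty()) return freeList.pop(); // reuse, no new garbage
        if (liveBlocks >= maxBlocks) return null;       // exhausted: flush
        liveBlocks++;
        return new byte[blockSize];
    }

    void release(byte[] block) {
        freeList.push(block); // returned on flush, reused by other files
    }
}
```

Sharing one such allocator across several NRT caching dirs is what would give the per-JVM cross-index cap the issue asks for.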




[jira] [Created] (LUCENE-5214) Add new FreeTextSuggester, to handle long tail suggestions

2013-09-15 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5214:
--

 Summary: Add new FreeTextSuggester, to handle long tail 
suggestions
 Key: LUCENE-5214
 URL: https://issues.apache.org/jira/browse/LUCENE-5214
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.6


The current suggesters are all based on a finite space of possible
suggestions, i.e. the ones they were built on, so they can only
suggest a full suggestion from that space.

This means if the current query goes outside of that space then no
suggestions will be found.

The goal of FreeTextSuggester is to address this, by giving
predictions based on an ngram language model, i.e. using the last few
tokens from the user's query to predict the likely following token.

I got the idea from this blog post about Google's suggest:
http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html

This is very much still a work in progress, but it seems to be
working.  I've tested it on the AOL query logs, using an interactive
tool from luceneutil to show the suggestions, and it seems to work well.
It's fun to use that tool to explore the word associations...

I don't think this suggester would be used standalone; rather, I think
it'd be a fallback for times when the primary suggester fails to find
anything.  You can see this behavior on google.com: if you type "the
fast and the ", you see entire queries being suggested, but then if
the next word you type is "burning" then suddenly you see the
suggestions are only based on the last word, not the entire query.

It uses ShingleFilter under the hood to generate the token ngrams, and
then stores the ngrams in an FST; once LUCENE-5180 is in, it will be
able to properly handle a user query that ends with stop-words (e.g.
"wizard of ").
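The core "last few tokens predict the next token" idea can be sketched with a plain bigram count, ignoring the FST storage, backoff, and analysis chain of the real patch. The class and method names here are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: count token bigrams from logged queries, then predict the next
// token from the last typed token only, regardless of whether the full
// query was ever seen (the "long tail" case).
class FreeTextSketch {
    // token -> (following token -> count)
    final Map<String, Map<String, Integer>> bigrams = new HashMap<>();

    void index(String query) {
        String[] toks = query.split(" ");
        for (int i = 0; i + 1 < toks.length; i++) {
            bigrams.computeIfAbsent(toks[i], k -> new HashMap<>())
                   .merge(toks[i + 1], 1, Integer::sum);
        }
    }

    // Suggest the most frequent follower of the query's last token.
    String suggest(String query) {
        String[] toks = query.split(" ");
        Map<String, Integer> followers = bigrams.get(toks[toks.length - 1]);
        if (followers == null) return null; // no model for this tail
        return followers.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }
}
```

A query like "watch the" gets a suggestion even though that exact query was never indexed, which is precisely what a finite-suggestion suggester cannot do.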





[jira] [Updated] (LUCENE-5214) Add new FreeTextSuggester, to handle long tail suggestions

2013-09-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5214:
---

Attachment: LUCENE-5214.patch

Current patch, very much work in progress...

 Add new FreeTextSuggester, to handle long tail suggestions
 

 Key: LUCENE-5214
 URL: https://issues.apache.org/jira/browse/LUCENE-5214
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.6

 Attachments: LUCENE-5214.patch


 The current suggesters are all based on a finite space of possible
 suggestions, i.e. the ones they were built on, so they can only
 suggest a full suggestion from that space.
 This means if the current query goes outside of that space then no
 suggestions will be found.
 The goal of FreeTextSuggester is to address this, by giving
 predictions based on an ngram language model, i.e. using the last few
 tokens from the user's query to predict the likely following token.
 I got the idea from this blog post about Google's suggest:
 http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html
 This is very much still a work in progress, but it seems to be
 working.  I've tested it on the AOL query logs, using an interactive
 tool from luceneutil to show the suggestions, and it seems to work well.
 It's fun to use that tool to explore the word associations...
 I don't think this suggester would be used standalone; rather, I think
 it'd be a fallback for times when the primary suggester fails to find
 anything.  You can see this behavior on google.com: if you type "the
 fast and the ", you see entire queries being suggested, but then if
 the next word you type is "burning" then suddenly you see the
 suggestions are only based on the last word, not the entire query.
 It uses ShingleFilter under the hood to generate the token ngrams, and
 then stores the ngrams in an FST; once LUCENE-5180 is in, it will be
 able to properly handle a user query that ends with stop-words (e.g.
 "wizard of ").




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767799#comment-13767799
 ] 

Robert Muir commented on LUCENE-5189:
-

+1 to go to trunk. thanks Shai.

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates; however, the 
 amount of changes is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get-go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update if, e.g., we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet it requires many changes to core code which will also be useful 
 for updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have some working patch already which I'll upload next, explaining the 
 changes.




[jira] [Created] (LUCENE-5215) Add support for FieldInfos generation

2013-09-15 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5215:
--

 Summary: Add support for FieldInfos generation
 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera


In LUCENE-5189 we've identified a few reasons to do that:

# If you want to update docs' values of field 'foo', where 'foo' exists in the 
index but not in a specific segment (sparse DV), we cannot allow that and have 
to throw a late UOE. If we could rewrite FieldInfos (with generation), this 
would be possible since we'd also write a new generation of FIS.

# When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer 
isn't allowed to change FI.attributes because we cannot modify the existing 
FIS. This is implicit, however, and we silently ignore any modified attributes. 
FieldInfos.gen will allow that too.

The idea is to add fieldInfosGen to SIPC, add a dvGen to each FieldInfo, and 
add support for FIS generation in FieldInfosFormat, SegReader etc., like we now 
do for DocValues. I'll work on a patch.

Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes, which 
have the same limitation -- if a Codec modifies them, they are silently 
ignored, since we don't gen the .si files. I think we can easily solve that by 
recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I 
think it should be handled in a separate issue.
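The generation scheme resembles how live-docs files are named today: a rewritten file gets a gen suffix, old and new generations coexist on disk, and the per-commit SegmentInfos records which gen is current. A rough sketch of gen-suffixed file naming (the base-36 suffix mirrors what livedocs use; this is illustrative, not the patch's code):

```java
// Sketch of gen-suffixed file naming: gen == -1 means the file was never
// rewritten; otherwise the gen is appended so each rewrite produces a new
// file and a commit points at the live one.
class GenFileName {
    static String fileName(String segment, String ext, long gen) {
        if (gen == -1) {
            return segment + "." + ext;
        }
        return segment + "_" + Long.toString(gen, Character.MAX_RADIX) + "." + ext;
    }
}
```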




[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767811#comment-13767811
 ] 

Robert Muir commented on LUCENE-5215:
-

SI attributes may not be used at all today. They worked well for handling the 
3.x integration, as they were a place for us to stuff things like 
hasSharedDocStores, and IndexWriter was still able to hackishly get at them, 
but deprecation might be a good option. We should see what is using this in 
trunk.

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera

 In LUCENE-5189 we've identified a few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit, however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add fieldInfosGen to SIPC, add a dvGen to each FieldInfo, and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes, which 
 have the same limitation -- if a Codec modifies them, they are silently 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.




[jira] [Commented] (SOLR-5237) Add Lucene Index heap usage to LukeRequestHandler

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767815#comment-13767815
 ] 

ASF subversion and git services commented on SOLR-5237:
---

Commit 1523442 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1523442 ]

SOLR-5237: add lucene index heap usage to luke request handler

 Add Lucene Index heap usage to LukeRequestHandler 
 --

 Key: SOLR-5237
 URL: https://issues.apache.org/jira/browse/SOLR-5237
 Project: Solr
  Issue Type: Improvement
Reporter: Areek Zillur
 Attachments: SOLR-5237.patch, SOLR-5237.patch


 It would be useful to see the current index heap usage 
 (https://issues.apache.org/jira/browse/LUCENE-5197) by lucene in the 
 LukeRequestHandler.




[jira] [Commented] (SOLR-5237) Add Lucene Index heap usage to LukeRequestHandler

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767816#comment-13767816
 ] 

ASF subversion and git services commented on SOLR-5237:
---

Commit 1523443 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1523443 ]

SOLR-5237: add lucene index heap usage to luke request handler

 Add Lucene Index heap usage to LukeRequestHandler 
 --

 Key: SOLR-5237
 URL: https://issues.apache.org/jira/browse/SOLR-5237
 Project: Solr
  Issue Type: Improvement
Reporter: Areek Zillur
 Attachments: SOLR-5237.patch, SOLR-5237.patch


 It would be useful to see the current index heap usage 
 (https://issues.apache.org/jira/browse/LUCENE-5197) by lucene in the 
 LukeRequestHandler.




[jira] [Resolved] (SOLR-5237) Add Lucene Index heap usage to LukeRequestHandler

2013-09-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-5237.
---

   Resolution: Fixed
Fix Version/s: 4.6
   5.0

Thanks Areek!

Maybe you want to open an issue to add this to the Admin UI?

 Add Lucene Index heap usage to LukeRequestHandler 
 --

 Key: SOLR-5237
 URL: https://issues.apache.org/jira/browse/SOLR-5237
 Project: Solr
  Issue Type: Improvement
Reporter: Areek Zillur
 Fix For: 5.0, 4.6

 Attachments: SOLR-5237.patch, SOLR-5237.patch


 It would be useful to see the current index heap usage 
 (https://issues.apache.org/jira/browse/LUCENE-5197) by lucene in the 
 LukeRequestHandler.




[jira] [Commented] (LUCENE-5214) Add new FreeTextSuggester, to handle long tail suggestions

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767820#comment-13767820
 ] 

Robert Muir commented on LUCENE-5214:
-

This looks awesome: I think LUCENE-5180 will resolve a lot of the TODOs?

I'm glad these corner cases of trailing stopwords etc. were fixed properly in 
the analysis chain.

And I like the name...

 Add new FreeTextSuggester, to handle long tail suggestions
 

 Key: LUCENE-5214
 URL: https://issues.apache.org/jira/browse/LUCENE-5214
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.6

 Attachments: LUCENE-5214.patch


 The current suggesters are all based on a finite space of possible
 suggestions, i.e. the ones they were built on, so they can only
 suggest a full suggestion from that space.
 This means if the current query goes outside of that space then no
 suggestions will be found.
 The goal of FreeTextSuggester is to address this, by giving
 predictions based on an ngram language model, i.e. using the last few
 tokens from the user's query to predict the likely following token.
 I got the idea from this blog post about Google's suggest:
 http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html
 This is very much still a work in progress, but it seems to be
 working.  I've tested it on the AOL query logs, using an interactive
 tool from luceneutil to show the suggestions, and it seems to work well.
 It's fun to use that tool to explore the word associations...
 I don't think this suggester would be used standalone; rather, I think
 it'd be a fallback for times when the primary suggester fails to find
 anything.  You can see this behavior on google.com: if you type "the
 fast and the ", you see entire queries being suggested, but then if
 the next word you type is "burning" then suddenly you see the
 suggestions are only based on the last word, not the entire query.
 It uses ShingleFilter under the hood to generate the token ngrams;
 once LUCENE-5180 is in it will be able to properly handle a user query
 that ends with stop-words (e.g. "wizard of "), and then stores the
 ngrams in an FST.
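The backoff behavior described above (fall back to shorter and shorter contexts until something matches) can be sketched in a few lines. This is an illustrative toy model only, not Lucene's FreeTextSuggester implementation, which analyzes the text with ShingleFilter and stores the ngrams in an FST:

```python
from collections import Counter, defaultdict

def build_model(corpus, max_n=3):
    """Count token ngrams up to max_n from a list of tokenized queries."""
    model = defaultdict(Counter)  # context tuple -> Counter of next tokens
    for tokens in corpus:
        for n in range(1, max_n):           # context lengths 1 .. max_n-1
            for i in range(len(tokens) - n):
                context = tuple(tokens[i:i + n])
                model[context][tokens[i + n]] += 1
    return model

def suggest(model, query_tokens, max_n=3):
    """Back off from the longest context to shorter ones until one matches."""
    for n in range(max_n - 1, 0, -1):
        context = tuple(query_tokens[-n:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None

corpus = [q.split() for q in [
    "the fast and the furious",
    "the fast and the furious tokyo drift",
    "wizard of oz",
]]
model = build_model(model if False else corpus)
print(suggest(model, "the fast and the".split()))  # prints "furious"
print(suggest(model, "wizard of".split()))         # prints "oz"
```

An unseen context yields no suggestion, which is why the issue positions this as a fallback behind a primary suggester rather than a standalone one.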




[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767825#comment-13767825
 ] 

Uwe Schindler commented on LUCENE-5207:
---

+1 to merge & commit this to trunk and 4.x. In the last 3 commits to the branch 
I just added new tests, so no functional changes anymore.

Robert: it would be good if you could create a new patch for reference. Otherwise do 
merge --reintegrate, commit, and delete the branch!

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared at defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking
 signals). So they are conceptually much more like ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, they can also refer to other 
 expressions, so they can be used as computed fields.
 * You can rank documents easily by multiple expressions (it's a SortField at 
 the end), e.g. Sort by year descending, then some function of score, price and 
 time ascending.
 * The provided javascript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (compiles to 
 .class files that work on doubles). Performance is similar to writing a 
 custom FieldComparator yourself, but much easier to do.
 * We have solr integration to contribute in the future, but this is just the 
 standalone lucene part as a start. Since lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortField's or other expressions.
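The bindings idea above (variable names resolving either to a field value or to another expression, so expressions can act as computed fields) can be illustrated with a small self-contained sketch. The field names and the "boosted" expression are hypothetical, and the real module compiles the expression to bytecode operating on doubles rather than using closures:

```python
import math

def make_bindings(doc):
    # variable name -> thunk producing a double, like SimpleBindings maps
    # names to SortFields or other expressions
    bindings = {
        "_score": lambda: doc["_score"],
        "popularity": lambda: doc["popularity"],
    }
    # an expression can itself be bound as a variable (a "computed field")
    bindings["boosted"] = lambda: bindings["_score"]() + math.log(bindings["popularity"]())
    return bindings

def sort_key(doc):
    # rank by the computed expression, descending (negate for ascending sort)
    return -make_bindings(doc)["boosted"]()

docs = [
    {"id": 1, "_score": 1.0, "popularity": 10},
    {"id": 2, "_score": 2.0, "popularity": 1},
    {"id": 3, "_score": 0.5, "popularity": 100},
]
ranked = sorted(docs, key=sort_key)
print([d["id"] for d in ranked])  # prints [3, 1, 2]
```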




[jira] [Commented] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767827#comment-13767827
 ] 

ASF subversion and git services commented on SOLR-5167:
---

Commit 1523451 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1523451 ]

SOLR-5167: Ability to use AnalyzingInfixSuggester in Solr

 Ability to use AnalyzingInfixSuggester in Solr
 --

 Key: SOLR-5167
 URL: https://issues.apache.org/jira/browse/SOLR-5167
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Varun Thacker
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-5167.patch, SOLR-5167.patch


 We should be able to use AnalyzingInfixSuggester in Solr by defining it in 
 solrconfig.xml
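For illustration, a configuration along these lines might be what the issue is after. The element names below follow the suggester syntax that later shipped in Solr and are an assumption here, not taken from the attached patches:

```xml
<!-- hypothetical sketch; see the attached SOLR-5167 patches for the real syntax -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infix</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```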




[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation

2013-09-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767831#comment-13767831
 ] 

Shai Erera commented on LUCENE-5215:


Well, SI.attributes says it's the place for Codecs to put custom attributes in, 
and I remember Mike and I once discussed using them for putting some facet 
related stuff, but we didn't pursue it. Maybe if we record them in SIS, it's 
simple enough and we can keep them? If they are meant to be used by Codecs 
only, then maybe we can force Codecs to manage them themselves, but if e.g. 
some other code will want to rely on them, would it be possible?

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera

 In LUCENE-5189 we've identified a few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index, but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that 
 have the same limitation -- if a Codec modifies them, they are silently being 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.
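The write-once, per-generation scheme being proposed can be sketched as follows. Updates never touch an existing generation; they write a new one, and readers load the highest generation, which is what makes updated FieldInfos (and attributes) visible. The file names and data layout here are purely illustrative, not Lucene's actual FieldInfosFormat:

```python
def fis_filename(segment, gen):
    # gen 0 is the original write; later gens come from updates
    return f"{segment}.fnm" if gen == 0 else f"{segment}_{gen}.fnm"

class SegmentFieldInfos:
    def __init__(self, segment):
        self.segment = segment
        self.gen = 0
        self.files = {fis_filename(segment, 0): {}}  # filename -> {field: info}

    def current(self):
        # readers always open the highest generation
        return self.files[fis_filename(self.segment, self.gen)]

    def update(self, field, info):
        # write-once: never modify the old generation, write a new file
        new_infos = dict(self.current())
        new_infos[field] = info
        self.gen += 1
        self.files[fis_filename(self.segment, self.gen)] = new_infos

fis = SegmentFieldInfos("_0")
fis.update("foo", {"dvType": "NUMERIC", "dvGen": 1})
print(fis.gen, sorted(fis.files))  # prints 1 ['_0.fnm', '_0_1.fnm']
```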




[jira] [Updated] (SOLR-4988) Upgrade svnkit to version compatible with svn 1.8

2013-09-15 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-4988:


Attachment: SOLR-4988.patch

Thanks Shai.  Here's a patch that adds the tmatesoft snapshots repository to 
ivy-settings.xml, and updates extra-targets.xml to use svnkit 1.8.0-SNAPSHOT.

I'm a bit wary of committing it, though, as I don't like adding dependencies on 
snapshots.  Maybe just keep this patch up for people who want to use svn 1.8 
until tmatesoft officially releases svnkit 1.8.0?  (I ended up working around the 
precommit problem by just using svn 1.7 for this project.)  Or you could just be 
resigned to breaking the build occasionally :-)

 Upgrade svnkit to version compatible with svn 1.8
 -

 Key: SOLR-4988
 URL: https://issues.apache.org/jira/browse/SOLR-4988
 Project: Solr
  Issue Type: Task
Reporter: Alan Woodward
Assignee: Alan Woodward
 Attachments: SOLR-4988.patch


 If you've got subversion 1.8 installed, ant precommit fails due to svn 
 version incompatibilities.  It looks as though there isn't an svnkit release 
 yet that supports 1.8.  Once one is available, we should upgrade our 
 dependencies.
 See http://subversion.1072662.n5.nabble.com/ETA-on-1-8-support-td181632.html




[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767832#comment-13767832
 ] 

Robert Muir commented on LUCENE-5215:
-

Yes but the issue is that they are write-once. 

If a codec component that needs attributes (e.g. a dv one) were to write this 
stuff in its own file, it would work with updates because of the segment suffix. 

Additionally we already have a per-commit Map<String,String>: the one used by 
setCommitData(Map<String,String> commitUserData)...


 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera

 In LUCENE-5189 we've identified a few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index, but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that 
 have the same limitation -- if a Codec modifies them, they are silently being 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.




[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767836#comment-13767836
 ] 

Robert Muir commented on LUCENE-5215:
-

Anyway I agree we should spin off a separate issue for this... this issue for 
fieldinfos will be fun enough by itself :)

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera

 In LUCENE-5189 we've identified a few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index, but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that 
 have the same limitation -- if a Codec modifies them, they are silently being 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.




[jira] [Commented] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767837#comment-13767837
 ] 

ASF subversion and git services commented on SOLR-5167:
---

Commit 1523454 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1523454 ]

SOLR-5167: Ability to use AnalyzingInfixSuggester in Solr

 Ability to use AnalyzingInfixSuggester in Solr
 --

 Key: SOLR-5167
 URL: https://issues.apache.org/jira/browse/SOLR-5167
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Varun Thacker
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-5167.patch, SOLR-5167.patch


 We should be able to use AnalyzingInfixSuggester in Solr by defining it in 
 solrconfig.xml




[jira] [Resolved] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr

2013-09-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-5167.
---

   Resolution: Fixed
Fix Version/s: (was: 4.5)
   4.6

Thanks Varun and Areek!

 Ability to use AnalyzingInfixSuggester in Solr
 --

 Key: SOLR-5167
 URL: https://issues.apache.org/jira/browse/SOLR-5167
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Varun Thacker
Priority: Minor
 Fix For: 5.0, 4.6

 Attachments: SOLR-5167.patch, SOLR-5167.patch


 We should be able to use AnalyzingInfixSuggester in Solr by defining it in 
 solrconfig.xml




[jira] [Updated] (LUCENE-5207) lucene expressions module

2013-09-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5207:


Attachment: LUCENE-5207.patch

Yes, I'll merge (because I am not totally sure about the python script really 
showing all the differences in various config stuff).

Here is the updated patch from the script though.

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared at defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking
 signals). So they are conceptually much more like ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, they can also refer to other 
 expressions, so they can be used as computed fields.
 * You can rank documents easily by multiple expressions (it's a SortField at 
 the end), e.g. Sort by year descending, then some function of score, price and 
 time ascending.
 * The provided javascript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (compiles to 
 .class files that work on doubles). Performance is similar to writing a 
 custom FieldComparator yourself, but much easier to do.
 * We have solr integration to contribute in the future, but this is just the 
 standalone lucene part as a start. Since lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortField's or other expressions.




[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767851#comment-13767851
 ] 

Uwe Schindler commented on LUCENE-5207:
---

Thanks!

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared at defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking
 signals). So they are conceptually much more like ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, they can also refer to other 
 expressions, so they can be used as computed fields.
 * You can rank documents easily by multiple expressions (it's a SortField at 
 the end), e.g. Sort by year descending, then some function of score, price and 
 time ascending.
 * The provided javascript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (compiles to 
 .class files that work on doubles). Performance is similar to writing a 
 custom FieldComparator yourself, but much easier to do.
 * We have solr integration to contribute in the future, but this is just the 
 standalone lucene part as a start. Since lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortField's or other expressions.




[jira] [Commented] (SOLR-4988) Upgrade svnkit to version compatible with svn 1.8

2013-09-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767848#comment-13767848
 ] 

Uwe Schindler commented on SOLR-4988:
-

Thanks Alan!

Adding snapshot repos is indeed not the best idea (reproducibility of builds), 
so we should for now stick with this version.

Currently I am not sure if svnkit 1.8 will be available from Maven Central in 
time. The latest Maven Central version of 1.7 is still behind the one on their 
own repo. But if it's urgent, once 1.8 is released, we can of course add the 
official svnkit repo to our list of repositories (the release one).

 Upgrade svnkit to version compatible with svn 1.8
 -

 Key: SOLR-4988
 URL: https://issues.apache.org/jira/browse/SOLR-4988
 Project: Solr
  Issue Type: Task
Reporter: Alan Woodward
Assignee: Alan Woodward
 Attachments: SOLR-4988.patch


 If you've got subversion 1.8 installed, ant precommit fails due to svn 
 version incompatibilities.  It looks as though there isn't an svnkit release 
 yet that supports 1.8.  Once one is available, we should upgrade our 
 dependencies.
 See http://subversion.1072662.n5.nabble.com/ETA-on-1-8-support-td181632.html




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767852#comment-13767852
 ] 

ASF subversion and git services commented on LUCENE-5189:
-

Commit 1523461 from [~shaie] in branch 'dev/trunk'
[ https://svn.apache.org/r1523461 ]

LUCENE-5189: add NumericDocValues updates

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates, however the 
 amount of changes is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update, if e.g. we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have some working patch already which I'll upload next, explaining the 
 changes.
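The livedocs-style approach described above, i.e. an update rewrites the full column of values for the updated field in a segment rather than patching entries in place, can be sketched like this (an illustrative model only, not Lucene's actual NumericDocValues update code):

```python
def apply_ndv_updates(values, updates):
    """values: per-doc longs for one field in one segment.
    updates: {docid: new_value}. Returns a freshly written column,
    leaving the original untouched (write-once, like livedocs)."""
    new_values = list(values)          # full rewrite of the column
    for docid, value in updates.items():
        new_values[docid] = value
    return new_values

old = [10, 20, 30, 40]
new = apply_ndv_updates(old, {1: 99, 3: 7})
print(new)  # prints [10, 99, 30, 7]
```

The cost of an update is thus proportional to the segment's document count, not to the number of updated documents, which is the trade-off that makes numeric DV the easiest type to support first.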




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767853#comment-13767853
 ] 

Shai Erera commented on LUCENE-5189:


Thanks, committed to trunk, revision 1523461. After we resolve all corner 
issues, and let Jenkins sleep on it for a while, I'll port to 4x.

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates, however the 
 amount of changes is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update, if e.g. we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have some working patch already which I'll upload next, explaining the 
 changes.




[jira] [Commented] (SOLR-4988) Upgrade svnkit to version compatible with svn 1.8

2013-09-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767854#comment-13767854
 ] 

Shai Erera commented on SOLR-4988:
--

Thanks Alan!

I see that 'precommit' runs 'documentation-lint', 'validate' and 
'check-svn-working-copy'. The latter is the one that fails, so for now I run 
the first two only.

 Upgrade svnkit to version compatible with svn 1.8
 -

 Key: SOLR-4988
 URL: https://issues.apache.org/jira/browse/SOLR-4988
 Project: Solr
  Issue Type: Task
Reporter: Alan Woodward
Assignee: Alan Woodward
 Attachments: SOLR-4988.patch


 If you've got subversion 1.8 installed, ant precommit fails due to svn 
 version incompatibilities.  It looks as though there isn't an svnkit release 
 yet that supports 1.8.  Once one is available, we should upgrade our 
 dependencies.
 See http://subversion.1072662.n5.nabble.com/ETA-on-1-8-support-td181632.html




[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767855#comment-13767855
 ] 

ASF subversion and git services commented on LUCENE-5207:
-

Commit 1523462 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1523462 ]

LUCENE-5207: lucene expressions module

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared at defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking
 signals). So they are conceptually much more like ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, they can also refer to other 
 expressions, so they can be used as computed fields.
 * You can rank documents easily by multiple expressions (it's a SortField at 
 the end), e.g. Sort by year descending, then some function of score, price and 
 time ascending.
 * The provided javascript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (compiles to 
 .class files that work on doubles). Performance is similar to writing a 
 custom FieldComparator yourself, but much easier to do.
 * We have solr integration to contribute in the future, but this is just the 
 standalone lucene part as a start. Since lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortField's or other expressions.




[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767858#comment-13767858
 ] 

ASF subversion and git services commented on LUCENE-5207:
-

Commit 1523464 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1523464 ]

LUCENE-5207: Fix a bug in the test and add final to the classloader

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared toward defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking 
 signals). Conceptually they are much closer to ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than to Solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, expressions can also refer to 
 other expressions, so they can be used as computed fields.
 * You can easily rank documents by multiple expressions (it's a SortField in 
 the end), e.g. sort by year descending, then by some function of score, 
 price, and time ascending.
 * The provided JavaScript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (expressions 
 compile to .class files that operate on doubles). Performance is similar to 
 writing a custom FieldComparator yourself, but much easier to achieve.
 * We have Solr integration to contribute in the future, but this is just the 
 standalone Lucene part as a start. Since Lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortFields or other expressions.




[jira] [Created] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-09-15 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5216:
--

 Summary: Fix SegmentInfo.attributes when updates are involved
 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera


Today, SegmentInfo.attributes are write-once. However, in the presence of field 
updates (see LUCENE-5189 and LUCENE-5215) this creates an issue: if a Codec 
decides to alter the attributes when updates are applied, the changes are 
silently discarded. This is a corner case, but one that should be addressed.

There are two possible solutions:

# Record SI.attributes in SegmentInfos, so they are written per-commit, instead 
of in the .si file.
# Remove them altogether, as they don't seem to be used anywhere in Lucene code 
today.

If we remove them, we don't really take away any special capability from Codecs, 
because they can still write the attributes to a separate file, or even to the 
file in which they record their other data. This will work even with updates, as 
long as Codecs respect the given segmentSuffix.

If we keep them, I think the simplest solution is to read/write them via 
SegmentInfos. But if we don't see a good use case, I suggest we remove them, as 
it's just extra code to maintain. I think we can even risk a backwards break 
and remove them completely from 4x, though if that's a problem, we can 
deprecate instead.

If anyone sees a good usage for them, or better, already uses them, please 
speak up, so we can make the proper decision.




[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-09-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767864#comment-13767864
 ] 

Michael McCandless commented on LUCENE-5216:


+1 to remove them.

 Fix SegmentInfo.attributes when updates are involved
 

 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera

 Today, SegmentInfo.attributes are write-once. However, in the presence of 
 field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue: if a 
 Codec decides to alter the attributes when updates are applied, the changes 
 are silently discarded. This is a corner case, but one that should be 
 addressed.
 There are two possible solutions:
 # Record SI.attributes in SegmentInfos, so they are written per-commit, 
 instead of in the .si file.
 # Remove them altogether, as they don't seem to be used anywhere in Lucene 
 code today.
 If we remove them, we don't really take away any special capability from 
 Codecs, because they can still write the attributes to a separate file, or 
 even to the file in which they record their other data. This will work even 
 with updates, as long as Codecs respect the given segmentSuffix.
 If we keep them, I think the simplest solution is to read/write them via 
 SegmentInfos. But if we don't see a good use case, I suggest we remove them, 
 as it's just extra code to maintain. I think we can even risk a backwards 
 break and remove them completely from 4x, though if that's a problem, we can 
 deprecate instead.
 If anyone sees a good usage for them, or better, already uses them, please 
 speak up, so we can make the proper decision.




[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-09-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767867#comment-13767867
 ] 

Shai Erera commented on LUCENE-5216:


I searched for SI.attributes(); they aren't used anywhere. Can we just remove 
them from the API? If so, it's easy -- we only need to create a new format 
version. What do you think?

 Fix SegmentInfo.attributes when updates are involved
 

 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera

 Today, SegmentInfo.attributes are write-once. However, in the presence of 
 field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue: if a 
 Codec decides to alter the attributes when updates are applied, the changes 
 are silently discarded. This is a corner case, but one that should be 
 addressed.
 There are two possible solutions:
 # Record SI.attributes in SegmentInfos, so they are written per-commit, 
 instead of in the .si file.
 # Remove them altogether, as they don't seem to be used anywhere in Lucene 
 code today.
 If we remove them, we don't really take away any special capability from 
 Codecs, because they can still write the attributes to a separate file, or 
 even to the file in which they record their other data. This will work even 
 with updates, as long as Codecs respect the given segmentSuffix.
 If we keep them, I think the simplest solution is to read/write them via 
 SegmentInfos. But if we don't see a good use case, I suggest we remove them, 
 as it's just extra code to maintain. I think we can even risk a backwards 
 break and remove them completely from 4x, though if that's a problem, we can 
 deprecate instead.
 If anyone sees a good usage for them, or better, already uses them, please 
 speak up, so we can make the proper decision.




[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767869#comment-13767869
 ] 

ASF subversion and git services commented on LUCENE-5207:
-

Commit 1523470 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1523470 ]

LUCENE-5207: lucene expressions module

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared toward defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking 
 signals). Conceptually they are much closer to ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than to Solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, expressions can also refer to 
 other expressions, so they can be used as computed fields.
 * You can easily rank documents by multiple expressions (it's a SortField in 
 the end), e.g. sort by year descending, then by some function of score, 
 price, and time ascending.
 * The provided JavaScript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (expressions 
 compile to .class files that operate on doubles). Performance is similar to 
 writing a custom FieldComparator yourself, but much easier to achieve.
 * We have Solr integration to contribute in the future, but this is just the 
 standalone Lucene part as a start. Since Lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortFields or other expressions.




[jira] [Commented] (LUCENE-5207) lucene expressions module

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767871#comment-13767871
 ] 

ASF subversion and git services commented on LUCENE-5207:
-

Commit 1523471 from [~rcmuir] in branch 'dev/branches/lucene5207'
[ https://svn.apache.org/r1523471 ]

LUCENE-5207: remove branch

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared toward defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking 
 signals). Conceptually they are much closer to ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than to Solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, expressions can also refer to 
 other expressions, so they can be used as computed fields.
 * You can easily rank documents by multiple expressions (it's a SortField in 
 the end), e.g. sort by year descending, then by some function of score, 
 price, and time ascending.
 * The provided JavaScript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (expressions 
 compile to .class files that operate on doubles). Performance is similar to 
 writing a custom FieldComparator yourself, but much easier to achieve.
 * We have Solr integration to contribute in the future, but this is just the 
 standalone Lucene part as a start. Since Lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortFields or other expressions.




[jira] [Resolved] (LUCENE-5207) lucene expressions module

2013-09-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5207.
-

   Resolution: Fixed
Fix Version/s: 4.6
   5.0

Thanks Ryan!

 lucene expressions module
 -

 Key: LUCENE-5207
 URL: https://issues.apache.org/jira/browse/LUCENE-5207
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ryan Ernst
 Fix For: 5.0, 4.6

 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch


 Expressions are geared toward defining an alternative ranking function (e.g. 
 incorporating the text relevance score and other field values/ranking 
 signals). Conceptually they are much closer to ElasticSearch's scripting 
 support (http://www.elasticsearch.org/guide/reference/modules/scripting/) 
 than to Solr's function queries.
 Some additional notes:
 * In addition to referring to other fields, expressions can also refer to 
 other expressions, so they can be used as computed fields.
 * You can easily rank documents by multiple expressions (it's a SortField in 
 the end), e.g. sort by year descending, then by some function of score, 
 price, and time ascending.
 * The provided JavaScript expression syntax is much more efficient than using 
 a scripting engine, because it does not have dynamic typing (expressions 
 compile to .class files that operate on doubles). Performance is similar to 
 writing a custom FieldComparator yourself, but much easier to achieve.
 * We have Solr integration to contribute in the future, but this is just the 
 standalone Lucene part as a start. Since Lucene has no schema, it includes an 
 implementation of Bindings (SimpleBindings) that maps variable names to 
 SortFields or other expressions.




[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767874#comment-13767874
 ] 

Robert Muir commented on LUCENE-5216:
-

AFAIK they are being used in branch_4x for 3.x back compat. So there I think we 
should simply deprecate, just so we don't have to reimplement hairy back compat 
:)

 Fix SegmentInfo.attributes when updates are involved
 

 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera

 Today, SegmentInfo.attributes are write-once. However, in the presence of 
 field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue: if a 
 Codec decides to alter the attributes when updates are applied, the changes 
 are silently discarded. This is a corner case, but one that should be 
 addressed.
 There are two possible solutions:
 # Record SI.attributes in SegmentInfos, so they are written per-commit, 
 instead of in the .si file.
 # Remove them altogether, as they don't seem to be used anywhere in Lucene 
 code today.
 If we remove them, we don't really take away any special capability from 
 Codecs, because they can still write the attributes to a separate file, or 
 even to the file in which they record their other data. This will work even 
 with updates, as long as Codecs respect the given segmentSuffix.
 If we keep them, I think the simplest solution is to read/write them via 
 SegmentInfos. But if we don't see a good use case, I suggest we remove them, 
 as it's just extra code to maintain. I think we can even risk a backwards 
 break and remove them completely from 4x, though if that's a problem, we can 
 deprecate instead.
 If anyone sees a good usage for them, or better, already uses them, please 
 speak up, so we can make the proper decision.




[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #448: POMs out of sync

2013-09-15 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/448/

2 tests failed.
REGRESSION:  org.apache.solr.cloud.BasicDistributedZk2Test.testDistribSearch

Error Message:
null

Stack Trace:
java.lang.NullPointerException: null
at 
__randomizedtesting.SeedInfo.seed([1FF121B874B7B1F7:9E17AFA003E8D1CB]:0)
at 
org.apache.solr.common.cloud.ZkCoreNodeProps.getBaseUrl(ZkCoreNodeProps.java:40)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.buildUrlMap(CloudSolrServer.java:406)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:304)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:498)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.commit(AbstractFullDistribZkTestBase.java:1517)
at 
org.apache.solr.cloud.BasicDistributedZk2Test.brindDownShardIndexSomeDocsAndRecover(BasicDistributedZk2Test.java:288)
at 
org.apache.solr.cloud.BasicDistributedZk2Test.doTest(BasicDistributedZk2Test.java:115)


FAILED:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
expected:<5> but was:<4>

Stack Trace:
java.lang.AssertionError: expected:<5> but was:<4>
at 
__randomizedtesting.SeedInfo.seed([FBE1BE5413A67E6B:7A07304C64F91E57]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:175)




Build Log:
[...truncated 25001 lines...]




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b106) - Build # 7478 - Failure!

2013-09-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7478/
Java: 32bit/jdk1.8.0-ea-b106 -client -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 31860 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:396: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:335: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:66: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:135: The 
following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/java/org/apache/lucene/index/NumericUpdate.java
* ./lucene/core/src/java/org/apache/lucene/util/RefCount.java
* 
./lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java

Total time: 42 minutes 17 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.8.0-ea-b106 -client -XX:+UseG1GC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b106) - Build # 7478 - Failure!

2013-09-15 Thread Shai Erera
committed a fix.

Shai


On Sun, Sep 15, 2013 at 9:54 PM, Policeman Jenkins Server 
jenk...@thetaphi.de wrote:

 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7478/
 Java: 32bit/jdk1.8.0-ea-b106 -client -XX:+UseG1GC

 All tests passed

 Build Log:
 [...truncated 31860 lines...]
 BUILD FAILED
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:396: The
 following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:335: The
 following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:66:
 The following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:135:
 The following files are missing svn:eol-style (or binary svn:mime-type):
 * ./lucene/core/src/java/org/apache/lucene/index/NumericUpdate.java
 * ./lucene/core/src/java/org/apache/lucene/util/RefCount.java
 *
 ./lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java

 Total time: 42 minutes 17 seconds
 Build step 'Invoke Ant' marked build as failure
 Description set: Java: 32bit/jdk1.8.0-ea-b106 -client -XX:+UseG1GC
 Archiving artifacts
 Recording test results
 Email was triggered for: Failure
 Sending email for trigger: Failure




 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5234) Allow SolrResourceLoader to load resources from URLs

2013-09-15 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-5234:


Attachment: SOLR-5234.patch

New patch, with test for local URLs (thanks for the suggestion, Mark!)

 Allow SolrResourceLoader to load resources from URLs
 

 Key: SOLR-5234
 URL: https://issues.apache.org/jira/browse/SOLR-5234
 Project: Solr
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
 Attachments: SOLR-5234.patch, SOLR-5234.patch


 This would allow multiple Solr instances to share large configuration files. 
 It would also help resolve problems caused by attempting to store 1MB files 
 in ZooKeeper.




[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 828 - Failure!

2013-09-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/828/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 35870 lines...]
BUILD FAILED
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:396: The following 
error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:335: The following 
error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/extra-targets.xml:66: The 
following error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/extra-targets.xml:135: The 
following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/java/org/apache/lucene/index/NumericUpdate.java
* ./lucene/core/src/java/org/apache/lucene/util/RefCount.java
* 
./lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java

Total time: 130 minutes 41 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops 
-XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation

2013-09-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767887#comment-13767887
 ] 

Shai Erera commented on LUCENE-5215:


I started by creating a new Lucene46Codec and a matching 
Lucene46FieldInfosFormat (+ reader/writer). There is an API issue with 
FISFormat - it doesn't take segmentSuffix in either FISReader.read() or 
FISWriter.write(). We need to make an API break, and I'm wondering whether we 
should do it big time and pass SegRead/WriteState already, instead of adding 
just one parameter -- to be consistent with the other formats (well, as much as 
possible - the other formats take SRS/SWS and pass them to their 
reader/writer).

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera

 In LUCENE-5189 we've identified a few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index but not in a specific segment (sparse DV), we cannot allow that and 
 have to throw a late UOE. If we could rewrite FieldInfos (with generation), 
 this would be possible, since we'd also write a new generation of FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes, because we cannot modify the 
 existing FIS. This is implicit, however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add fieldInfosGen to SIPC, add a dvGen to each FieldInfo, and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes, which 
 have the same limitation -- if a Codec modifies them, they are silently 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.




Parquet dictionary encoding & bit packing

2013-09-15 Thread Otis Gospodnetic
Hi,

I was reading the Parquet announcement from July:
https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop

And a few things caught my attention - Dictionary encoding and
(dynamic) bit packing.  This smells like something Adrien likes to eat
for breakfast.

Over in the Hadoop ecosystem Parquet interest has picked up:
http://search-hadoop.com/?q=parquet

I thought I'd point it out as I haven't seen anyone bring this up.  I
imagine there are ideas to be borrowed there.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
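
For readers unfamiliar with the two techniques Otis mentions, here is a minimal self-contained sketch (class and method names are invented for illustration; this is not Parquet's or Lucene's actual code). Dictionary encoding replaces repeated values with small integer ids, and bit packing then stores each id using only as many bits as the dictionary size requires:

```java
import java.util.*;

public class DictBitPack {
    // Dictionary-encode: map each distinct value to a small integer id,
    // collecting the dictionary (in first-seen order) into dictOut.
    static int[] encode(String[] values, List<String> dictOut) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        int[] encoded = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = ids.computeIfAbsent(values[i], k -> ids.size());
        }
        dictOut.addAll(ids.keySet());
        return encoded;
    }

    // Pack ids into a single long using a fixed number of bits per value.
    // (Assumes ids.length * bitsPerValue <= 64; real formats pack into byte blocks.)
    static long pack(int[] ids, int bitsPerValue) {
        long packed = 0;
        for (int i = 0; i < ids.length; i++) {
            packed |= ((long) ids[i]) << (i * bitsPerValue);
        }
        return packed;
    }

    // Read back the id at the given index from the packed word.
    static int unpack(long packed, int bitsPerValue, int index) {
        return (int) ((packed >>> (index * bitsPerValue)) & ((1L << bitsPerValue) - 1));
    }

    public static void main(String[] args) {
        String[] column = {"US", "FR", "US", "US", "DE", "FR"};
        List<String> dict = new ArrayList<>();
        int[] ids = encode(column, dict); // ids = [0, 1, 0, 0, 2, 1], dict = [US, FR, DE]
        // Dynamic bit width: 3 distinct values need only 2 bits per id.
        int bits = 32 - Integer.numberOfLeadingZeros(Math.max(1, dict.size() - 1));
        long packed = pack(ids, bits);
        System.out.println(dict + " " + bits + " " + unpack(packed, bits, 4)); // [US, FR, DE] 2 2
    }
}
```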




[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767921#comment-13767921
 ] 

Robert Muir commented on LUCENE-5215:
-

No, they don't take this, and for damn good reason: SRS/SWS contain FieldInfos. 

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera

 In LUCENE-5189 we've identified a few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index but not in a specific segment (sparse DV), we cannot allow that and 
 have to throw a late UOE. If we could rewrite FieldInfos (with generation), 
 this would be possible, since we'd also write a new generation of FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes, because we cannot modify the 
 existing FIS. This is implicit, however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add fieldInfosGen to SIPC, add a dvGen to each FieldInfo, and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes, which 
 have the same limitation -- if a Codec modifies them, they are silently 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.




[jira] [Updated] (SOLR-2548) Multithreaded faceting

2013-09-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2548:
---

Attachment: SOLR-2548_multithreaded_faceting,_dsmiley.patch

The attached patch improves on my previous one a little -- a few more comments, 
a variable rename for clarity, and an assertion. And of course I removed the 
future.cancel() loop.

I think this code is pretty clear as far as multithreaded code goes: one loop 
that submits tasks, a follow-on loop that consumes the results of those tasks, 
and a semaphore to ensure no more than the desired number of threads are 
computing the facets.

It'd be cool to eventually extend multithreading across all the faceting types. 
I'll look into that next week. 
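
The shape David describes can be sketched as follows. This is a hypothetical stand-in, not the actual Solr patch: `countFacet` and the class names are invented, and the real code computes facet counts rather than returning strings. The structure is the same, though: one submission loop, one consumption loop, and a semaphore bounding concurrent work:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;

public class BoundedFacetSketch {
    // Hypothetical stand-in for the per-field facet counting work.
    static String countFacet(String field) {
        return field + ":counted";
    }

    static List<String> runFacets(List<String> fields, int maxConcurrent) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore permits = new Semaphore(maxConcurrent); // bounds concurrent facet work

        // Loop 1: submit one task per facet field; each task gates on the semaphore,
        // so at most maxConcurrent tasks compute at the same time.
        List<Future<String>> futures = new ArrayList<>();
        for (String field : fields) {
            futures.add(pool.submit(() -> {
                permits.acquire();
                try {
                    return countFacet(field);
                } finally {
                    permits.release();
                }
            }));
        }

        // Loop 2: consume the results in submission order.
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get());
        }
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runFacets(Arrays.asList("cat", "price", "brand"), 2));
        // prints [cat:counted, price:counted, brand:counted]
    }
}
```

Because results are consumed via the futures in submission order, no extra coordination is needed to reassemble them, which is what keeps the pattern simple.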

 Multithreaded faceting
 --

 Key: SOLR-2548
 URL: https://issues.apache.org/jira/browse/SOLR-2548
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.1
Reporter: Janne Majaranta
Assignee: Erick Erickson
Priority: Minor
  Labels: facet
 Fix For: 4.5, 5.0

 Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
 SOLR-2548_multithreaded_faceting,_dsmiley.patch, 
 SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, 
 SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, 
 SOLR-2548.patch


 Add multithreading support for faceting.




Re: Parquet dictionary encoding & bit packing

2013-09-15 Thread Jack Krupansky

Okay, but what exactly does Parquet have to offer to a search engine?

I mean, is it simply an alternate form of codec?

Would it merely reduce I/O and mass storage requirements?

Would it impact search performance at all?

Would it add a significant search start-up warming overhead? Or, does it 
offer some magic that would in fact dramatically reduce the time to do the 
first query?


Or, is it merely an alternative format for ingestion of an input stream? 
Like, say, better than JavaBin? Or, maybe for more efficient internode 
transfers of documents for SolrCloud?


-- Jack Krupansky

-Original Message- 
From: Otis Gospodnetic

Sent: Sunday, September 15, 2013 5:17 PM
To: dev@lucene.apache.org
Subject: Parquet dictionary encoding & bit packing

Hi,

I was reading the Parquet announcement from July:
https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop

And a few things caught my attention - Dictionary encoding and
(dynamic) bit packing.  This smells like something Adrien likes to eat
for breakfast.

Over in the Hadoop ecosystem Parquet interest has picked up:
http://search-hadoop.com/?q=parquet

I thought I'd point it out as I haven't seen anyone bring this up.  I
imagine there are ideas to be borrowed there.
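To make the two techniques concrete, here is a toy sketch; it is nothing like Parquet's actual on-disk format, just dictionary encoding followed by minimal-width bit packing on a made-up value list:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only: dictionary-encode repeated string values into small
// integer ids, then bit-pack those ids using the minimum bits per value.
public class DictionaryPackingSketch {
  public static void main(String[] args) {
    String[] values = {"us", "fr", "us", "de", "us", "fr", "de", "us"};

    // Dictionary encoding: each distinct value gets the next small id.
    Map<String, Integer> dict = new LinkedHashMap<>();
    int[] ids = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      ids[i] = dict.computeIfAbsent(values[i], k -> dict.size());
    }

    // Bit packing: 3 distinct values need only 2 bits per id, not 32.
    int bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(dict.size() - 1));
    long packed = 0L;
    for (int i = 0; i < ids.length; i++) {
      packed |= ((long) ids[i]) << (i * bits);
    }

    // Decode position 3 back out of the packed word to show round-tripping.
    int mask = (1 << bits) - 1;
    int id3 = (int) ((packed >>> (3 * bits)) & mask);
    String[] dictArr = dict.keySet().toArray(new String[0]);

    System.out.println("dictionary=" + dict.keySet());
    System.out.println("bitsPerValue=" + bits);
    System.out.println("packedBits=" + ids.length * bits
        + " vs unpackedBits=" + ids.length * 32);
    System.out.println("decoded[3]=" + dictArr[id3]);
  }
}
```

The win is visible even in this toy: 8 values shrink from 256 bits of raw ints to 16 bits of packed ids plus a 3-entry dictionary.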

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm







[jira] [Created] (SOLR-5242) Runtime Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01) is not from oracle

2013-09-15 Thread james michael dupont (JIRA)
james michael dupont created SOLR-5242:
--

 Summary: Runtime Oracle Corporation OpenJDK 64-Bit Server VM 
(1.7.0_25 23.7-b01) is not from oracle
 Key: SOLR-5242
 URL: https://issues.apache.org/jira/browse/SOLR-5242
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 5.0
Reporter: james michael dupont


the webpage says that it is oracle openjdk but that is not from oracle
Runtime
Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01)




[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-15 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767941#comment-13767941
 ] 

Erick Erickson commented on SOLR-2548:
--

So maybe just commit this when you think it's ready? I'll probably get a chance 
to look it over Tuesday on the airplane, but if you're happy with it feel free. 
We can always put the other faceting types into a new JIRA?

 Multithreaded faceting
 --

 Key: SOLR-2548
 URL: https://issues.apache.org/jira/browse/SOLR-2548
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.1
Reporter: Janne Majaranta
Assignee: Erick Erickson
Priority: Minor
  Labels: facet
 Fix For: 4.5, 5.0

 Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
 SOLR-2548_multithreaded_faceting,_dsmiley.patch, 
 SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, 
 SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, 
 SOLR-2548.patch


 Add multithreading support for faceting.




[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #971: POMs out of sync

2013-09-15 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/971/

No tests ran.

Build Log:
[...truncated 11794 lines...]




Re: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #971: POMs out of sync

2013-09-15 Thread Robert Muir
I committed a fix...

I will open an issue to replace these exclusions with wildcards (so it
works like ivy).


On Sun, Sep 15, 2013 at 6:53 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/971/

 No tests ran.

 Build Log:
 [...truncated 11794 lines...]







[jira] [Created] (LUCENE-5217) disable transitive dependencies in maven config

2013-09-15 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5217:
---

 Summary: disable transitive dependencies in maven config
 Key: LUCENE-5217
 URL: https://issues.apache.org/jira/browse/LUCENE-5217
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


Our ivy configuration does this: each dependency is specified and so we know 
what will happen. Unfortunately the maven setup is not configured the same way.

Instead the maven setup is configured to download the internet: and it excludes 
certain things specifically.

This is really hard to configure and maintain: we added a 
'validate-maven-dependencies' that tries to fail on any extra jars, but all it 
really does is run a license check after maven runs. It wouldn't find 
unnecessary dependencies being dragged in if something else in lucene was using 
them and thus they had a license file.

Since maven supports wildcard exclusions: MNG-3832, we can disable this 
transitive shit completely.

We should do this, so its configuration is the exact parallel of ivy.
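For reference, an MNG-3832-style wildcard exclusion (supported in Maven 3) would look roughly like this in a dependency declaration; the coordinates here are purely illustrative, not an actual Lucene dependency:

```xml
<dependency>
  <groupId>org.example</groupId>
  <artifactId>some-dependency</artifactId>
  <version>1.0</version>
  <exclusions>
    <!-- Maven 3 wildcard exclusion: drop all transitive dependencies,
         so only explicitly declared jars are pulled in, as with ivy -->
    <exclusion>
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```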





[jira] [Commented] (LUCENE-5217) disable transitive dependencies in maven config

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767950#comment-13767950
 ] 

Robert Muir commented on LUCENE-5217:
-

This is also described here: 
http://www.smartjava.org/content/maven-and-wildcard-exclusions

I think it just means we have to require a minimum of maven 3 instead of also 
supporting 2. Since this has been out for 3 years (in fact older than the ant 
1.8.2 that we require), I don't see this as a significant imposition on anyone?

 disable transitive dependencies in maven config
 ---

 Key: LUCENE-5217
 URL: https://issues.apache.org/jira/browse/LUCENE-5217
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 Our ivy configuration does this: each dependency is specified and so we know 
 what will happen. Unfortunately the maven setup is not configured the same 
 way.
 Instead the maven setup is configured to download the internet: and it 
 excludes certain things specifically.
 This is really hard to configure and maintain: we added a 
 'validate-maven-dependencies' that tries to fail on any extra jars, but all 
 it really does is run a license check after maven runs. It wouldn't find 
 unnecessary dependencies being dragged in if something else in lucene was 
 using them and thus they had a license file.
 Since maven supports wildcard exclusions: MNG-3832, we can disable this 
 transitive shit completely.
 We should do this, so its configuration is the exact parallel of ivy.




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767999#comment-13767999
 ] 

ASF subversion and git services commented on LUCENE-5189:
-

Commit 1523525 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1523525 ]

LUCENE-5189: add testcase

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates, however the 
 amount of changes are immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons to that:
 * NumericDV fields should be easier to update, if e.g. we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and on itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have some working patch already which I'll upload next, explaining the 
 changes.




[jira] [Resolved] (SOLR-5242) Runtime Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01) is not from oracle

2013-09-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-5242.
---

Resolution: Not A Problem

This is not a problem; this is what OpenJDK reports.
You can easily see this yourself with any OpenJDK:

java -XshowSettings:properties -version | grep -i oracle
java.specification.vendor = Oracle Corporation
java.vendor = Oracle Corporation
java.vendor.url = http://java.oracle.com/
java.vm.specification.vendor = Oracle Corporation
java.vm.vendor = Oracle Corporation


 Runtime Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01) is 
 not from oracle
 --

 Key: SOLR-5242
 URL: https://issues.apache.org/jira/browse/SOLR-5242
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 5.0
Reporter: james michael dupont

 the webpage says that it is oracle openjdk but that is not from oracle
 Runtime
 Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01)




[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768019#comment-13768019
 ] 

Robert Muir commented on LUCENE-5210:
-

I have no opinion on the eclipse/ant/antunit stuff.

I just want to say currently there is no test, so I think we should start with 
a test and then improve it.

My one suggestion about testing and the jars: if the test is in java, it can 
easily create jars on the fly in temp dirs so we don't have to package them 
(with fake licenses). This is done in ResourceLoaderTest in solr for example:

{code}
  public void testClassLoaderLibs() throws Exception {
    File tmpRoot = _TestUtil.getTempDir("testClassLoaderLibs");

    File lib = new File(tmpRoot, "lib");
    lib.mkdirs();

    JarOutputStream jar1 = new JarOutputStream(
        new FileOutputStream(new File(lib, "jar1.jar")));
    jar1.putNextEntry(new JarEntry("aLibFile"));
    jar1.closeEntry();
    jar1.close();
    ...
{code}


 Unit tests for LicenseCheckTask.
 

 Key: LUCENE-5210
 URL: https://issues.apache.org/jira/browse/LUCENE-5210
 Project: Lucene - Core
  Issue Type: Test
  Components: general/build
Reporter: Mark Miller
 Attachments: LUCENE-5210.patch, LUCENE-5210.patch


 While working on LUCENE-5209, I noticed the LicenseCheckTask is kind of a 
 second class citizen - excluded from UI src folder setup and with no units 
 tests. This was a little scary to me.
 I've started adding some units tests. So far I have mainly just done the 
 lifting of getting units tests to work as part of tools.
 I have added two super simple tests - really just the start - but something 
 to build on.




[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768025#comment-13768025
 ] 

Robert Muir commented on LUCENE-5210:
-

As far as current patch, i dont really have a problem with it (any other 
simplifications can be done later).

I have only one concern: will this make a lucene-tools module (e.g. packaged in 
releases, published in maven?)

It seems like it might, which separately might be a good idea so someone can 
use the stuff in this folder in their own project, except a few things would be 
off as far as packaging:
* it should probably be restructured, so that various configs used by the build 
are in src/resources and put inside its jar file (e.g. forbiddenApis configs 
and so on)
* I think this depends on ant, but there is no dependency on ant in the ivy.xml
* it would need maven configuration and so on, added in smoketester, etc.
* there might be other exclusions for tools/ in the build that are not 
appropriate, etc.
* as far as the name, maybe build-tools would be a better one (since it's not 
tools for working on lucene indexes).

If smoketester passes though, I am happy: we can just make sure it's excluded 
from the right places and not doing something we don't want wrt packaging for 
now, and discuss this stuff on other issues.


 Unit tests for LicenseCheckTask.
 

 Key: LUCENE-5210
 URL: https://issues.apache.org/jira/browse/LUCENE-5210
 Project: Lucene - Core
  Issue Type: Test
  Components: general/build
Reporter: Mark Miller
 Attachments: LUCENE-5210.patch, LUCENE-5210.patch


 While working on LUCENE-5209, I noticed the LicenseCheckTask is kind of a 
 second class citizen - excluded from UI src folder setup and with no units 
 tests. This was a little scary to me.
 I've started adding some units tests. So far I have mainly just done the 
 lifting of getting units tests to work as part of tools.
 I have added two super simple tests - really just the start - but something 
 to build on.




[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.

2013-09-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768033#comment-13768033
 ] 

Mark Miller commented on LUCENE-5210:
-

Also note, prepare-release does not currently pass the way things are - 
something to do with a maven artifact that now tries to run on tools.

 Unit tests for LicenseCheckTask.
 

 Key: LUCENE-5210
 URL: https://issues.apache.org/jira/browse/LUCENE-5210
 Project: Lucene - Core
  Issue Type: Test
  Components: general/build
Reporter: Mark Miller
 Attachments: LUCENE-5210.patch, LUCENE-5210.patch


 While working on LUCENE-5209, I noticed the LicenseCheckTask is kind of a 
 second class citizen - excluded from UI src folder setup and with no units 
 tests. This was a little scary to me.
 I've started adding some units tests. So far I have mainly just done the 
 lifting of getting units tests to work as part of tools.
 I have added two super simple tests - really just the start - but something 
 to build on.




[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.

2013-09-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768032#comment-13768032
 ] 

Mark Miller commented on LUCENE-5210:
-

Yeah, my first thought was to write out the test files to a tmp dir, but 
essentially I was too lazy to code it up.

 Unit tests for LicenseCheckTask.
 

 Key: LUCENE-5210
 URL: https://issues.apache.org/jira/browse/LUCENE-5210
 Project: Lucene - Core
  Issue Type: Test
  Components: general/build
Reporter: Mark Miller
 Attachments: LUCENE-5210.patch, LUCENE-5210.patch


 While working on LUCENE-5209, I noticed the LicenseCheckTask is kind of a 
 second class citizen - excluded from UI src folder setup and with no units 
 tests. This was a little scary to me.
 I've started adding some units tests. So far I have mainly just done the 
 lifting of getting units tests to work as part of tools.
 I have added two super simple tests - really just the start - but something 
 to build on.




[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768034#comment-13768034
 ] 

Robert Muir commented on LUCENE-5210:
-

I think if we _just want to run tests_ for now, we should change the test 
target to explicitly recurse to tools, rather than modifying the 'general 
module macro' in common-build.

Otherwise other tasks (like packaging, javadocs, maven, etc.) will try to do 
things with tools.

 Unit tests for LicenseCheckTask.
 

 Key: LUCENE-5210
 URL: https://issues.apache.org/jira/browse/LUCENE-5210
 Project: Lucene - Core
  Issue Type: Test
  Components: general/build
Reporter: Mark Miller
 Attachments: LUCENE-5210.patch, LUCENE-5210.patch


 While working on LUCENE-5209, I noticed the LicenseCheckTask is kind of a 
 second class citizen - excluded from UI src folder setup and with no units 
 tests. This was a little scary to me.
 I've started adding some units tests. So far I have mainly just done the 
 lifting of getting units tests to work as part of tools.
 I have added two super simple tests - really just the start - but something 
 to build on.




[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.

2013-09-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768035#comment-13768035
 ] 

Mark Miller commented on LUCENE-5210:
-

bq. will this make a lucene-tools module (e.g. packaged in releases, published 
in maven?)

Yeah, that is what is happening currently - I'm sure that is what is causing 
prepare-release to have issues.

 Unit tests for LicenseCheckTask.
 

 Key: LUCENE-5210
 URL: https://issues.apache.org/jira/browse/LUCENE-5210
 Project: Lucene - Core
  Issue Type: Test
  Components: general/build
Reporter: Mark Miller
 Attachments: LUCENE-5210.patch, LUCENE-5210.patch


 While working on LUCENE-5209, I noticed the LicenseCheckTask is kind of a 
 second class citizen - excluded from UI src folder setup and with no units 
 tests. This was a little scary to me.
 I've started adding some units tests. So far I have mainly just done the 
 lifting of getting units tests to work as part of tools.
 I have added two super simple tests - really just the start - but something 
 to build on.




[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768037#comment-13768037
 ] 

Robert Muir commented on LUCENE-5210:
-

Look at the regenerate task in build.xml; it has a subant explicitly going to 
'core' to run a task.

We'd just want something like that subant, call it something like 'test-tools', 
and have 'test' depend on that.
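A minimal sketch of that subant idea, following the shape of the regenerate target; the target name, description, and `${common.dir}` path are illustrative assumptions, not the actual build file:

```xml
<!-- illustrative sketch only: recurse into tools/ just for tests,
     without adding tools to the general module macro -->
<target name="test-tools" description="Runs tests for the build tools">
  <subant target="test" inheritall="false" failonerror="true">
    <fileset dir="${common.dir}/tools" includes="build.xml"/>
  </subant>
</target>
```

The existing test target would then gain test-tools in its depends list, so packaging, javadocs, and maven tasks never see tools/.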

 Unit tests for LicenseCheckTask.
 

 Key: LUCENE-5210
 URL: https://issues.apache.org/jira/browse/LUCENE-5210
 Project: Lucene - Core
  Issue Type: Test
  Components: general/build
Reporter: Mark Miller
 Attachments: LUCENE-5210.patch, LUCENE-5210.patch


 While working on LUCENE-5209, I noticed the LicenseCheckTask is kind of a 
 second class citizen - excluded from UI src folder setup and with no units 
 tests. This was a little scary to me.
 I've started adding some units tests. So far I have mainly just done the 
 lifting of getting units tests to work as part of tools.
 I have added two super simple tests - really just the start - but something 
 to build on.




[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation

2013-09-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768041#comment-13768041
 ] 

Shai Erera commented on LUCENE-5215:


It's ok that SWS contains fieldInfos - FieldsWriter needs to write them. And I 
don't think it's bad that SRS contains fieldInfos; we can just assert in 
FieldsReader that they are null?

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera

 In LUCENE-5189 we've identified few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index, but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that 
 have same limitation -- if a Codec modifies them, they are silently being 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.




[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation

2013-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768049#comment-13768049
 ] 

Robert Muir commented on LUCENE-5215:
-

Sorry, that would be horribly confusing.

SegmentRead/Write state are for the *data* portions of the codec: postings, 
vectors, ...
They have all metadata available at this point, so it makes sense.

However, these metadata portions that are bootstrapped do not: they only have 
limited things available. The API should only pass what they actually have 
access to: no nulls!

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera

 In LUCENE-5189 we've identified few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index, but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that 
 have same limitation -- if a Codec modifies them, they are silently being 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.




[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-15 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768078#comment-13768078
 ] 

Kranti Parisa commented on SOLR-4787:
-

I have implemented multi-value keys for hjoin using a new field, 
UnIvertedLongField. Sanity checks look good. Also tested with FQs (nested 
joins). I will prepare a patch sometime tomorrow and post it here.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 than the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost* > 99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will set up a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then you can avoid hashset resizing, which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
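The threaded filter build in point 2 can be sketched roughly as follows: a fixed pool, created once and reused for every request, extracts join keys from slices of the index in parallel, and the partial sets are merged. The segment contents below are invented; the real implementation lives in the plugin's Java code, not here.

```python
# Rough sketch of the threads=N option: a fixed, reusable thread pool builds
# partial key sets per index slice, then merges them into one filter set.
from concurrent.futures import ThreadPoolExecutor

POOL = ThreadPoolExecutor(max_workers=6)  # analogous to threads=6

def keys_from_segment(segment_keys):
    # Stand-in for extracting join keys from one index slice.
    return set(segment_keys)

def build_filter_keys(segments):
    keys = set()
    # The pool is created once and reused, mirroring the note above that
    # all hjoin requests share one fixed threadpool after creation.
    for partial in POOL.map(keys_from_segment, segments):
        keys |= partial
    return keys

segments = [[1, 2, 3], [3, 4], [5]]  # invented per-slice key lists
filter_keys = build_filter_keys(segments)
```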
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather than join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1&qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
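From a client, the example filter query above can be issued as an ordinary HTTP request with the local-params string URL-encoded. A sketch follows; the host, port, and collection name are placeholders, while the parameter values follow the example in the text.

```python
# Sketch of sending the example hjoin filter query as a plain HTTP request.
# The base URL is a placeholder for a real Solr instance.
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "fq": "{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1",
    "qq": "group:5",
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
```

Note that urlencode escapes the `{!hjoin ...}` local-params syntax, so the dollar-sign reference `$qq` to the nested filter survives transport intact.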
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin:
 <queryParser name="hjoin" 
 class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
 And the join contrib jars must be registered in the solrconfig.xml.
  <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a 

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-09-15 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768078#comment-13768078
 ] 

Kranti Parisa edited comment on SOLR-4787 at 9/16/13 5:38 AM:
--

I have implemented multi-value keys for hjoin using a new field 
UnIvertedLongField. Sanity checks look good. Also tested with FQs (nested 
Joins). I will run some performance tests and prepare the patch sometime 
tomorrow.

  was (Author: krantiparisa):
I have implemented multi-value keys for hjoin using a new field 
UnIvertedLongField. Sanity checks look good. Also tested with FQs (nested 
Joins). I will prepare a patch sometime tomorrow and post here.
  

[jira] [Created] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException

2013-09-15 Thread Littlestar (JIRA)
Littlestar created LUCENE-5218:
--

 Summary: background merge hit exception  Caused by: 
java.lang.ArrayIndexOutOfBoundsException
 Key: LUCENE-5218
 URL: https://issues.apache.org/jira/browse/LUCENE-5218
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.4
 Environment: Linux MMapDirectory.
Reporter: Littlestar


forceMerge(80)
==
Caused by: java.io.IOException: background merge hit exception: 
_3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 
_16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 
_dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt 
[maxNumSegments=80]
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
at 
com.trs.hybase.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295)
... 4 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at 
org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267)
at 
org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239)
at 
org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218)
at 
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110)
at 
org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186)
at 
org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

===

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException

2013-09-15 Thread Littlestar (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Littlestar updated LUCENE-5218:
---

Description: 
forceMerge(80)
==
Caused by: java.io.IOException: background merge hit exception: 
_3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 
_16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 
_dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt 
[maxNumSegments=80]
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
at 
com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295)
... 4 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at 
org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267)
at 
org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239)
at 
org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218)
at 
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110)
at 
org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186)
at 
org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

===

  was:
forceMerge(80)
==
Caused by: java.io.IOException: background merge hit exception: 
_3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 
_16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 
_dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt 
[maxNumSegments=80]
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
at 
com.trs.hybase.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295)
... 4 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at 
org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267)
at 
org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239)
at 
org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218)
at 
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110)
at 
org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186)
at 
org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

===

