date:20121202

[jira] [Commented] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508553#comment-13508553
 ] 

Uwe Schindler commented on SOLR-4123:
-

Can you please remove the workaround inside the factory (prepending "/")? This 
will break non-tests (e.g. when those classes are loaded from file system). 
Just *only* load classes from the local package of the class that was passed 
into the resourceloader.

bq. Yes, we should remove this /-stuff!!!

I misunderstood that. I thought you wanted to fix something else. YES, PLEASE 
REMOVE, it may break non-tests!

> ICUTokenizerFactory - per-script RBBI customization
> ---
>
> Key: SOLR-4123
> URL: https://issues.apache.org/jira/browse/SOLR-4123
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Shawn Heisey
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4123.patch, SOLR-4123.patch, SOLR-4123.patch
>
>
> Initially this started out as an idea for a configuration knob on 
> ICUTokenizer that would allow me to tell it not to tokenize on punctuation.  
> Through IRC discussion on #lucene, it sorta ballooned.  The committers had a 
> long discussion about it that I don't really understand, so I'll be including 
> it in the comments.
> I am a Solr user, so I would also need the ability to access the 
> configuration from there, likely either in schema.xml or solrconfig.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure

2012-12-02 Thread Dawid Weiss

Good to see those suite timeouts are working :)

Dawid

On Sun, Dec 2, 2012 at 11:50 PM, Michael McCandless
 wrote:
> You're welcome!
>
> I committed a fix to cut back on the index size when MockRandomMP is
> used, because this MP is O(N^2) cost!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Dec 2, 2012 at 5:00 PM, Uwe Schindler  wrote:
>> Thanks!
>>
>> Uwe
>>
>>
>>
>> Michael McCandless  schrieb:
>>>
>>> I'll dig ...
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Sun, Dec 2, 2012 at 3:10 PM, Apache Jenkins Server
>>>  wrote:

 Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/

 2 tests failed.
 REGRESSION:  org.apache.lucene.index.TestBagOfPositions.test

 Error Message:
 Test abandoned because suite timeout was reached.

 Stack Trace:
 java.lang.Exception: Test abandoned because suite timeout was reached.
 at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)


 FAILED:
 junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions

 Error Message:
 Suite timeout exceeded (>= 720 msec).

 Stack Trace:
 java.lang.Exception: Suite timeout exceeded (>= 720 msec).
 at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)




 Build Log:
 [...truncated 1360 lines...]
 [junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions
 [junit4:junit4]   2> 2012-12-3 2:05:06
 com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
 [junit4:junit4]   2> WARNING: Suite execution timed out:
 org.apache.lucene.index.TestBagOfPositions
 [junit4:junit4]   2>  jstack at approximately timeout time 
 [junit4:junit4]   2> "Thread-319" ID=415 RUNNABLE
 [junit4:junit4]   2>at
 org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118)
 [junit4:junit4]
  2>at
 org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73)
 [junit4:junit4]   2>at
 org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70)
 [junit4:junit4]   2>at
 org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205)
 [junit4:junit4]   2>at
 org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298)
 [junit4:junit4]   2>at
 org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407)
 [junit4:junit4]   2>at
 org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330)
 [junit4:junit4]   2>at
 org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261)
 [junit4:junit4]   2>at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115)
 [junit4:junit4]   2>at
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682)
 [junit4:junit4]   2>at
 org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288)
 [junit4:junit4]   2>at
 org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
 [junit4:junit4]   2>- locked
 org.apache.lucene.index.SerialMergeScheduler@4f20a4cb
 [junit4:junit4]   2>at
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825)
 [junit4:junit4]   2>at
 org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236)
 [junit4:junit4]   2>at
 org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186)
 [junit4:junit4]   2>at
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172)
 [junit4:junit4]   2>at
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160)
 [junit4:junit4]   2>at

 org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117)
 [junit4:junit4]   2>
 [junit4:junit4]   2>
 "TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" ID=414 WAITING on
 org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
 [junit4:junit4]   2>at java.lang.Object.wait(Native Method)
 [junit4:junit4]   2>- waiting on
 org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
 [junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1203)
 [junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1256)
 [junit4:junit4]   2>at
 org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128)
 [junit4:junit4]   2>at
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 [junit4:junit4]   2>at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 [junit4:junit4]   2>at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:

[jira] [Comment Edited] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508546#comment-13508546
 ] 

Uwe Schindler edited comment on SOLR-4123 at 12/3/12 7:46 AM:
--

The rule is simple: Never prepend /. 

For tests we added a special case and that may confuse here: 
ClasspathResourceLoader with a class in ctor param, there you can pass in the 
base class from which the packages are load. This is added to make writing 
tests easy: You can pass in a plain file name and it is loaded from the package 
of the corresponding class. This is to mimic what we always had in our tests: 
Loading local class resources:

{code}
// this will load the
new ClassPathResourceLoader(getClass()).openResource("file.txt");
{code}

Code like Solr uses FileSystemResourceLoader that wants relative path to the 
local working directory or uses the classloader, but thats for Solr and other 
applications like ElasticSearch. Tests should use 
ClasspathResourceLoader(getClass()) and only pass a file name fro their own 
package.

bq. Yes, we should remove this /-stuff!!!

We can do nothing here, the confusion is created by Java's API by itsself: If 
you call Class.getResource() without a path (only file name), it loads from 
same package as the class, if you prepend with "/" it uses the given path as 
full package name. In contrast, if you directly use the ClassLoader (not 
Class), you must give a full path, but without a /.

  was (Author: thetaphi):
The rule is simple: Never prepend /. 

For tests we added a special case and that may confuse here: 
ClasspathResourceLoader with a class in ctor param, there you can pass in the 
base class from which the packages are load.

{code}
// this will load the
new ClassPathResourceLoader(getClass()).openResource("file.txt");
{code}

bq. Yes, we should remove this /-stuff!!!

We can do nothing here, the confusion is created by Java's API by itsself: If 
you call Class.getResource() without a path (only file name), it loads from 
same package as the class, if you prepend with "/" it uses the given path as 
full package name. In contrast, if you directly use the ClassLoader (not 
Class), you must give a full path, but without a /.
  
> ICUTokenizerFactory - per-script RBBI customization
> ---
>
> Key: SOLR-4123
> URL: https://issues.apache.org/jira/browse/SOLR-4123
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Shawn Heisey
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4123.patch, SOLR-4123.patch, SOLR-4123.patch
>
>
> Initially this started out as an idea for a configuration knob on 
> ICUTokenizer that would allow me to tell it not to tokenize on punctuation.  
> Through IRC discussion on #lucene, it sorta ballooned.  The committers had a 
> long discussion about it that I don't really understand, so I'll be including 
> it in the comments.
> I am a Solr user, so I would also need the ability to access the 
> configuration from there, likely either in schema.xml or solrconfig.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Commit Tag Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508551#comment-13508551
 ] 

Commit Tag Bot commented on LUCENE-4575:


[branch_4x commit] Shai Erera
http://svn.apache.org/viewvc?view=revision&revision=1416367

LUCENE-4575: add IndexWriter.setCommitData



> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-4575.


   Resolution: Fixed
Fix Version/s: 5.0
   4.1
 Assignee: Shai Erera
Lucene Fields: New,Patch Available  (was: New)

Committed to trunk and 4x. Thanks Mike !

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508546#comment-13508546
 ] 

Uwe Schindler commented on SOLR-4123:
-

The rule is simple: Never prepend /. 

For tests we added a special case and that may confuse here: 
ClasspathResourceLoader with a class in ctor param, there you can pass in the 
base class from which the packages are load.

{code}
// this will load the
new ClassPathResourceLoader(getClass()).openResource("file.txt");
{code}

bq. Yes, we should remove this /-stuff!!!

We can do nothing here, the confusion is created by Java's API by itsself: If 
you call Class.getResource() without a path (only file name), it loads from 
same package as the class, if you prepend with "/" it uses the given path as 
full package name. In contrast, if you directly use the ClassLoader (not 
Class), you must give a full path, but without a /.

> ICUTokenizerFactory - per-script RBBI customization
> ---
>
> Key: SOLR-4123
> URL: https://issues.apache.org/jira/browse/SOLR-4123
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Shawn Heisey
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4123.patch, SOLR-4123.patch, SOLR-4123.patch
>
>
> Initially this started out as an idea for a configuration knob on 
> ICUTokenizer that would allow me to tell it not to tokenize on punctuation.  
> Through IRC discussion on #lucene, it sorta ballooned.  The committers had a 
> long discussion about it that I don't really understand, so I'll be including 
> it in the comments.
> I am a Solr user, so I would also need the ability to access the 
> configuration from there, likely either in schema.xml or solrconfig.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Commit Tag Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508524#comment-13508524
 ] 

Commit Tag Bot commented on LUCENE-4575:


[trunk commit] Shai Erera
http://svn.apache.org/viewvc?view=revision&revision=1416361

LUCENE-4575: add IndexWriter.setCommitData



> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4085) Commit-free ExternalFileField

2012-12-02 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508499#comment-13508499
 ] 

Mikhail Khludnev commented on SOLR-4085:


[~jpountz],
Your feedback is appreciated. To make this ticket even more valuable for 
community can we come through the particular points of confusing behavior, 
which you mention. Can you list them?

I also want to wait until [~romseygeek] leaves his feedback as a person who 
provided and interest to the subj.  

Thank you, guys

> Commit-free ExternalFileField
> -
>
> Key: SOLR-4085
> URL: https://issues.apache.org/jira/browse/SOLR-4085
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.1
>Reporter: Mikhail Khludnev
>  Labels: externalfilefield
> Attachments: SOLR-4085.patch
>
>
> Let's reload ExternalFileFields without commit!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization

2012-12-02 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508449#comment-13508449
 ] 

Robert Muir commented on SOLR-4123:
---

Hi, thanks for tackling this! you beat me to getting to the tests.

Yes, we should remove this /-stuff!!!

> ICUTokenizerFactory - per-script RBBI customization
> ---
>
> Key: SOLR-4123
> URL: https://issues.apache.org/jira/browse/SOLR-4123
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Shawn Heisey
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4123.patch, SOLR-4123.patch, SOLR-4123.patch
>
>
> Initially this started out as an idea for a configuration knob on 
> ICUTokenizer that would allow me to tell it not to tokenize on punctuation.  
> Through IRC discussion on #lucene, it sorta ballooned.  The committers had a 
> long discussion about it that I don't really understand, so I'll be including 
> it in the comments.
> I am a Solr user, so I would also need the ability to access the 
> configuration from there, likely either in schema.xml or solrconfig.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Pivot facets enhancements

2012-12-02 Thread Otis Gospodnetic

Hi Steve - don't be discouraged by the lack of the response here.  A better
place to ask is actually the user list.  I suspect the number of people
using pivot faceting is low, so again, don't be discouraged by possibly
weak response.

> We first need to distribute that request, for which I've seen and locally
applied the existing patch and it seems to work ok for our needs.

You may want to give that JIRA issue a vote and comment saying it worked
for you + any feedback you may have.

> First, for any field in the facet pivot, I've added the possibility to
add a query (through f.field.facet.povot.query) that would compute
> the the count for the intersection between the query and the document set
matching a particular value.

Maybe you can provide an example in your email to user ML to help people
understand if this would be useful to them or not.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Fri, Nov 23, 2012 at 6:07 PM, Steve Molloy  wrote:

> Hi, I'm currently working on a project based on solr 4.0 which relies on
> facet pivot to populate a treemap visualization. We've been able to get
> something in place, but will now need to move further. We first need to
> distribute that request, for which I've seen and locally applied the
> existing patch and it seems to work ok for our needs. But while in the
> code, I also added 2 features that will be helpful for us and which we
> would be willing to contribute back if it makes sense.
>
> So before sending any code (which I need to cleanup anyhow), I'll describe
> the changes.
>
> First, for any field in the facet pivot, I've added the possibility to add
> a query (through f.field.facet.povot.query) that would compute the the
> count for the intersection between the query and the document set matching
> a particular value. We're planning on using this to produce the count for
> the overlay coloring in the treemap. It seems to work fine and is actually
> more efficient than having a third level of pivot.
>
> The second thing is a deduplication flag. This is mostly for our case
> where the second level is a document path, which is stored using the
> pathhierarchytokeniser. So to avoid documents from being counted for every
> folder in its path (which we do want at query time) but not have to store a
> separate field (to reduce index size).
>
> So, if these features are of interest, I will send more details and code
> once I've cleaned it up.
>
> Thanks,
>
> Steve
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

[jira] [Updated] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization

2012-12-02 Thread Steven Rowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-4123:
--

Attachment: SOLR-4123.patch

bq. I don't understand {{Object.getClass().getResourceAsStream()}}, which is 
delegated to by {{ResourceLoader.loadResource()}} - even resources in the same 
package as the Object can't be found???  By contrast, 
{{Object.getClass().getClassLoader().getResourceAsStream()}} succeeds in 
finding resources without first prepending a {{"/"}}.  The 
{{ClasspathResourceLoader}} ctor that allows direct specification of the 
{{ClassLoader}} separately from the {{Class}} has private access, though.

Hmm, I just retried removing the package from the path for resource that it's 
in the same package as the test class, and it now works (why did I think it 
didn't?  I thought I tried that...).  Modified patch attached.

So I guess {{getClass().getResourceAsStream()}} makes sense: it only searches 
the same package as the class unless you prepend a {{"/"}}.  Should I leave in 
the {{"/"}}-prepending fallback?

> ICUTokenizerFactory - per-script RBBI customization
> ---
>
> Key: SOLR-4123
> URL: https://issues.apache.org/jira/browse/SOLR-4123
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Shawn Heisey
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4123.patch, SOLR-4123.patch, SOLR-4123.patch
>
>
> Initially this started out as an idea for a configuration knob on 
> ICUTokenizer that would allow me to tell it not to tokenize on punctuation.  
> Through IRC discussion on #lucene, it sorta ballooned.  The committers had a 
> long discussion about it that I don't really understand, so I'll be including 
> it in the comments.
> I am a Solr user, so I would also need the ability to access the 
> configuration from there, likely either in schema.xml or solrconfig.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization

2012-12-02 Thread Steven Rowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-4123:
--

Attachment: SOLR-4123.patch

Patch with more tests and example {{.rbbi}} files.

I changed the {{rulefiles="..."}} arg format to relax allowable resource names 
& locations, e.g. {{rulefiles="Latn:, ..."}}.

I added some logic to {{ICUTokenizerFactory.parseRules()}} to retry when 
{{ResourceLoader.loadResource()}} fails, after first prepending a {{"/"}} to 
the resource path, because none of the test resources under 
{{lucene/analysis/icu/src/test-files/}}, which is on the {{test.classpath}}, 
were found.  I don't understand {{Object.getClass().getResourceAsStream()}}, 
which is delegated to by {{ResourceLoader.loadResource()}} - even resources in 
the same package as the Object can't be found???  By contrast, 
{{Object.getClass().getClassLoader().getResourceAsStream()}} succeeds in 
finding resources without first prepending a {{"/"}}.  The 
{{ClasspathResourceLoader}} ctor that allows direct specification of the 
{{ClassLoader}} separately from the {{Class}} has private access, though.

> ICUTokenizerFactory - per-script RBBI customization
> ---
>
> Key: SOLR-4123
> URL: https://issues.apache.org/jira/browse/SOLR-4123
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Shawn Heisey
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4123.patch, SOLR-4123.patch
>
>
> Initially this started out as an idea for a configuration knob on 
> ICUTokenizer that would allow me to tell it not to tokenize on punctuation.  
> Through IRC discussion on #lucene, it sorta ballooned.  The committers had a 
> long discussion about it that I don't really understand, so I'll be including 
> it in the comments.
> I am a Solr user, so I would also need the ability to access the 
> configuration from there, likely either in schema.xml or solrconfig.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-4085) Commit-free ExternalFileField

2012-12-02 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved SOLR-4085.


Resolution: Won't Fix

> Commit-free ExternalFileField
> -
>
> Key: SOLR-4085
> URL: https://issues.apache.org/jira/browse/SOLR-4085
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.1
>Reporter: Mikhail Khludnev
>  Labels: externalfilefield
> Attachments: SOLR-4085.patch
>
>
> Let's reload ExternalFileFields without commit!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl

2012-12-02 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-4584.
--

Resolution: Won't Fix

Comparing the compressed output against the original impl seemed to be a good 
mean to detect bugs, but if we want to be able to have a different algorithm as 
Uwe suggests, I'll try to add softer tests (like checking that the algorithm 
manages to detect a match which is 65535 bytes backwards, gives a reasonable 
compression ratio on inputs that are known to be easily compressible, etc.)

> Compare the LZ4 implementation in Lucene against the original impl
> --
>
> Key: LUCENE-4584
> URL: https://issues.apache.org/jira/browse/LUCENE-4584
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data 
> the exact same way as the original impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.6.0_37) - Build # 3016 - Failure!

2012-12-02 Thread Policeman Jenkins Server

Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/3016/
Java: 64bit/jdk1.6.0_37 -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 24159 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:60: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build.xml:284: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:1526: 
The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:1560: 
Compile failed; see the compiler error output for details.

Total time: 31 minutes 16 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.6.0_37 -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.6.0_37) - Build # 2023 - Failure!

2012-12-02 Thread Policeman Jenkins Server

Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/2023/
Java: 32bit/jdk1.6.0_37 -client -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 24165 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:60: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:284: 
The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\common-build.xml:1526:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\common-build.xml:1560:
 Compile failed; see the compiler error output for details.

Total time: 57 minutes 9 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.6.0_37 -client -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4136) SolrCloud bugs when servlet context contains "/" or "_"

2012-12-02 Thread Hoss Man (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4136:
---

Attachment: SOLR-4136.patch


Context...

* 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201211.mbox/%3Calpine.DEB.2.02.1211292004430.2543@frisbee%3E
* 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201211.mbox/%3c551c5e62-0520-42a2-bf71-165fda360...@gmail.com%3E

Mark's suggestion in that email regarding my original question (about 
prohibiting "/" in nodeNames) was that zkcontroller should replace them "/" 
with "_" -- but that would cause potential collisions between contexts like 
"/foo/solr" and "/foo_solr", so i think using something like URLEncoding makes 
more sense (and shouldn't impact existing ZK cluster state data for most 
existing users)

The attached patch enhances the test base classes to allow for randomized 
hostContext values, and then uses this URLEncoding logic in ZKController to 
build nodeNames -- and in most cases seems to work.  But thinking about "_" in 
paths got me paranoid about explicitly testing that which is how I discovered 
the crufty logic in OverseerCollectionProcessor.  (NOTE: you can see the 
obvious OverseerCollectionProcessor errors trying to talk to the wrong URL in 
the test logs, and they seem to explain the subsequent test failure message, 
but it's also possible there is a subsequent problem i haven't noticed yet)

I haven't dug into this part of the code/problem very much yet, but i *think* 
the right fix here is to clean this up this code so that intead of making 
assumptions about the node name, is uses the clusterstate to lookup the 
base_url from the nodeName.

Logged error (repeated for multiple shards)...

{noformat}
[junit4:junit4]   2> 204647 T33 oasc.SolrException.log SEVERE Collection 
createcollection of awholynewcollection_1 failed
[junit4:junit4]   2> 204686 T31 oasc.DistributedQueue$LatchChildWatcher.process 
Watcher fired on path: /overseer/collection-queue-work state: SyncConnected 
type NodeChildrenChanged
[junit4:junit4]   2> 204688 T33 
oasc.OverseerCollectionProcessor.createCollection Create collection 
awholynewcollection_2 on [127.0.0.1:57855_randctxmqvf%2F_ay, 
127.0.0.1:37463_randctxmqvf%2F_ay]
[junit4:junit4]   2> 204691 T33 
oasc.OverseerCollectionProcessor.createCollection SEVERE Error talking to 
shard: 127.0.0.1:37463/randctxmqvf%2F/ay org.apache.solr.common.SolrException: 
Server at http://127.0.0.1:37463/randctxmqvf%2F/ay returned non ok status:404, 
message:Not Found
[junit4:junit4]   2>at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
[junit4:junit4]   2>at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
[junit4:junit4]   2>at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
[junit4:junit4]   2>at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
[junit4:junit4]   2>at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
{noformat}

Final test failure message...

{noformat}
   
  java.lang.AssertionError: Could not find new 2 
slice collection called awholynewcollection_0
at 
__randomizedtesting.SeedInfo.seed([1BD856523B97C07C:9A3ED84A4CC8A040]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.BasicDistributedZkTest.checkForCollection(BasicDistributedZkTest.java:1053)
at 
org.apache.solr.cloud.BasicDistributedZkTest.testCollectionsAPI(BasicDistributedZkTest.java:768)
at 
org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:361)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:712)
{noformat}



> SolrCloud bugs when servlet context contains "/" or "_"
> ---
>
> Key: SOLR-4136
> URL: https://issues.apache.org/jira/browse/SOLR-4136
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
>Reporter: Hoss Man
>Assignee: Hoss Man
> Attachments: SOLR-4136.patch
>
>
> SolrCloud does not work properly with non-trivial values for "hostContext" 
> (ie: the servlet context path).  In particular...
> * Using a hostContext containing a  "/" (ie: a servlet context with a subdir 
> path, semi-common among people who organize webapps hierarchically for lod 
> blanacer rules) is explicitly forbidden in ZkController because of how the 
> hostContext is used to build a ZK nodeName
> * Using a hostContext containing a "_" causes problems in 
> OverseerCollectionProcessor where it assumes all "_" characters should be 
> converted to "/" to reconstitute a URL from nodeName (NOTE: this code 
> specifically has a TODO to fix this,

[jira] [Updated] (SOLR-4136) SolrCloud bugs when servlet context contains "/" or "_"

2012-12-02 Thread Hoss Man (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4136:
---

Description: 
SolrCloud does not work properly with non-trivial values for "hostContext" (ie: 
the servlet context path).  In particular...

* Using a hostContext containing a  "/" (ie: a servlet context with a subdir 
path, semi-common among people who organize webapps hierarchically for lod 
blanacer rules) is explicitly forbidden in ZkController because of how the 
hostContext is used to build a ZK nodeName
* Using a hostContext containing a "\_" causes problems in 
OverseerCollectionProcessor where it assumes all "\_" characters should be 
converted to "/" to reconstitute a URL from nodeName (NOTE: this code 
specifically has a TODO to fix this, and then has a subsequent TODO about 
assuming "http://"; labeled "this sucks")



  was:
SolrCloud does not work properly with non-trivial values for "hostContext" (ie: 
the servlet context path).  In particular...

* Using a hostContext containing a  "/" (ie: a servlet context with a subdir 
path, semi-common among people who organize webapps hierarchically for lod 
blanacer rules) is explicitly forbidden in ZkController because of how the 
hostContext is used to build a ZK nodeName
* Using a hostContext containing a "_" causes problems in 
OverseerCollectionProcessor where it assumes all "_" characters should be 
converted to "/" to reconstitute a URL from nodeName (NOTE: this code 
specifically has a TODO to fix this, and then has a subsequent TODO about 
assuming "http://"; labeled "this sucks")




> SolrCloud bugs when servlet context contains "/" or "_"
> ---
>
> Key: SOLR-4136
> URL: https://issues.apache.org/jira/browse/SOLR-4136
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
>Reporter: Hoss Man
>Assignee: Hoss Man
> Attachments: SOLR-4136.patch
>
>
> SolrCloud does not work properly with non-trivial values for "hostContext" 
> (ie: the servlet context path).  In particular...
> * Using a hostContext containing a  "/" (ie: a servlet context with a subdir 
> path, semi-common among people who organize webapps hierarchically for lod 
> blanacer rules) is explicitly forbidden in ZkController because of how the 
> hostContext is used to build a ZK nodeName
> * Using a hostContext containing a "\_" causes problems in 
> OverseerCollectionProcessor where it assumes all "\_" characters should be 
> converted to "/" to reconstitute a URL from nodeName (NOTE: this code 
> specifically has a TODO to fix this, and then has a subsequent TODO about 
> assuming "http://"; labeled "this sucks")

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-4136) SolrCloud bugs when servlet context contains "/" or "_"

2012-12-02 Thread Hoss Man (JIRA)

Hoss Man created SOLR-4136:
--

 Summary: SolrCloud bugs when servlet context contains "/" or "_"
 Key: SOLR-4136
 URL: https://issues.apache.org/jira/browse/SOLR-4136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-4136.patch

SolrCloud does not work properly with non-trivial values for "hostContext" (ie: 
the servlet context path).  In particular...

* Using a hostContext containing a  "/" (ie: a servlet context with a subdir 
path, semi-common among people who organize webapps hierarchically for lod 
blanacer rules) is explicitly forbidden in ZkController because of how the 
hostContext is used to build a ZK nodeName
* Using a hostContext containing a "_" causes problems in 
OverseerCollectionProcessor where it assumes all "_" characters should be 
converted to "/" to reconstitute a URL from nodeName (NOTE: this code 
specifically has a TODO to fix this, and then has a subsequent TODO about 
assuming "http://"; labeled "this sucks")



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_09) - Build # 3026 - Failure!

2012-12-02 Thread Michael McCandless

Woops, I committed a fix ...

Mike McCandless

http://blog.mikemccandless.com


On Sun, Dec 2, 2012 at 5:53 PM, Policeman Jenkins Server
 wrote:
> Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/3026/
> Java: 32bit/jdk1.7.0_09 -server -XX:+UseParallelGC
>
> All tests passed
>
> Build Log:
> [...truncated 25027 lines...]
> BUILD FAILED
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The 
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:285: The 
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1526:
>  The following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1560:
>  Compile failed; see the compiler error output for details.
>
> Total time: 28 minutes 52 seconds
> Build step 'Invoke Ant' marked build as failure
> Archiving artifacts
> Recording test results
> Description set: Java: 32bit/jdk1.7.0_09 -server -XX:+UseParallelGC
> Email was triggered for: Failure
> Sending email for trigger: Failure
>
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_09) - Build # 3026 - Failure!

2012-12-02 Thread Policeman Jenkins Server

Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/3026/
Java: 32bit/jdk1.7.0_09 -server -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 25027 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:285: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1526:
 The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1560:
 Compile failed; see the compiler error output for details.

Total time: 28 minutes 52 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.7.0_09 -server -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure

2012-12-02 Thread Michael McCandless

You're welcome!

I committed a fix to cut back on the index size when MockRandomMP is
used, because this MP is O(N^2) cost!

Mike McCandless

http://blog.mikemccandless.com

On Sun, Dec 2, 2012 at 5:00 PM, Uwe Schindler  wrote:
> Thanks!
>
> Uwe
>
>
>
> Michael McCandless  schrieb:
>>
>> I'll dig ...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sun, Dec 2, 2012 at 3:10 PM, Apache Jenkins Server
>>  wrote:
>>>
>>> Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/
>>>
>>> 2 tests failed.
>>> REGRESSION:  org.apache.lucene.index.TestBagOfPositions.test
>>>
>>> Error Message:
>>> Test abandoned because suite timeout was reached.
>>>
>>> Stack Trace:
>>> java.lang.Exception: Test abandoned because suite timeout was reached.
>>> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)
>>>
>>>
>>> FAILED:
>>> junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions
>>>
>>> Error Message:
>>> Suite timeout exceeded (>= 720 msec).
>>>
>>> Stack Trace:
>>> java.lang.Exception: Suite timeout exceeded (>= 720 msec).
>>> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)
>>>
>>>
>>>
>>>
>>> Build Log:
>>> [...truncated 1360 lines...]
>>> [junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions
>>> [junit4:junit4]   2> 2012-12-3 2:05:06
>>> com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
>>> [junit4:junit4]   2> WARNING: Suite execution timed out:
>>> org.apache.lucene.index.TestBagOfPositions
>>> [junit4:junit4]   2>  jstack at approximately timeout time 
>>> [junit4:junit4]   2> "Thread-319" ID=415 RUNNABLE
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118)
>>> [junit4:junit4]
>>>  2>at
>>> org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
>>> [junit4:junit4]   2>- locked
>>> org.apache.lucene.index.SerialMergeScheduler@4f20a4cb
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160)
>>> [junit4:junit4]   2>at
>>>
>>> org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117)
>>> [junit4:junit4]   2>
>>> [junit4:junit4]   2>
>>> "TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" ID=414 WAITING on
>>> org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
>>> [junit4:junit4]   2>at java.lang.Object.wait(Native Method)
>>> [junit4:junit4]   2>- waiting on
>>> org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
>>> [junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1203)
>>> [junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1256)
>>> [junit4:junit4]   2>at
>>> org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128)
>>> [junit4:junit4]   2>at
>>> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> [junit4:junit4]   2>at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> [junit4:junit4]   2>at
>>>
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> [junit4:junit4]   2>at
>>> java.lang.reflect.Method.invoke(Method.java:616)
>>> [junit4:junit4]   2>at
>>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
>>> [junit4:junit4]   2>at
>

[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-02 Thread Per Steffensen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508382#comment-13508382
 ] 

Per Steffensen commented on SOLR-4114:
--

Hope you will commit, and consider backporting to 4.x, since we expect to 
upgrade to 4.1 when it is released, and we would really like this feature to be 
included.

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, 
> SOLR-4114_trunk.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - the run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side has shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shards from one Solr server to another one that just joined 
> the cluter than it is to split an exsiting shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-02 Thread Per Steffensen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508379#comment-13508379
 ] 

Per Steffensen edited comment on SOLR-4114 at 12/2/12 10:04 PM:


Here is the patch for trunk (5.x). The main mistake was that you didnt used the 
calculated shardName as the shardName - instead you used collectionName. This 
caused different shards on the same node to share name and data-dir - not so 
cool :-)

  was (Author: steff1193):
Here is the patch for trunk (5.x). The main mistake was the you didnt used 
the calculated shardName as the shardName - instead you used collectionName. 
This caused different shards on the same node to shard name and data-dir - not 
so cool :-)
  
> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, 
> SOLR-4114_trunk.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - the run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side has shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shards from one Solr server to another one that just joined 
> the cluter than it is to split an exsiting shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-02 Thread Per Steffensen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4114:
-

Attachment: SOLR-4114_trunk.patch

Here is the patch for trunk (5.x). The main mistake was the you didnt used the 
calculated shardName as the shardName - instead you used collectionName. This 
caused different shards on the same node to shard name and data-dir - not so 
cool :-)

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, 
> SOLR-4114_trunk.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - the run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side has shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shards from one Solr server to another one that just joined 
> the cluter than it is to split an exsiting shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure

2012-12-02 Thread Uwe Schindler

Thanks! 

Uwe



Michael McCandless  schrieb:

>I'll dig ...
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>On Sun, Dec 2, 2012 at 3:10 PM, Apache Jenkins Server
> wrote:
>> Build:
>https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/
>>
>> 2 tests failed.
>> REGRESSION:  org.apache.lucene.index.TestBagOfPositions.test
>>
>> Error Message:
>> Test abandoned because suite timeout was reached.
>>
>> Stack Trace:
>> java.lang.Exception: Test abandoned because suite timeout was
>reached.
>> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)
>>
>>
>> FAILED: 
>junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions
>>
>> Error Message:
>> Suite timeout exceeded (>= 720 msec).
>>
>> Stack Trace:
>> java.lang.Exception: Suite timeout exceeded (>= 720 msec).
>> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)
>>
>>
>>
>>
>> Build Log:
>> [...truncated 1360 lines...]
>> [junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions
>> [junit4:junit4]   2> 2012-12-3 2:05:06
>com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
>> [junit4:junit4]   2> WARNING: Suite execution timed out:
>org.apache.lucene.index.TestBagOfPositions
>> [junit4:junit4]   2>  jstack at approximately timeout time 
>> [junit4:junit4]   2> "Thread-319" ID=415 RUNNABLE
>> [junit4:junit4]   2>at
>org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118)
>> [junit4:junit4]   2>at
>org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73)
>> [junit4:junit4]   2>at
>org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70)
>> [junit4:junit4]   2>at
>org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205)
>> [junit4:junit4]   2>at
>org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298)
>> [junit4:junit4]   2>at
>org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407)
>> [junit4:junit4]   2>at
>org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
>> [junit4:junit4]   2>- locked
>org.apache.lucene.index.SerialMergeScheduler@4f20a4cb
>> [junit4:junit4]   2>at
>org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117)
>> [junit4:junit4]   2>
>> [junit4:junit4]   2>
>"TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" ID=414 WAITING
>on org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
>> [junit4:junit4]   2>at java.lang.Object.wait(Native Method)
>> [junit4:junit4]   2>- waiting on
>org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
>> [junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1203)
>> [junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1256)
>> [junit4:junit4]   2>at
>org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128)
>> [junit4:junit4]   2>at
>sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> [junit4:junit4]   2>at
>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> [junit4:junit4]   2>at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> [junit4:junit4]   2>at
>java.lang.reflect.Method.invoke(Method.java:616)
>> [junit4:junit4]   2>at
>com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
>> [junit4:junit4]   2>at
>com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
>> [junit4:junit4]   2>at
>com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
>> [junit4:junit4]   2>at
>com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
>> [junit4:junit4]   2>at
>com.carrotsearch.randomizedtesting.Randomiz

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508375#comment-13508375
 ] 

Michael McCandless commented on LUCENE-4575:


+1, looks great.  Thanks Shai!

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4575:
---

Attachment: LUCENE-4575.patch

Patch addresses the bug that Mike reported and adds a test for it. Also adds 
IW.getCommitData().

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure

2012-12-02 Thread Michael McCandless

I'll dig ...

Mike McCandless

http://blog.mikemccandless.com

On Sun, Dec 2, 2012 at 3:10 PM, Apache Jenkins Server
 wrote:
> Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/
>
> 2 tests failed.
> REGRESSION:  org.apache.lucene.index.TestBagOfPositions.test
>
> Error Message:
> Test abandoned because suite timeout was reached.
>
> Stack Trace:
> java.lang.Exception: Test abandoned because suite timeout was reached.
> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)
>
>
> FAILED:  junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions
>
> Error Message:
> Suite timeout exceeded (>= 720 msec).
>
> Stack Trace:
> java.lang.Exception: Suite timeout exceeded (>= 720 msec).
> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)
>
>
>
>
> Build Log:
> [...truncated 1360 lines...]
> [junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions
> [junit4:junit4]   2> 2012-12-3 2:05:06 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
> [junit4:junit4]   2> WARNING: Suite execution timed out: 
> org.apache.lucene.index.TestBagOfPositions
> [junit4:junit4]   2>  jstack at approximately timeout time 
> [junit4:junit4]   2> "Thread-319" ID=415 RUNNABLE
> [junit4:junit4]   2>at 
> org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118)
> [junit4:junit4]   2>at 
> org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73)
> [junit4:junit4]   2>at 
> org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70)
> [junit4:junit4]   2>at 
> org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205)
> [junit4:junit4]   2>at 
> org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298)
> [junit4:junit4]   2>at 
> org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407)
> [junit4:junit4]   2>at 
> org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
> [junit4:junit4]   2>- locked 
> org.apache.lucene.index.SerialMergeScheduler@4f20a4cb
> [junit4:junit4]   2>at 
> org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117)
> [junit4:junit4]   2>
> [junit4:junit4]   2> "TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" 
> ID=414 WAITING on org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
> [junit4:junit4]   2>at java.lang.Object.wait(Native Method)
> [junit4:junit4]   2>- waiting on 
> org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
> [junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1203)
> [junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1256)
> [junit4:junit4]   2>at 
> org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128)
> [junit4:junit4]   2>at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit4:junit4]   2>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> [junit4:junit4]   2>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit4:junit4]   2>at java.lang.reflect.Method.invoke(Method.java:616)
> [junit4:junit4]   2>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
> [junit4:junit4]   2>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
> [junit4:junit4]   2>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
> [junit4:junit4]   2>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
> [junit4:junit4]   2>at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
> [junit4:junit4

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508363#comment-13508363
 ] 

Shai Erera commented on LUCENE-4575:


I don't think that we should add more work to finishCommit() either. Being able 
to setCommitData after prep() is just a bonus. It didn't work before, and it 
will continue to not work now. And I can't think of a good usecase for why an 
app would not be able to set commitData prior to prep(). If it comes up, we can 
discuss a solution again. At least we know that moving commitData write to 
finishCommit will solve it.

I'll make sure the test exposes the bug you reported in IW.finishCommit().

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508356#comment-13508356
 ] 

Michael McCandless commented on LUCENE-4575:


bq. Hmmm ... setting the commitData on pendingCommit cannot work, b/c the 
commitData is written to segnOutput on prepareCommit().

Oh yeah ... I forgot about that :)

Hmm ... I don't think we should move writing the commit data to finishCommit?  
Is it really so hard for the app to provide the commit data before calling 
prepareCommit?

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure

2012-12-02 Thread Apache Jenkins Server

Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/

2 tests failed.
REGRESSION:  org.apache.lucene.index.TestBagOfPositions.test

Error Message:
Test abandoned because suite timeout was reached.

Stack Trace:
java.lang.Exception: Test abandoned because suite timeout was reached.
at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)


FAILED:  junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions

Error Message:
Suite timeout exceeded (>= 720 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (>= 720 msec).
at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)




Build Log:
[...truncated 1360 lines...]
[junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions
[junit4:junit4]   2> 2012-12-3 2:05:06 
com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
[junit4:junit4]   2> WARNING: Suite execution timed out: 
org.apache.lucene.index.TestBagOfPositions
[junit4:junit4]   2>  jstack at approximately timeout time 
[junit4:junit4]   2> "Thread-319" ID=415 RUNNABLE
[junit4:junit4]   2>at 
org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118)
[junit4:junit4]   2>at 
org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73)
[junit4:junit4]   2>at 
org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70)
[junit4:junit4]   2>at 
org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205)
[junit4:junit4]   2>at 
org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298)
[junit4:junit4]   2>at 
org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407)
[junit4:junit4]   2>at 
org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330)
[junit4:junit4]   2>at 
org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261)
[junit4:junit4]   2>at 
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115)
[junit4:junit4]   2>at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682)
[junit4:junit4]   2>at 
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288)
[junit4:junit4]   2>at 
org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
[junit4:junit4]   2>- locked 
org.apache.lucene.index.SerialMergeScheduler@4f20a4cb
[junit4:junit4]   2>at 
org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825)
[junit4:junit4]   2>at 
org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236)
[junit4:junit4]   2>at 
org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186)
[junit4:junit4]   2>at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172)
[junit4:junit4]   2>at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160)
[junit4:junit4]   2>at 
org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117)
[junit4:junit4]   2> 
[junit4:junit4]   2> "TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" 
ID=414 WAITING on org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
[junit4:junit4]   2>at java.lang.Object.wait(Native Method)
[junit4:junit4]   2>- waiting on 
org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
[junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1203)
[junit4:junit4]   2>at java.lang.Thread.join(Thread.java:1256)
[junit4:junit4]   2>at 
org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128)
[junit4:junit4]   2>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
[junit4:junit4]   2>at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[junit4:junit4]   2>at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit4:junit4]   2>at java.lang.reflect.Method.invoke(Method.java:616)
[junit4:junit4]   2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
[junit4:junit4]   2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
[junit4:junit4]   2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
[junit4:junit4]   2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
[junit4:junit4]   2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
[junit4:junit4]   2>at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
[junit4:junit4]   2>at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
[junit4:junit4]   2>at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508327#comment-13508327
 ] 

Shai Erera commented on LUCENE-4575:


Hmmm ... setting the commitData on pendingCommit cannot work, b/c the 
commitData is written to segnOutput on prepareCommit(). Following commit() 
merely calls infos.finishCommit() which writes the checksum and closes the 
output.

Can we modify segmentInfos.write() to not write the commitData, but move it to 
finishCommit()? Not sure that I like this approach, because it means that 
finishCommit() will do slightly more work, which increases the chance of 
getting an IOException during commit() after prepareCommit() successfully 
returned, but on the other hand it's the gains might be worth it? Being able to 
write commitData after you know all your document additions/deletions/updates 
are 'safe' might prove valuable. And finishCommit() already does I/O, writing 
checksum ...

What do you think?

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508319#comment-13508319
 ] 

Shai Erera commented on LUCENE-4575:


The test isn't exactly accurate, because it tests a scenario that is currently 
not supported. I.e., after calling prepareCommit(), nothing that you do on IW 
will be committed. Rather, to expose the bug it should be modified as follows:

{code}
iw.setCommitData(data1);
iw.prepareCommit();
iw.setCommitData(data2); // that will be ignored by follow-on commit
iw.commit();
checkCommitData(); // will see data1
iw.commit(); // that 'should' commit data2
checkCommitData(); // that will see data1 again, because of the copy that 
happens in finishCommit()
{code}

I'll modify the test like so and include it in my next patch.

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-02 Thread Commit Tag Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508307#comment-13508307
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1416216

SOLR-2592: refactor doc routers, use implicit router when implicity creating 
collection, use collection router to find correct shard when indexing



> Custom Hashing
> --
>
> Key: SOLR-2592
> URL: https://issues.apache.org/jira/browse/SOLR-2592
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Affects Versions: 4.0-ALPHA
>Reporter: Noble Paul
> Fix For: 4.1
>
> Attachments: dbq_fix.patch, pluggable_sharding.patch, 
> pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, 
> SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, 
> SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, 
> attribute value etc) It will be easy to narrow down the search to a smaller 
> subset of shards and in effect can achieve more efficient search.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4499) Multi-word synonym filter (synonym expansion)

2012-12-02 Thread Nolan Lawson (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508304#comment-13508304
 ] 

Nolan Lawson commented on LUCENE-4499:
--

@Robert: Thanks for the clarification.  I've corrected my blog post.

@Roman: Yes, I think it's a very common use case.  Especially considering that 
your query expander seems to be doing the same thing as ours!

My idea with the custom QueryParserPlugin was just to have a self-contained 
solution that didn't mess with the core Lucene/Solr logic too much.  And I 
think it's still configurable enough that it can handle your case-insensitivity 
tweaks (which I totally understand - "MIT" is not the same thing as "mit").  
You'd just have to have some pretty fancy XML in the "synonymAnalyzers" 
section. :)

> Multi-word synonym filter (synonym expansion)
> -
>
> Key: LUCENE-4499
> URL: https://issues.apache.org/jira/browse/LUCENE-4499
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: 4.1, 5.0
>Reporter: Roman Chyla
>Priority: Minor
>  Labels: analysis, multi-word, synonyms
> Fix For: 5.0
>
> Attachments: LUCENE-4499.patch
>
>
> I apologize for bringing the multi-token synonym expansion up again. There is 
> an old, unresolved issue at LUCENE-1622 [1]
> While solving the problem for our needs [2], I discovered that the current 
> SolrSynonym parser (and the wonderful FTS) have almost everything to 
> satisfactorily handle both the query and index time synonym expansion. It 
> seems that people often need to use the synonym filter *slightly* differently 
> at indexing and query time.
> In our case, we must do different things during indexing and querying.
> Example sentence: Mirrors of the Hubble space telescope pointed at XA5
> This is what we need (comma marks position bump):
> indexing: mirrors,hubble|hubble space 
> telescope|hst,space,telescope,pointed,xa5|astroobject#5
> querying: +mirrors +(hubble space telescope | hst) +pointed 
> +(xa5|astroboject#5)
> This translated to following needs:
>   indexing time: 
> single-token synonyms => return only synonyms
> multi-token synonyms => return original tokens *AND* the synonyms
>   query time:
> single-token: return only synonyms (but preserve case)
> multi-token: return only synonyms
>  
> We need the original tokens for the proximity queries, if we indexed 'hubble 
> space telescope'
> as one token, we cannot search for 'hubble NEAR telescope'
> You may (not) be surprised, but Lucene already supports ALL of these 
> requirements. The patch is an attempt to state the problem differently. I am 
> not sure if it is the best option, however it works perfectly for our needs 
> and it seems it could work for general public too. Especially if the 
> SynonymFilterFactory had a preconfigured sets of SynonymMapBuilders - and 
> people would just choose what situation they use. Please look at the unittest.
> links:
> [1] https://issues.apache.org/jira/browse/LUCENE-1622
> [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
> [3] seems to have similar request: 
> http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4575:
---

Attachment: LUCENE-4575-testcase.patch

Simple test showing that commit data is lost ... I didn't need to use threads; 
just call .setCommitData after prepareCommit and before commit.

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, 
> LUCENE-4575-testcase.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508292#comment-13508292
 ] 

Uwe Schindler commented on LUCENE-4584:
---

I agree with Robert here. We don't need to test random data, for Lucene only 2 
things are important:
- When you compress random data and decompress it again, the same exact bytes 
must come back. This should be tested and needs no external C code. This is the 
doesn't corrumpt™ Robert is talking about.
- The compressed content should never get significantly bigger

There is no reason at all that Lucene's LZ4 returns the same compressed output. 
E.g. if we find a better algorithm that performs better in Hotspot, although it 
compresses to a different byte array, we are perfectly fine.

If we want to assert for now that both algorithms create the same compressed 
output, we should have three different size random byte files (e.g. generated 
by /dev/urandom) as test resources and the C-compressed ones also as test 
resources, and then we can compare the results. We should just document how the 
test data was created. But keep in mind: We may change the algorithm to produce 
different bytes, so this is not mandatory. I think we may only assert that the 
compression percentage of the random data is identical, not the actual bytes.

> Compare the LZ4 implementation in Lucene against the original impl
> --
>
> Key: LUCENE-4584
> URL: https://issues.apache.org/jira/browse/LUCENE-4584
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data 
> the exact same way as the original impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-02 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508288#comment-13508288
 ] 

Mark Miller commented on SOLR-4114:
---

Should be against 5x - I'm going to US west coast for a week - so not sure when 
I'll get back to this - I may try and get it going while I'm out there and I 
may not have time till I get back.


> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - the run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side has shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shards from one Solr server to another one that just joined 
> the cluter than it is to split an exsiting shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-02 Thread Per Steffensen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508287#comment-13508287
 ] 

Per Steffensen commented on SOLR-4114:
--

Where does your patch fit, Mark?

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - the run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side has shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shards from one Solr server to another one that just joined 
> the cluter than it is to split an exsiting shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2012-12-02 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508284#comment-13508284
 ] 

Robert Muir commented on LUCENE-4583:
-

The most important thing: if this implementation (or if we decide dv itself) 
should be limited,
then it should check this at index-time and throw a useful exception.

> StraightBytesDocValuesField fails if bytes > 32k
> 
>
> Key: LUCENE-4583
> URL: https://issues.apache.org/jira/browse/LUCENE-4583
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.0, 4.1, 5.0
>Reporter: David Smiley
>Priority: Critical
>
> I didn't observe any limitations on the size of a bytes based DocValues field 
> value in the docs.  It appears that the limit is 32k, although I didn't get 
> any friendly error telling me that was the limit.  32k is kind of small IMO; 
> I suspect this limit is unintended and as such is a bug.The following 
> test fails:
> {code:java}
>   public void testBigDocValue() throws IOException {
> Directory dir = newDirectory();
> IndexWriter writer = new IndexWriter(dir, writerConfig(false));
> Document doc = new Document();
> BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
> bytes.length = bytes.bytes.length;//byte data doesn't matter
> doc.add(new StraightBytesDocValuesField("dvField", bytes));
> writer.addDocument(doc);
> writer.commit();
> writer.close();
> DirectoryReader reader = DirectoryReader.open(dir);
> DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
> //FAILS IF BYTES IS BIG!
> docValues.getSource().getBytes(0, bytes);
> reader.close();
> dir.close();
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508275#comment-13508275
 ] 

Michael McCandless commented on LUCENE-4575:


I'll make a test exposing the bug ...

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl

2012-12-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508276#comment-13508276
 ] 

Adrien Grand commented on LUCENE-4584:
--

bq. I only care that it compresses well, is reasonably fast, and doesn't 
corrupt.

Right, the issue is probably badly named. The reason why I want to compare 
against the original impl is exacly for the reasons you mention: making sure 
that our impl compresses well and trying to find bugs in it.

> Compare the LZ4 implementation in Lucene against the original impl
> --
>
> Key: LUCENE-4584
> URL: https://issues.apache.org/jira/browse/LUCENE-4584
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data 
> the exact same way as the original impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl

2012-12-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508270#comment-13508270
 ] 

Adrien Grand commented on LUCENE-4584:
--

bq. You wouldn't need static files if you compared output lengths

Even the output length depends on the endianess: LZ4 uses a hash table without 
collision resolution (it maps hash -> last offset that produced this hash) to 
find matchs of 4 consecutive bytes in the input bytes, and this hash function 
is not endian-neutral (it interprets the 4 bytes as an 32-bits int, multiplies 
it by a prime number and keeps the 12 first bits (13 if there are less than 
2^16 input bytes)), so the collisions won't be the same depending on the 
endianess and LZ4 won't find the same matchs.

> Compare the LZ4 implementation in Lucene against the original impl
> --
>
> Key: LUCENE-4584
> URL: https://issues.apache.org/jira/browse/LUCENE-4584
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data 
> the exact same way as the original impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl

2012-12-02 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4584:


Priority: Major  (was: Blocker)

> Compare the LZ4 implementation in Lucene against the original impl
> --
>
> Key: LUCENE-4584
> URL: https://issues.apache.org/jira/browse/LUCENE-4584
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data 
> the exact same way as the original impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl

2012-12-02 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508269#comment-13508269
 ] 

Robert Muir commented on LUCENE-4584:
-

I'm confused why this is a blocker at all: I'm going to unset it.

I don't actually care if our LZ4 is conformant to the original impl.

I only care that it compresses well, is reasonably fast, and doesn't corrumpt.

> Compare the LZ4 implementation in Lucene against the original impl
> --
>
> Key: LUCENE-4584
> URL: https://issues.apache.org/jira/browse/LUCENE-4584
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Blocker
> Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data 
> the exact same way as the original impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508268#comment-13508268
 ] 

Shai Erera commented on LUCENE-4575:


I'll make the changes, and also it seems like you were suggesting that earlier 
-- allow setCommitData to affect the pendingCommit too. I think that's valuable 
because you can e.g. call prerCommit() -> setCommitData() -> commit() -- the 
setCD() in the middle lets you create a commitData that will pertain to the 
state of the index after the commit.

I'll make all the changes and post a new patch, probably tomorrow.

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508267#comment-13508267
 ] 

Michael McCandless commented on LUCENE-4575:


{quote}
I agree that calling that setCommitData in finishCommit is redundant, but 
perhaps we can solve it more elegantly by either:
# Not storing the setCommitData in infos, but rather in a private IW member. 
Then in startCommit set it on the cloned infos. It's essentially how it's done 
today, only now the commit data will be copied from a member.
# Stick w/ current API commit(commitData) and prepareCommit(commitData), and 
just make sure that commit goes through even if changeCount == 
previousChangeCount, but commitUserData != null.
{quote}

Hmm, I'd rather not store the member inside IW *and* inside SIS; just seems 
safer to have a single clear place where this is tracked.

Also, I like the new API so I'd rather not do #2?

I think just removing that line in finishCommit should fix the bug ... but 
first we need a test exposing it.

bq. I think that the code in finishCommit ensures that we can always pull the 
commitData from segmentInfos?

Yes.

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508254#comment-13508254
 ] 

Shai Erera commented on LUCENE-4575:


bq. I thought we were going to rename ensureOpen's confusing boolean param?

Right, but for some reason I thought that you're going to do that :). I'll do 
it in the next patch.

bq. IW.setCommitData should be sync'd I think, eg to ensure visibility across 
threads of the changes to sis.userData?

Ok

bq. Hmm ... I think there's a thread hazard here, during commit

I think you're right. Not sure how practical, because I believe that usually 
the commit thread will also be the one that calls setCommitData, but it is 
possible.
I agree that calling that setCommitData in finishCommit is redundant, but 
perhaps we can solve it more elegantly by either:

# Not storing the setCommitData in infos, but rather in a private IW member. 
Then in startCommit set it on the cloned infos. It's essentially how it's done 
today, only now the commit data will be copied from a member.
# Stick w/ current API commit(commitData) and prepareCommit(commitData), and 
just make sure that commit goes through even if changeCount == 
previousChangeCount, but commitUserData != null.

Option #2 means that there's no API break, no synchronization is needed on 
setCommitData and practically everything remains the same. We can still remove 
the redundant .setCommitData in finishCommit regadless.

bq. should we add an IW.getCommitData?

I think that that'd be great ! Today the only way to do it is if you refresh a 
reader (expensive). I think that the code in finishCommit ensures that we can 
always pull the commitData from segmentInfos?

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508251#comment-13508251
 ] 

Michael McCandless commented on LUCENE-4575:


Actually I think we should just remove that .setUserData inside finishCommit?

Also, should we add an IW.getCommitData?

> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-12-02 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508245#comment-13508245
 ] 

Michael McCandless commented on LUCENE-4575:


I thought we were going to rename ensureOpen's confusing boolean param?

IW.setCommitData should be sync'd I think, eg to ensure visibility
across threads of the changes to sis.userData?

Hmm ... I think there's a thread hazard here, during commit; I think
if pendingCommit is not null you should also call
pendingCommit.setUserData?  Else, a commit can finish and "undo" the
user's change to the commit data (see finishCommit, where it calls
.setUserData).  Maybe we need a thread safety test
here ...


> Allow IndexWriter to commit, even just commitData
> -
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch
>
>
> Spinoff from here 
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
> In some cases, it is valuable to be able to commit changes to the index, even 
> if the changes are just commitData. Such data is sometimes used by 
> applications to register in the index some global application 
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit() 
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets 
> committed even if no other changes were made to the index.
> I will work on a patch a post.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-4580) Facet DrillDown should return a ConstantScoreQuery

2012-12-02 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4580:
---

Summary: Facet DrillDown should return a ConstantScoreQuery  (was: Facet 
DrillDown should return a Filter not Query)

> Facet DrillDown should return a ConstantScoreQuery
> --
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508243#comment-13508243
 ] 

Uwe Schindler commented on LUCENE-4580:
---

OK. I would add a test that verifies that the scores dont change...

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508242#comment-13508242
 ] 

Shai Erera commented on LUCENE-4580:


It seems then that the only thing that needs to be done here is fix the 
{{query()}} code to return CSQ (and set the coord and boost properly). The API 
today doesn't support disjunction between categories, but it is doable with a 
combination of term() and query() calls, so rather than adding more API, I say 
that we leave it simple.

If you agree, I'll rename this issue and fix DrillDown.

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508241#comment-13508241
 ] 

Uwe Schindler commented on LUCENE-4580:
---

bq. If a Filter is not cached, how efficient is using TermsFilter(oneTerm) vs. 
CSQ(TermQuery)? Are we talking huge gains here? If not, let's keep the API 
simple. DrillDown offers the terms() API too, so one can construct 
BooleanFilter, TermsFilter and whatever he wants out of them.

CSQ(TermQuery)) is way faster, as it can leap-frog. TermsFilter with one term 
will allocate a Bitset and then mark all positings in it; also those postings 
which are not needed (this depends on the FilteredQuery mode, which is used to 
apply filters). CSQ(TermQuery) will leap-frog so the original query and the 
single TermQuery will advance each other and lead to fastest execution, while 
the TermsFilter prepares a bitset before the query execution, so the latency 
will be bigger (2 steps).

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508240#comment-13508240
 ] 

Uwe Schindler commented on LUCENE-4580:
---

Hi,

In general I would prefer another approach for the whole thing. We should not 
make the users decide if then need to use a Filter or Query or whatever drill 
down approach. The user API should only use Query: Query in, Query out:

{code:java}
Query drilldown(Query originalQuery, CategoryPath... categories)
{code}

This would get the user query to drill down as input and return a new Query 
with the same scoring, but somehow filtered. Internally this method can use a 
Filter or Query or whatever to do the drill down, the user does not need to 
think about. It should just add 2 options: conjunction or disjunction.

The following possibilities are available:
- one or more category path, conjunction: returns new BooleanQuery(true) [no 
coord], consisting of the original Query as clause and multiple 
CSQ(TermQuery(category)) with boost=0.0 (boost=0 means the BQ does not get any 
value from the filter clause and with disableCoord=true nothing changes)
- more than one category path, disjunction between categories: return 
FilteredQuery(originalQuery, new TermsFilter(terms))

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508239#comment-13508239
 ] 

Shai Erera commented on LUCENE-4580:


Uwe, the thinking that I had about Filter is that if you e.g. wrap it w/ CWF, 
then you pay that cost once, and that's it. Therefore BooleanFilter is just 
used as a means to create a more complicated Filter.

But, I'm not sure that I want to over-complicate DrillDown API. So perhaps this 
is what we do:

* Fix DrillDown to always return CSQ, irregardless of the case.
* Document that for caching purposes, one can wrap the returned Query with 
CachingWrapperFilter(QueryWrapperFilter(Query))

If a Filter is not cached, how efficient is using TermsFilter(oneTerm) vs. 
CSQ(TermQuery)? Are we talking huge gains here? If not, let's keep the API 
simple. DrillDown offers the terms() API too, so one can construct 
BooleanFilter, TermsFilter and whatever he wants out of them.

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508215#comment-13508215
 ] 

Uwe Schindler commented on LUCENE-4580:
---

bq. This is exactly what I proposed. I'm +1 for it (and BooleanFilter).

-1, BooleanFilter is horrible and slow for this use-case.

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508211#comment-13508211
 ] 

Uwe Schindler commented on LUCENE-4580:
---

bq. Or better, move TermsFilter and BooleanFilter to core – why are they 
treated differently than TermQuery and BooleanQuery? Especially now that Filter 
is applied more efficiently, I suspect more people will want to use it?

TermsFilter - yes. See my comment above. We already have a very good 
Automaton-based one in test-framework that also needs to be moved to core (as a 
MTQ rewrite method).

BUT, about BooleanFilter: This class is horrible ineffective, inconsistent, and 
not good for drill downs (you should use it only when you want to do caching of 
filters with bitsets). If you use it for those type of queries you pay the 
price of allocating bitsets, iterate the wrapped queries/filters completely 
instead of advanceing the underlying scorers (leap-frogging). So for drilldowns 
BooleanFilter is the worst you can do!

The way to go from my opinion is to use constant score queries (like Solr does).

In addition we recently reopened / discussed again the very old issue to nuke 
Filters at all and just provide queries and nothing more. Filters are nothing 
more than constant score queries

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Gilad Barkai (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508202#comment-13508202
 ] 

Gilad Barkai commented on LUCENE-4580:
--

bq. it should be a combination of TermsFilter and BooleanFilter. So in fact if 
we want to keep DrillDown behave like today, we should use BooleanFilter and 
TermsFilter.

+1

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508201#comment-13508201
 ] 

Shai Erera commented on LUCENE-4580:


bq. In my opinion, for Lucene 4.x we should move the TermsFilter to core.

This is exactly what I proposed. I'm +1 for it (and BooleanFilter).

bq. TermsFilter is a Disjunction, but for drill downs you generally need 
Conjunctions

You're right, it should be a combination of TermsFilter and BooleanFilter. So 
in fact if we want to keep DrillDown behave like today, we should use 
BooleanFilter and TermsFilter.

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508200#comment-13508200
 ] 

Shai Erera commented on LUCENE-4580:


Not that I'm against adding dependencies between modules, but just to give some 
data points:

* The queries module is not a MUST for every search application (let alone 
faceted search). The basic query components are in core already (Filter, Query, 
TermQuery, BooleanQuery etc.). I found the queries module useful (so far) for 
the {{BooleanFilter}} and {{TermsFilter}} classes.

* A question was recently asked on the user list how to make {{DrillDown}} 
create OR queries instead of AND. The scenario -- you have a facet dimension 
for which you would like to allow people to select multiple values, and OR them 
(while still AND-ing with other dimensions). Since {{DrillDown}} doesn't have 
that option, I offered the user to use DrillDown.term() and construct his own 
BooleanQuery.
** My point is that {{DrillDown}} is a helper class that doesn't cover all 
cases already. Even if we make it return a Filter, that user will still need to 
construct BooleanFilter doing several API calls.
** So I'm ok if it only exposes terms(), but I'm also ok if we add the queries 
dependency and just make the cut over to Filter instead of Query.
** Or better, move TermsFilter and BooleanFilter to core -- why are they 
treated differently than TermQuery and BooleanQuery? Especially now that Filter 
is applied more efficiently, I suspect more people will want to use it?

* I am all for usability, but {{TermsFilter}} is not like {{BooleanQuery}} in 
the sense that it's very easy to create it (just one line of code). I'm not 
sure that if BooleanQuery had a ctor which accepts {{List}}, we wouldn't 
have used it in {{DrillDown}}, or if we even create the DrillDown.query API. So 
the 'same code over and over' is not comparable between the two cases, I think.

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508199#comment-13508199
 ] 

Uwe Schindler commented on LUCENE-4580:
---

In my opinion, for Lucene 4.x we should move the TermsFilter to core. This 
filter is very often used and we already have a good Automaton-based variant 
(DahizukMihov) filter that performs very well on lots of terms!

On the other hand: TermsFilter is a Disjunction, but for drill downs you 
generally need Conjunctions?

> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query

2012-12-02 Thread Gilad Barkai (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508195#comment-13508195
 ] 

Gilad Barkai commented on LUCENE-4580:
--

{{DrillDown}} is a useful class with a straight-forward API, which makes the 
life of basic users simpler.
As Shai pointed out, today there is no dependency on the Queries module, but 
the code contains a hidden bug in which a 'drill down' operation may change the 
score of the results. And adding a Filter or a {{ConstantScoreQuery}} looks the 
right way to go.
That sort of a fix is possible, while keeping the usefulness of the DrillDown 
class, only if the code becomes dependent on the queries module.
On the other hand, removing the dependency would force most faceted users to 
write that exact extra code as mentioned. Preventing such cases was the reason 
that utility class was created.

'Drilling Down' is a basic feature of a faceted search application, and the 
DrillDown class provides an easy way of invoking it.
Having a faceted search application without utilizing the queries module (e.g 
filtering) seems remote - is there any such scenario?
Module dependency may result with a user loading jars he does not need or care 
about, but the queries module jar is likely to be found on any faceted search 
application.

Modules should be independent, but I see enough gain in here. It would not 
bother me if the faceted module would depend on the query module. I find it 
logical.

-1 for forcing users to write same code over and over to keep facet module 
independent of the queries module
+1 for adding {{DrillDown.filter(CategoryPath...)}} - That looks like the way 
to go


> Facet DrillDown should return a Filter not Query
> 
>
> Key: LUCENE-4580
> URL: https://issues.apache.org/jira/browse/LUCENE-4580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Priority: Minor
>
> DrillDown is a helper class which the user can use to convert a facet value 
> that a user selected into a Query for performing drill-down or narrowing the 
> results. The API has several static methods that create e.g. a Term or Query.
> Rather than creating a Query, it would make more sense to create a Filter I 
> think. In most cases, the clicked facets should not affect the scoring of 
> documents. Anyway, even if it turns out that it must return a Query (which I 
> doubt), we should at least modify the impl to return a ConstantScoreQuery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508191#comment-13508191
 ] 

Uwe Schindler commented on LUCENE-4584:
---

bq. cpp-tasks is used to compile NativePosixUtil.cpp, so there is precedent for 
this in our project...

-1. THIS IS NOT PART OF OUR BUILD SYSTEM; IT IS NOT EVEN TESTED AT ALL!

> Compare the LZ4 implementation in Lucene against the original impl
> --
>
> Key: LUCENE-4584
> URL: https://issues.apache.org/jira/browse/LUCENE-4584
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Blocker
> Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data 
> the exact same way as the original impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl

2012-12-02 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508191#comment-13508191
 ] 

Uwe Schindler edited comment on LUCENE-4584 at 12/2/12 9:10 AM:


bq. cpp-tasks is used to compile NativePosixUtil.cpp, so there is precedent for 
this in our project...

-1. THIS IS NOT PART OF OUR (OFFICIALLY SUPPORTED) BUILD SYSTEM; IT IS NOT EVEN 
TESTED AT ALL!

  was (Author: thetaphi):
bq. cpp-tasks is used to compile NativePosixUtil.cpp, so there is precedent 
for this in our project...

-1. THIS IS NOT PART OF OUR BUILD SYSTEM; IT IS NOT EVEN TESTED AT ALL!
  
> Compare the LZ4 implementation in Lucene against the original impl
> --
>
> Key: LUCENE-4584
> URL: https://issues.apache.org/jira/browse/LUCENE-4584
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Blocker
> Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data 
> the exact same way as the original impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

66 matches

Mail list logo