Re: backward incompatibility with MockTokenFilter

2013-08-16 Thread Simon Willnauer
Hey John, this class is used for testing only. It's part of the
testing framework, and I don't think we can provide a migration
guide or backward compatibility for that package. If you rely on the
functionality, I suggest you fork the code into your code base or
move to an official alternative in the analysis jars.

simon

On Fri, Aug 16, 2013 at 7:06 AM, John Wang john.w...@gmail.com wrote:
 Hi folks:

 In release 4.3.1, MockTokenFilter has an API to turn position
 increments on/off, e.g.:

 set/getEnablePositionIncrements()

 In release 4.4.0 that API was removed, and the default behavior is now
 as if it were set to true.

 But I don't see this change documented or a migration suggestion.

 Please advise.

 Thanks

 -John
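The behavior change John describes concerns position increments. As a rough illustration (a toy model under assumed semantics, not the Lucene `TokenFilter` API), a filter that removes tokens can either leave a positional gap by bumping the next token's increment, or emit contiguous positions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model: emit "term:increment" pairs for a token stream after removing
// stopword tokens, with position increments enabled or disabled.
public class PositionIncrementDemo {
    public static List<String> filter(List<String> tokens, Set<String> stopwords,
                                      boolean enablePositionIncrements) {
        List<String> out = new ArrayList<>();
        int pendingIncrement = 1; // increment carried by the next kept token
        for (String tok : tokens) {
            if (stopwords.contains(tok)) {
                if (enablePositionIncrements) pendingIncrement++; // leave a gap
                continue; // token is removed either way
            }
            out.add(tok + ":" + pendingIncrement);
            pendingIncrement = 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Enabled (the 4.4.0 default): the removed token leaves a gap.
        System.out.println(filter(List.of("the", "quick", "fox"), Set.of("the"), true));
        // [quick:2, fox:1]
        // Disabled (the old optional mode): positions stay contiguous.
        System.out.println(filter(List.of("the", "quick", "fox"), Set.of("the"), false));
        // [quick:1, fox:1]
    }
}
```

The 4.4.0 change effectively hard-wires the first mode; phrase queries over the filtered field see the gap.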

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741968#comment-13741968
 ] 

Dawid Weiss commented on LUCENE-5168:
-

I can reproduce the issue in a different scenario too (core tests), so it's 
quite definitely a compiler bug lurking somewhere.
{code}
   [junit4] ERROR   0.00s | TestSimpleExplanations (suite) 
   [junit4] Throwable #1: java.lang.AssertionError
   [junit4]at 
__randomizedtesting.SeedInfo.seed([8C5A2DB2970990FA]:0)
   [junit4]at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:457)
   [junit4]at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
   [junit4]at 
org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
   [junit4]at 
org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
   [junit4]at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
   [junit4]at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
   [junit4]at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:478)
   [junit4]at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:615)
   [junit4]at 
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:365)
   [junit4]at 
org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:307)
   [junit4]at 
org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:249)
   [junit4]at 
org.apache.lucene.search.TestExplanations.beforeClassTestExplanations(TestExplanations.java:82)
   [junit4]at java.lang.Thread.run(Thread.java:724)Throwable #2: 
java.lang.NullPointerException
   [junit4]at 
__randomizedtesting.SeedInfo.seed([8C5A2DB2970990FA]:0)
   [junit4]at 
org.apache.lucene.search.TestExplanations.afterClassTestExplanations(TestExplanations.java:63)
   [junit4]at java.lang.Thread.run(Thread.java:724)
   [junit4] Completed in 0.06s, 0 tests, 1 failure, 1 error  FAILURES!
{code}

Five failures out of a hundred full runs of Lucene's core tests, so it's not a 
frequent thing, but it does happen. Java 1.8 b102, 32-bit (Windows).

 ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC
 ---

 Key: LUCENE-5168
 URL: https://issues.apache.org/jira/browse/LUCENE-5168
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: java8-windows-4x-3075-console.txt


 This assertion trips (sometimes from different tests), if you run the 
 highlighting tests on branch_4x with r1512807.
 It reproduces about half the time, always only with 32-bit + G1GC (other 
 combinations do not seem to trip it; I didn't try looping or anything really, 
 though).
 {noformat}
 rmuir@beast:~/workspace/branch_4x$ svn up -r 1512807
 rmuir@beast:~/workspace/branch_4x$ ant clean
 rmuir@beast:~/workspace/branch_4x$ rm -rf .caches #this is important,
 otherwise master seed does not work!
 rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test
 -Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs=-server
 -XX:+UseG1GC
 {noformat}
 Originally showed up like this:
 {noformat}
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6874/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets
 Error Message:
 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
 at 
 org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (SOLR-3280) too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

2013-08-16 Thread Bernd Fehling (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741972#comment-13741972
 ] 

Bernd Fehling commented on SOLR-3280:
-

After going from Solr 3.6 to 4.2.1 I haven't seen this anymore. There was a 
lot of rework done in SnapPuller due to multicore. Which version are you 
using?

 too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / 
 after replication
 ---

 Key: SOLR-3280
 URL: https://issues.apache.org/jira/browse/SOLR-3280
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5, 3.6, 4.0-ALPHA
Reporter: Bernd Fehling
Assignee: Robert Muir
Priority: Minor
 Attachments: SOLR-3280.patch


 There are sometimes too many, and also stale, CLOSE_WAIT connections 
 left over on the SLAVE server during/after replication.
 Normally GC should clean these up, but that is not always the case.
 Also, if a CLOSE_WAIT is hanging, the new replication won't load.
 The dirty workaround so far is to fake a TCP connection as root to that 
 connection and close it.
 After that the new replication will load, the old index and searcher are 
 released, and the system returns to normal operation.
 Background:
 The SnapPuller uses Apache HttpClient 3.x with the 
 MultiThreadedHttpConnectionManager.
 The manager holds a connection in CLOSE_WAIT after its use, for further 
 requests. This is done by calling releaseConnection. But if a connection is 
 stuck, it is not available any more, and a new connection from the pool is 
 used.
 Solution:
 After calling releaseConnection, clean up with closeIdleConnections(0).
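The proposed fix can be sketched with a toy pool model (this is not the commons-httpclient API; the `Conn` class and method bodies here are hypothetical stand-ins for the manager's behavior):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a pooled connection manager: releaseConnection() returns a
// connection to the pool still open (the peer may see it in CLOSE_WAIT),
// while closeIdleConnections(0) actually closes every idle connection.
public class IdlePoolDemo {
    static class Conn { boolean open = true; }

    private final List<Conn> idle = new ArrayList<>();

    public void releaseConnection(Conn c) {
        idle.add(c); // kept open for reuse; can linger half-closed
    }

    public void closeIdleConnections(long idleTimeoutMillis) {
        // A timeout of 0 means "close everything idle right now".
        for (Conn c : idle) c.open = false;
        idle.clear();
    }

    public int openIdleCount() { return idle.size(); }

    public static void main(String[] args) {
        IdlePoolDemo mgr = new IdlePoolDemo();
        Conn c = new Conn();
        mgr.releaseConnection(c);
        System.out.println(mgr.openIdleCount()); // 1: released but still open
        mgr.closeIdleConnections(0);             // the suggested cleanup
        System.out.println(mgr.openIdleCount()); // 0
    }
}
```

Calling the cleanup with a zero timeout right after each release trades connection reuse for a guarantee that no released socket is left dangling.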




[jira] [Updated] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-5168:


Attachment: log.0100
log.0086
log.0078
log.0042
log.0025

Failed logs from 1.8b102 runs.




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742014#comment-13742014
 ] 

Alan Woodward commented on SOLR-5164:
-

Related:  SOLR-5099.

I think we need an explicit test for creating collections via the API, though.  
It's a bit scary that this bug can occur without the test suite complaining 
about it.  I'm busy for the next couple of days, but will have some time next 
week if nobody else gets there first.

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742015#comment-13742015
 ] 

Dawid Weiss commented on LUCENE-5168:
-

This issue also affects 1.7.0_21-b11 (32 bit).




[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token

2013-08-16 Thread Christoph Lingg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742051#comment-13742051
 ] 

Christoph Lingg commented on SOLR-5152:
---

How about a property as in _WhitespaceTokenizerFactory_: preserveOriginal=1?

 EdgeNGramFilterFactory deletes token
 

 Key: SOLR-5152
 URL: https://issues.apache.org/jira/browse/SOLR-5152
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.4
Reporter: Christoph Lingg

 I am using EdgeNGramFilterFactory in my schema.xml
 {code:xml}
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <!-- ... -->
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
             maxGramSize="10" side="front" />
   </analyzer>
 </fieldType>{code}
 Some tokens in my index consist of only one character, let's say {{R}}. 
 minGramSize is set to 2, which is bigger than the length of the token. I 
 expected the NGramFilter to leave {{R}} unchanged, but in fact it deletes 
 the token.
 For my use case this interpretation is undesirable, and probably for most 
 use cases too.
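The reported deletion, and the preserveOriginal behavior suggested in the comment, can be sketched as follows (assumed semantics for illustration, not the Lucene implementation; the preserveOriginal flag is the hypothetical addition under discussion):

```java
import java.util.ArrayList;
import java.util.List;

// Edge n-grams taken from the front of a token. A token shorter than
// minGramSize produces no grams at all, which is the "deletion" observed;
// a preserveOriginal flag would keep such tokens anyway.
public class EdgeNGramDemo {
    public static List<String> edgeNGrams(String token, int minGramSize,
                                          int maxGramSize, boolean preserveOriginal) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGramSize, token.length());
        for (int n = minGramSize; n <= upper; n++) {
            grams.add(token.substring(0, n)); // leading n characters
        }
        if (grams.isEmpty() && preserveOriginal) {
            grams.add(token); // keep short tokens instead of deleting them
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("R", 2, 10, false));    // []  -- token deleted
        System.out.println(edgeNGrams("R", 2, 10, true));     // [R]
        System.out.println(edgeNGrams("rock", 2, 10, false)); // [ro, roc, rock]
    }
}
```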




[jira] [Commented] (LUCENE-5178) doc values should allow configurable defaults

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742072#comment-13742072
 ] 

ASF subversion and git services commented on LUCENE-5178:
-

Commit 1514642 from [~rcmuir] in branch 'dev/branches/lucene5178'
[ https://svn.apache.org/r1514642 ]

LUCENE-5178: add 'missing' support to docvalues (simpletext only)

 doc values should allow configurable defaults
 -

 Key: LUCENE-5178
 URL: https://issues.apache.org/jira/browse/LUCENE-5178
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Yonik Seeley

 DocValues should somehow allow a configurable default per-field.
 Possible implementations include setting it on the field in the document or 
 registration of an IndexWriter callback.
 If we don't make the default configurable, then another option is to have 
 DocValues fields keep track of whether a value was indexed for that document 
 or not.




[jira] [Commented] (LUCENE-5178) doc values should allow configurable defaults

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742073#comment-13742073
 ] 

Robert Muir commented on LUCENE-5178:
-

I created a patch with getDocsWithField (and the current 
FieldCache.getDocsWithField passing through to it) for doc values, so you know 
if a value is missing.

It also means that e.g. SortedDocValues returns a -1 ord for missing values, 
like the FieldCache, so it's completely consistent with FC there.

Currently SimpleText is the only codec implementing it; the other codecs 
return MatchAllBits (and that's how the back-compat will work, because they 
never had missing values before).

All tests are passing, but I want to think about strategies for the efficient 
codecs (Memory/Disk) before doing anything.

One other thing I like: if we do it this way, the codec has the chance to 
represent missing values more efficiently than if users do it themselves on 
top.
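The getDocsWithField idea can be sketched with a toy model (not the Lucene API; the class and constructor here are hypothetical): a per-field bitset records which documents actually had a value, and ord lookup returns -1 for missing documents, matching the FieldCache convention described above.

```java
import java.util.BitSet;
import java.util.Map;

// Toy doc-values field: ords maps docId -> ordinal for docs that have a
// value; the bitset answers "does this doc have a value at all?".
public class DocsWithFieldDemo {
    private final BitSet docsWithField = new BitSet();
    private final Map<Integer, Integer> ords;

    public DocsWithFieldDemo(Map<Integer, Integer> ords) {
        this.ords = ords;
        ords.keySet().forEach(docsWithField::set); // mark docs that had a value
    }

    public boolean hasValue(int docId) { return docsWithField.get(docId); }

    // -1 signals "no value for this document", like SortedDocValues/FieldCache.
    public int getOrd(int docId) { return hasValue(docId) ? ords.get(docId) : -1; }

    public static void main(String[] args) {
        DocsWithFieldDemo dv = new DocsWithFieldDemo(Map.of(0, 3, 2, 1));
        System.out.println(dv.getOrd(0)); // 3
        System.out.println(dv.getOrd(1)); // -1 -- doc 1 never got a value
    }
}
```

A codec that returns a MatchAllBits-style "every doc has a value" set for old segments preserves the previous behavior, which is the back-compat story mentioned in the comment.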





[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742098#comment-13742098
 ] 

Erick Erickson commented on SOLR-5164:
--

Blast, I wish I'd paid more attention to SOLR-5099; it'd have saved me some 
time. Sigh.

[~romseygeek] I looked, and there are some collection creation tests, but I 
didn't dig enough to understand completely why the second "solr" in the path 
didn't trip this condition. What we don't seem to have is a way to restart 
from scratch. And in the case of SOLR-5099, core creation does succeed; it's 
the restart that's the problem.

FWIW.




[jira] [Resolved] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5099.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5
 Assignee: Erick Erickson  (was: Alan Woodward)

Herb:

I stumbled across this as well. I sure wish I'd paid more attention to this 
JIRA before, you'd have saved me a couple of hours of head-scratching. Nice 
sleuthing, you nailed the problem.

Anyway, I'll check in the fixes for SOLR-5164 this morning and this will be 
fixed.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties.
 But I found that core.properties is not created when I dynamically create a 
 collection.
 The root issue is that the CorePropertiesLocator tries to create the 
 properties file before the instanceDir is created.
 The collection creation process completes and looks fine at runtime, but it 
 causes issues (cores are not auto-discovered after server restart).




Re: MoreLikeThis (MLT) - AND operator between the fields

2013-08-16 Thread Erick Erickson
I don't know enough about MLT to have an opinion one way or the other. But
it's
perfectly fine to open up a JIRA and attach your patch,
see: http://wiki.apache.org/solr/HowToContribute

Best
Erick


On Thu, Aug 15, 2013 at 12:13 PM, Kranti Parisa kranti.par...@gmail.com wrote:

 I was looking at the code and found that it is hard-coded to Occur.SHOULD
 in MoreLikeThisQuery.

 I customized the code to pass a new parameter *mlt.operator*=AND/OR;
 based on that, it computes the MLT documents. The default operator is set to
 OR. I also want to have an *mlt.sort* option, so I will be trying for that
 as well.

 Do you guys think we should make this part of the MLT feature?
 Please share your ideas. I can submit this change.


 Thanks & Regards,
 Kranti K Parisa
 http://www.linkedin.com/in/krantiparisa



 On Thu, Aug 15, 2013 at 12:05 AM, Kranti Parisa 
 kranti.par...@gmail.com wrote:

 Hi,

 It seems that when we pass multiple field names with the mlt.fl parameter, it
 ORs them to find the MLT documents.

 Is there a way to specify the AND operator? That is, if mlt.fl=language,year,
 then we should return the MLT documents whose language AND year
 field values are the same as the main query's result document.


 http://localhost:8180/solr/mltCore/mlt?q=id:1&wt=json&mlt=true&mlt.fl=language,year&fl=*,score&mlt.mindf=0&mlt.mintf=0&mlt.match.include=false

 The above query should return those documents whose field values
 (language, year) exactly match those of document id:1.

 Is this possible through any config or param? If not, I think it's worth
 having as a feature, because we don't know the values of those fields to
 apply as an FQ.


 Thanks & Regards,
 Kranti K Parisa
 http://www.linkedin.com/in/krantiparisa
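The AND/OR semantics proposed in this thread can be sketched with a toy matcher (mlt.operator is the hypothetical parameter under discussion, not a released Solr feature; this is illustrative, not the MoreLikeThisQuery code):

```java
import java.util.List;
import java.util.Map;

// OR (the current SHOULD behavior): a candidate matches if ANY listed field
// equals the seed document's value. AND (the proposal): ALL listed fields
// must match.
public class MltOperatorDemo {
    public static boolean matches(Map<String, String> seed, Map<String, String> candidate,
                                  List<String> fields, String operator) {
        boolean and = "AND".equalsIgnoreCase(operator);
        for (String f : fields) {
            boolean same = seed.get(f) != null && seed.get(f).equals(candidate.get(f));
            if (and && !same) return false; // AND: one mismatch rejects
            if (!and && same) return true;  // OR: one match accepts
        }
        return and; // AND: everything matched; OR: nothing matched
    }

    public static void main(String[] args) {
        Map<String, String> seed = Map.of("language", "en", "year", "2013");
        Map<String, String> cand = Map.of("language", "en", "year", "2012");
        System.out.println(matches(seed, cand, List.of("language", "year"), "OR"));  // true
        System.out.println(matches(seed, cand, List.of("language", "year"), "AND")); // false
    }
}
```

In Lucene terms the switch would amount to building the per-field clauses with Occur.MUST instead of the hard-coded Occur.SHOULD.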





[jira] [Created] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr

2013-08-16 Thread Varun Thacker (JIRA)
Varun Thacker created SOLR-5167:
---

 Summary: Ability to use AnalyzingInfixSuggester in Solr
 Key: SOLR-5167
 URL: https://issues.apache.org/jira/browse/SOLR-5167
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Varun Thacker
Priority: Minor
 Fix For: 4.5, 5.0


We should be able to use AnalyzingInfixSuggester in Solr by defining it in 
solrconfig.xml






[jira] [Commented] (SOLR-5149) Query facet to respect mincount

2013-08-16 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742123#comment-13742123
 ] 

Markus Jelsma commented on SOLR-5149:
-

The use cases mostly limit themselves to saving space when we have a large 
number of facet queries to return. Also, if our different clients toggle 
mincount with one setting but also have facet queries, we need additional code 
to maintain the behaviour. This is not a problem, only inconvenient.

Yes, facet.query.mincount sounds fine.
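The proposed facet.query.mincount semantics (a hypothetical parameter from the patch under discussion) can be sketched as a simple post-filter: facet query counts below mincount are dropped from the response instead of being returned as zeros.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Keep only facet-query entries whose count reaches mincount, preserving
// the order in which the facet queries were requested.
public class FacetMincountDemo {
    public static Map<String, Long> applyMincount(Map<String, Long> facetQueries,
                                                  long mincount) {
        Map<String, Long> kept = new LinkedHashMap<>();
        facetQueries.forEach((q, count) -> {
            if (count >= mincount) kept.put(q, count); // drop sub-mincount entries
        });
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("price:[0 TO 10]", 0L);
        counts.put("price:[10 TO 100]", 7L);
        System.out.println(applyMincount(counts, 1)); // {price:[10 TO 100]=7}
    }
}
```

This matches the space-saving use case above: with many facet queries and mincount=1, zero-count entries never reach the client.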

 Query facet to respect mincount
 ---

 Key: SOLR-5149
 URL: https://issues.apache.org/jira/browse/SOLR-5149
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4
Reporter: Markus Jelsma
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-5149-trunk.patch







[jira] [Updated] (SOLR-5149) Query facet to respect mincount

2013-08-16 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated SOLR-5149:


Attachment: SOLR-5149-trunk.patch

Patch for trunk now introduces facet.query.mincount. There's no support for 
facet.zeros in this patch.




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742134#comment-13742134
 ] 

ASF subversion and git services commented on SOLR-5164:
---

Commit 1514666 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1514666 ]

SOLR-5164, Can not create a collection via collections API (cloud mode). Fixes 
SOLR-5099 too




[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742135#comment-13742135
 ] 

ASF subversion and git services commented on SOLR-5099:
---

Commit 1514666 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1514666 ]

SOLR-5164, Can not create a collection via collections API (cloud mode). Fixes 
SOLR-5099 too




[jira] [Commented] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr

2013-08-16 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742139#comment-13742139
 ] 

Varun Thacker commented on SOLR-5167:
-

We could define it like:
{noformat}
<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.AnalyzingInfixSuggester</str>
    <str name="field">name</str> <!-- the indexed field to derive suggestions from -->
    <str name="buildOnCommit">true</str>
    <str name="storeDir">suggester</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="minPrefixChars">4</str>
  </lst>
</searchComponent>
{noformat}
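What makes the infix suggester different from a plain prefix suggester can be shown with a toy lookup (illustration only, not the AnalyzingInfixSuggester implementation): a match may start at any token inside the suggestion, not just at its beginning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Infix suggestion: the typed prefix may match ANY inner token of an entry,
// so "sugg" matches "analyzing infix suggester" even though the entry does
// not start with it.
public class InfixSuggestDemo {
    public static List<String> suggest(List<String> entries, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String entry : entries) {
            for (String token : entry.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.startsWith(p)) { // prefix match on any inner token
                    hits.add(entry);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> entries = List.of("analyzing infix suggester",
                                       "fuzzy suggester", "spellcheck");
        System.out.println(suggest(entries, "sugg"));
        // [analyzing infix suggester, fuzzy suggester]
    }
}
```

The real suggester also analyzes entries with the configured field type (suggestAnalyzerFieldType above) and ranks results; this sketch only shows the infix-matching idea.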

 Ability to use AnalyzingInfixSuggester in Solr
 --

 Key: SOLR-5167
 URL: https://issues.apache.org/jira/browse/SOLR-5167
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Varun Thacker
Priority: Minor
 Fix For: 4.5, 5.0


 We should be able to use AnalyzingInfixSuggester in Solr by defining it in 
 solrconfig.xml




[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742145#comment-13742145
 ] 

ASF subversion and git services commented on LUCENE-4583:
-

Commit 1514669 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1514669 ]

LUCENE-4583: IndexWriter no longer places a limit on length of DV binary fields 
(individual codecs still have their limits, including the default codec)

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Assignee: Michael McCandless
Priority: Critical
 Fix For: 5.0, 4.5

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes-based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
  doc.add(new StraightBytesDocValuesField("dvField", bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
  DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}
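The sizes in the test above are easy to check directly; this is a standalone sketch (plain Java, no Lucene dependency, hypothetical class name) showing that the failing BytesRef is just over the 32 KiB boundary while the reportedly passing one is exactly 32 KiB:

```java
// Standalone arithmetic check of the BytesRef sizes used in the test above:
// 4097 entries of (4+4) bytes exceeds 32 KiB; 4096 entries is exactly 32 KiB.
class DocValuesSizeCheck {
    static final int ENTRY_BYTES = 4 + 4;

    static int sizeFor(int entries) {
        return ENTRY_BYTES * entries;
    }
}
```

sizeFor(4097) is 32776 bytes, just past 32 * 1024 = 32768, which matches the reported ~32k limit.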




[jira] [Updated] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr

2013-08-16 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-5167:


Attachment: SOLR-5167.patch

I have a few doubts about this impl.

1. AnalyzingInfixSuggester.store() and AnalyzingInfixSuggester.load() return 
true instead of false. I'm not sure whether this is right.

2. Suggester.reload() throws a FileNotFoundException since no file actually 
gets written. Any suggestions on what the right approach would be?

 Ability to use AnalyzingInfixSuggester in Solr
 --

 Key: SOLR-5167
 URL: https://issues.apache.org/jira/browse/SOLR-5167
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Varun Thacker
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-5167.patch


 We should be able to use AnalyzingInfixSuggester in Solr by defining it in 
 solrconfig.xml




[jira] [Updated] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-08-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-5178:
-

Summary: doc values should expose missing values (or allow configurable 
defaults)  (was: doc values should allow configurable defaults)

 doc values should expose missing values (or allow configurable defaults)
 

 Key: LUCENE-5178
 URL: https://issues.apache.org/jira/browse/LUCENE-5178
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Yonik Seeley

 DocValues should somehow allow a configurable default per-field.
 Possible implementations include setting it on the field in the document or 
 registration of an IndexWriter callback.
 If we don't make the default configurable, then another option is to have 
 DocValues fields keep track of whether a value was indexed for that document 
 or not.




[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742163#comment-13742163
 ] 

ASF subversion and git services commented on SOLR-5099:
---

Commit 1514684 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514684 ]

SOLR-5164, Can not create a collection via collections API (cloud mode). Fixes 
SOLR-5099 too

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create a 
 collection.
 The root issue is that CorePropertiesLocator tries to create the properties file 
 before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues (cores are not auto-discovered after server restart).




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742162#comment-13742162
 ] 

ASF subversion and git services commented on SOLR-5164:
---

Commit 1514684 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514684 ]

SOLR-5164, Can not create a collection via collections API (cloud mode). Fixes 
SOLR-5099 too

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Resolved] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5164.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-08-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742182#comment-13742182
 ] 

Yonik Seeley commented on LUCENE-5178:
--

Yes, I think tracking/exposing missing values is the best option, esp. for 
numerics where you can use the full range and still tell if there was a value 
or not.
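A minimal sketch of that option (hypothetical, stdlib-only; not Lucene's actual DocValues classes): a parallel BitSet records which docs have a value, so every long, including 0, stays usable as a real value:

```java
import java.util.BitSet;

// Sketch of "track missing separately": values[] holds the data, hasValue
// records existence, so no sentinel value is stolen from the numeric range.
class MissingAwareDocValues {
    private final long[] values;
    private final BitSet hasValue = new BitSet();

    MissingAwareDocValues(int maxDoc) {
        this.values = new long[maxDoc];
    }

    void set(int docId, long value) {
        values[docId] = value;
        hasValue.set(docId);
    }

    // Missing-ness is separate state; 0 is a perfectly legal stored value.
    boolean exists(int docId) {
        return hasValue.get(docId);
    }

    long get(int docId) {
        return values[docId];
    }
}
```

With this layout, sort-missing-first/last and facet-missing can be answered from the BitSet without a per-field configurable default.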

 doc values should expose missing values (or allow configurable defaults)
 

 Key: LUCENE-5178
 URL: https://issues.apache.org/jira/browse/LUCENE-5178
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Yonik Seeley

 DocValues should somehow allow a configurable default per-field.
 Possible implementations include setting it on the field in the document or 
 registration of an IndexWriter callback.
 If we don't make the default configurable, then another option is to have 
 DocValues fields keep track of whether a value was indexed for that document 
 or not.




[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742186#comment-13742186
 ] 

Robert Muir commented on LUCENE-5178:
-

OK. I can remove the Solr defaultValue check here too: I have to fix the tests 
to cover sort missing first/last and facet missing etc. anyway (currently the dv 
tests avoid that).

 doc values should expose missing values (or allow configurable defaults)
 

 Key: LUCENE-5178
 URL: https://issues.apache.org/jira/browse/LUCENE-5178
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Yonik Seeley

 DocValues should somehow allow a configurable default per-field.
 Possible implementations include setting it on the field in the document or 
 registration of an IndexWriter callback.
 If we don't make the default configurable, then another option is to have 
 DocValues fields keep track of whether a value was indexed for that document 
 or not.




[jira] [Updated] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4718:
-

Attachment: SOLR-4718.patch

Alan's patch with some modifications and with the new test cases.

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place: 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1 A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2 New solr.xml element. Something like:
  <solr>
    <solrcloud>
      <str name="zkHost">zkurl</str>
      <str name="zkSolrXmlPath">whatever</str>
    </solrcloud>
  </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.




[jira] [Commented] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742233#comment-13742233
 ] 

Robert Muir commented on LUCENE-5168:
-

Out of curiosity, were those failures also with G1GC?

 ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC
 ---

 Key: LUCENE-5168
 URL: https://issues.apache.org/jira/browse/LUCENE-5168
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: java8-windows-4x-3075-console.txt, log.0025, log.0042, 
 log.0078, log.0086, log.0100


 This assertion trips (sometimes from different tests), if you run the 
 highlighting tests on branch_4x with r1512807.
 It reproduces about half the time, always only with 32bit + G1GC (other 
 combinations do not seem to trip it; I didn't try looping or anything really 
 though).
 {noformat}
 rmuir@beast:~/workspace/branch_4x$ svn up -r 1512807
 rmuir@beast:~/workspace/branch_4x$ ant clean
 rmuir@beast:~/workspace/branch_4x$ rm -rf .caches #this is important,
 otherwise master seed does not work!
 rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test
 -Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs="-server
 -XX:+UseG1GC"
 {noformat}
 Originally showed up like this:
 {noformat}
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6874/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets
 Error Message:
 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
 at 
 org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
 {noformat}




[jira] [Updated] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5156:
-

Attachment: SOLR-5156.patch

I'll commit this shortly

 Provide a way to move the contents of a file to ZooKeeper with ZkCLI
 

 Key: SOLR-5156
 URL: https://issues.apache.org/jira/browse/SOLR-5156
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5156.patch, SOLR-5156.patch


 Spinoff from SOLR-4718. We don't have any good way of putting solr.xml up in 
 Zookeeper in the first place. So while we can fake getting the file up there 
 we need a way consistent with ZkCLI




[jira] [Created] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Han Jiang (JIRA)
Han Jiang created LUCENE-5179:
-

 Summary: Refactoring on PostingsWriterBase for delta-encoding
 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5


A further step from LUCENE-5029.

The short story is, the previous API change brings two problems:
* it somewhat breaks backward compatibility: although we can still read the old
  format, we can no longer reproduce it;
* the pulsing codec has a problem with it.

And long story...

With the change, current PostingsBase API will be like this:

* term dict tells PBF we start a new term (via startTerm());
* PBF adds docs, positions and other postings data;
* term dict tells PBF all the data for current term is completed (via 
finishTerm()),
  then PBF returns the metadata for current term (as long[] and byte[]);
* term dict might buffer all the metadata in an ArrayList. When all the terms are
  collected, it then decides how those metadata will be located on disk.

So after the API change, PBF no longer has that annoying 'flushTermBlock'; instead
the term dict maintains the (term, metadata) list.

However, for each term we'll now write the long[] blob before the byte[], so the index
format is not consistent with pre-4.5.
Like in Lucene41, the metadata could be written as longA,bytesA,longB, but now we
have to write it as longA,longB,bytesA.

Another problem is, the pulsing codec cannot tell the wrapped PBF how the metadata
is delta-encoded; after all, PulsingPostingsWriter is only a PBF.

For example, we have terms=[a, a1, a2, b, b1, b2] and itemsInBlock=2, so
theoretically we'll finally have three blocks in BTTR: [a b], [a1 a2], [b1 b2].
With this approach, the metadata of term b is delta-encoded based on the metadata
of a, but when the term dict tells PBF to finishTerm(b), it might naively do the
delta encoding based on term a2.

So I think maybe we can introduce a method 'encodeTerm(long[], DataOutput out,
FieldInfo, TermState, boolean absolute)',
so that during metadata flush we can control how the current term is written. The
term dict will buffer TermState, which implicitly holds metadata like we do on
the PBReader side.

For example, if we want to reproduce the old Lucene41 format, we can simply set
longsSize==0; then PBF writes the old format (longA,bytesA,longB) to DataOutput,
and the compatibility issue is solved.
For the pulsing codec, it will also be able to tell the lower level how to
encode metadata.
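The proposed 'absolute' flag can be sketched in isolation (a hypothetical stdlib-only class, not the actual PostingsWriterBase API): the first term of a block is encoded as-is, and subsequent terms as deltas against the previous term's metadata longs:

```java
// Hypothetical sketch of delta-encoding term metadata with an 'absolute' flag,
// mirroring the intent of encodeTerm(long[], DataOutput, FieldInfo, TermState,
// boolean absolute): the term dict decides where blocks start, so it controls
// which terms are encoded absolutely and which as deltas.
class TermMetadataEncoder {
    private long[] last; // metadata of the previously encoded term, if any

    long[] encode(long[] metadata, boolean absolute) {
        long[] out = new long[metadata.length];
        for (int i = 0; i < metadata.length; i++) {
            // absolute: write the raw value (block start, or first term ever);
            // otherwise: write the delta against the previous term.
            out[i] = (absolute || last == null) ? metadata[i] : metadata[i] - last[i];
        }
        last = metadata.clone();
        return out;
    }
}
```

In the [a b] / [a1 a2] / [b1 b2] example above, the term dict would call encode(b, false) right after a, rather than letting a wrapped writer naively delta against a2.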




[jira] [Commented] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742243#comment-13742243
 ] 

Dawid Weiss commented on LUCENE-5168:
-

Yes. This just has to be complex though because it's not just GC-related. 
Disabling escape analysis also makes the tests pass, so does removing inlining.

I managed to find a reproducible scenario under 1.8 (fastdebug) which is great 
because now I can dump the assembly. It's still terribly large...

Anyway, the blame still seems to point to readVInt :) Really, not joking. I 
added a sysout in 
{code}
final int code = freq.readVInt();
{code}
This is consistent when the test passes, but when it fails you get a difference:
{code}
// normal run
code::0 true
code::4 true
code::2 true

// error run
code::0 true
code::3 true
code::4 true
{code}
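For context, readVInt decodes the standard variable-length int format (7 payload bits per byte, high bit set on continuation bytes); a minimal standalone decoder (a sketch, not Lucene's DataInput) looks like:

```java
// Minimal standalone VInt decoder matching the format DataInput.readVInt
// reads: each byte contributes 7 low bits; the high bit signals continuation.
class VIntReader {
    static int readVInt(byte[] buf, int pos) {
        byte b = buf[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }
}
```

The decode is pure byte arithmetic, which is why a wrong `code` value here points at miscompiled reads from the byte slices rather than at the decoding logic itself.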



 ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC
 ---

 Key: LUCENE-5168
 URL: https://issues.apache.org/jira/browse/LUCENE-5168
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: java8-windows-4x-3075-console.txt, log.0025, log.0042, 
 log.0078, log.0086, log.0100


 This assertion trips (sometimes from different tests), if you run the 
 highlighting tests on branch_4x with r1512807.
 It reproduces about half the time, always only with 32bit + G1GC (other 
 combinations do not seem to trip it; I didn't try looping or anything really 
 though).
 {noformat}
 rmuir@beast:~/workspace/branch_4x$ svn up -r 1512807
 rmuir@beast:~/workspace/branch_4x$ ant clean
 rmuir@beast:~/workspace/branch_4x$ rm -rf .caches #this is important,
 otherwise master seed does not work!
 rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test
 -Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs="-server
 -XX:+UseG1GC"
 {noformat}
 Originally showed up like this:
 {noformat}
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6874/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets
 Error Message:
 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
 at 
 org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
 {noformat}




[jira] [Reopened] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reopened SOLR-5164:
---


We should add a test case to the collections api that catches this.

Also, did this affect 4.4? The Affects Versions field seems to indicate not. If 
that's the case, there should be no separate CHANGES entry.

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Reopened] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reopened SOLR-5099:
---


We need a test for this as well - I'm happy to do it if no one else does, but 
let's not resolve these types of bugs until we have tests for them.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create a 
 collection.
 The root issue is that CorePropertiesLocator tries to create the properties file 
 before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues (cores are not auto-discovered after server restart).




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742259#comment-13742259
 ] 

Erick Erickson commented on SOLR-5164:
--

Yeah, we should have a test, but this has been a pretty big rathole for me 
already and I didn't see a simple way to create a test; see my earlier comment.

No, it didn't affect 4.4 so I'll take the entry out of CHANGES.txt in the next 
JIRA I fix (should be this morning sometime).



 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742258#comment-13742258
 ] 

Mark Miller commented on SOLR-5164:
---

I've reopened SOLR-5099 as well - tests for these bugs are as important as the 
fixes.

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Updated] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5156:
-

Attachment: SOLR-5156.patch

Final patch with bogus nocommit removed, passing precommit checks.

 Provide a way to move the contents of a file to ZooKeeper with ZkCLI
 

 Key: SOLR-5156
 URL: https://issues.apache.org/jira/browse/SOLR-5156
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5156.patch, SOLR-5156.patch, SOLR-5156.patch


 Spinoff from SOLR-4718. We don't have any good way of putting solr.xml up in 
 Zookeeper in the first place. So while we can fake getting the file up there 
 we need a way consistent with ZkCLI




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742261#comment-13742261
 ] 

Mark Miller commented on SOLR-5164:
---

bq. but this has been a pretty big rathole for me already and I didn't see a 
simple way to create a test

That's fine, but please don't resolve the issue then. Bug fixes for really ugly 
issues like these absolutely need tests to make sure they don't keep coming 
back. We have seen that type of thing a lot recently - we fix something like 
this and it just breaks a couple months later in a new refactoring. You don't 
have to write the tests, but you might ask for some advice or help from someone 
else on it before resolving the issue. I'm happy to help make sure these 
problems have tests.

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Updated] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4718:
-

Attachment: SOLR-4718.patch

Final patch, with CHANGES.txt entry.

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place, 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1 A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2 New solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.
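 A minimal sketch of the decision logic behind option (1) above, assuming the 
 two system properties named in the proposal; the method and return values 
 here are illustrative, not the actual Solr loading code:

```java
// Hedged sketch: choose where to load solr.xml from based on the
// zkSolrXmlPath system property, requiring zkHost alongside it as the
// proposal describes. Names follow the proposal text, not real Solr code.
public class SolrXmlSourceDemo {
    static String describeSource(String zkSolrXmlPath, String zkHost) {
        if (zkSolrXmlPath != null) {
            if (zkHost == null) {
                // the proposal says -DzkSolrXmlPath requires -DzkHost or -DzkRun
                throw new IllegalArgumentException("zkSolrXmlPath requires zkHost (or zkRun)");
            }
            return "zookeeper:" + zkSolrXmlPath; // read solr.xml from this znode
        }
        return "local:solr.xml"; // fall back to the local file
    }

    public static void main(String[] args) {
        System.out.println(describeSource(System.getProperty("zkSolrXmlPath"),
                                          System.getProperty("zkHost")));
    }
}
```

 With no properties set this prints "local:solr.xml"; with both set it points 
 at the znode.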




[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 737 - Failure!

2013-08-16 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/737/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9920 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/heapdumps
 -Dtests.prefix=tests -Dtests.seed=DBD6FE1DD046F358 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 
-classpath 

[jira] [Updated] (SOLR-5164) Creating collections via the Collections API does not work with lib include directives.

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5164:
--

Component/s: SolrCloud
   Priority: Critical  (was: Blocker)
Summary: Creating collections via the Collections API does not work 
with lib include directives.  (was: Can not create a collection via collections 
API (cloud mode))

 Creating collections via the Collections API does not work with lib include 
 directives.
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Updated] (SOLR-5164) Creating collections via the Collections API fails due to core being created in the wrong directory

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5164:
-

Summary: Creating collections via the Collections API fails due to core 
being created in the wrong directory  (was: Creating collections via the 
Collections API does not work with lib include directives.)

 Creating collections via the Collections API fails due to core being created 
 in the wrong directory
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5164) Creating collections via the Collections API fails due to core being created in the wrong directory

2013-08-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742360#comment-13742360
 ] 

Erick Erickson commented on SOLR-5164:
--

Well, the code is fixed, how about raising another JIRA instead?


 Creating collections via the Collections API fails due to core being created 
 in the wrong directory
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742364#comment-13742364
 ] 

ASF subversion and git services commented on SOLR-5156:
---

Commit 1514776 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1514776 ]

SOLR-5156 Provide a way to move the contents of a file to ZooKeeper with ZkCLI

 Provide a way to move the contents of a file to ZooKeeper with ZkCLI
 

 Key: SOLR-5156
 URL: https://issues.apache.org/jira/browse/SOLR-5156
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5156.patch, SOLR-5156.patch, SOLR-5156.patch


 Spinoff from SOLR-4718. We don't have any good way of putting solr.xml up in 
 Zookeeper in the first place. So while we can fake getting the file up there 
 we need a way consistent with ZkCLI




[jira] [Assigned] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-5099:
-

Assignee: Mark Miller  (was: Erick Erickson)

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create 
 a collection.
 The root issue is that the CorePropertiesLocator tries to create the 
 properties file before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues (cores are not auto-discovered after server restart).
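 The shape of the ordering fix described above can be sketched as follows; 
 the class and method names are illustrative, not the actual 
 CorePropertiesLocator code:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hedged sketch: ensure the instance directory exists *before* writing
// core.properties into it, so the core can be auto-discovered on restart.
public class CorePropsDemo {
    static void writeCoreProperties(Path instanceDir, Properties props) throws IOException {
        Files.createDirectories(instanceDir); // without this, the store below fails
        try (OutputStream out = Files.newOutputStream(instanceDir.resolve("core.properties"))) {
            props.store(out, null);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("solr").resolve("collection1_shard1_replica1");
        Properties p = new Properties();
        p.setProperty("name", "collection1_shard1_replica1");
        writeCoreProperties(dir, p);
        System.out.println(Files.exists(dir.resolve("core.properties"))); // true
    }
}
```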




[jira] [Assigned] (SOLR-5164) Creating collections via the Collections API fails due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-5164:
-

Assignee: Mark Miller  (was: Erick Erickson)

 Creating collections via the Collections API fails due to core being created 
 in the wrong directory
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5164) Creating collections via the Collections API fails due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742388#comment-13742388
 ] 

Mark Miller commented on SOLR-5164:
---

I don't consider this fixed without a test. The two issues are critical and 
somewhat complicated issues.

I'm going to write the tests - without them, we only have your word they are 
fixed today and a random guess they will still be fixed tomorrow or the next 
day. These two issues are much too critical to not consider a test part of the 
issue.

I'll finish the issues.

 Creating collections via the Collections API fails due to core being created 
 in the wrong directory
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Updated] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-5179:
--

Attachment: LUCENE-5179.patch

Patch for branch3069, tests pass for all 'temp' postings format.

 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch


 A further step from LUCENE-5029.
 The short story is, previous API change brings two problems:
 * it somewhat breaks backward compatibility: although we can still read old 
 format,
   we can no longer reproduce it;
 * the pulsing codec has a problem with it.
 And long story...
 With the change, current PostingsBase API will be like this:
 * term dict tells PBF we start a new term (via startTerm());
 * PBF adds docs, positions and other postings data;
 * term dict tells PBF all the data for current term is completed (via 
 finishTerm()),
   then PBF returns the metadata for current term (as long[] and byte[]);
 * term dict might buffer all the metadata in an ArrayList. When all the 
 terms are collected,
   it then decides how those metadata will be located on disk.
 So after the API change, PBF no longer has that annoying 'flushTermBlock', 
 and instead
 the term dict maintains the term/metadata list.
 However, for each term we'll now write the long[] blob before the byte[], so 
 the index format is not consistent with pre-4.5.
 Like in Lucene41, the metadata could be written as longA,bytesA,longB, but 
 now we have to write it as longA,longB,bytesA.
 Another problem is that the pulsing codec cannot tell the wrapped PBF how 
 the metadata is delta-encoded; after all,
 PulsingPostingsWriter is only a PBF.
 For example, we have terms=[a, a1, a2, b, b1, b2] and 
 itemsInBlock=2, so theoretically
 we'll finally have three blocks in BTTR: [a b]  [a1 a2]  [b1 b2]. 
 With this
 approach, the metadata of term b is delta-encoded based on the metadata of 
 a, but when the term dict tells
 PBF to finishTerm(b), it might naively delta-encode based on term a2.
 So I think maybe we can introduce a method 'encodeTerm(long[], DataOutput 
 out, FieldInfo, TermState, boolean absolute)',
 so that during metadata flush, we can control how the current term is 
 written. And the term dict will buffer TermState, which
 implicitly holds metadata, like we do on the PBReader side.
 For example, if we want to reproduce the old Lucene41 format, we can simply 
 set longsSize==0; then PBF
 writes the old format (longA,bytesA,longB) to DataOutput, and the 
 compatibility issue is solved.
 For pulsing codec, it will also be able to tell lower level how to encode 
 metadata.
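 The delta-encoding contract proposed above can be sketched minimally as 
 follows, assuming the term dictionary passes absolute=true at block starts 
 (itemsInBlock=2, as in the example); all names are illustrative, not the 
 real Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the proposed encodeTerm(..., boolean absolute) contract:
// the term dict decides *when* a block starts and passes absolute=true there,
// so the writer resets its delta base instead of delta-encoding against the
// last term of the previous block.
public class DeltaEncodeDemo {
    private long lastFP; // delta base, e.g. a file pointer into the postings file

    // Encode one term's metadata; absolute=true means "do not delta against
    // the previous term" (first term of a block).
    long encodeTerm(long fp, boolean absolute) {
        long out = absolute ? fp : fp - lastFP;
        lastFP = fp;
        return out;
    }

    public static void main(String[] args) {
        DeltaEncodeDemo w = new DeltaEncodeDemo();
        long[] filePointers = {100, 140, 190, 260};
        List<Long> encoded = new ArrayList<>();
        for (int i = 0; i < filePointers.length; i++) {
            boolean blockStart = (i % 2 == 0); // itemsInBlock == 2
            encoded.add(w.encodeTerm(filePointers[i], blockStart));
        }
        System.out.println(encoded); // [100, 40, 190, 70]: absolute, delta, absolute, delta
    }
}
```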




[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742403#comment-13742403
 ] 

Erick Erickson commented on SOLR-5099:
--

FWIW, a separate test case would be fine here, but note that the actual fix is 
part of SOLR-5164. I didn't see Herb's patch until after I'd found it as part 
of SOLR-5164.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create 
 a collection.
 The root issue is that the CorePropertiesLocator tries to create the 
 properties file before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues (cores are not auto-discovered after server restart).




[jira] [Commented] (SOLR-3936) QueryElevationComponent: Wrong order when result grouping is activated

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742405#comment-13742405
 ] 

ASF subversion and git services commented on SOLR-3936:
---

Commit 1514795 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1514795 ]

SOLR-3936: Fixed QueryElevationComponent sorting when used with Grouping

 QueryElevationComponent: Wrong order when result grouping is activated
 --

 Key: SOLR-3936
 URL: https://issues.apache.org/jira/browse/SOLR-3936
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: Michael Berger
Assignee: Hoss Man
 Attachments: SOLR-3936.patch, SOLR-3936.patch


 When I use elevation together with grouping, I do not get the expected 
 result order.
 I tried it with the standard solr example:
 http://localhost:8983/solr/elevate?enableElevation=true&fl=score%2C[elevated]%2Cid%2Cname&forceElevation=true&group.field=manu&group=on&indent=on&q=ipod&wt=json
  
 but the results ignored the elevation: 
 {
   "responseHeader":{
     "status":0,
     "QTime":2,
     "params":{
       "enableElevation":"true",
       "fl":"score,[elevated],id,name",
       "indent":"on",
       "q":"ipod",
       "forceElevation":"true",
       "group.field":"manu",
       "group":"on",
       "wt":"json"}},
   "grouped":{
     "manu":{
       "matches":2,
       "groups":[{
           "groupValue":"belkin",
           "doclist":{"numFound":1,"start":0,"maxScore":0.7698604,"docs":[
               {
                 "id":"F8V7067-APL-KIT",
                 "name":"Belkin Mobile Power Cord for iPod w/ Dock",
                 "score":0.7698604,
                 "[elevated]":false}]
           }},
         {
           "groupValue":"inc",
           "doclist":{"numFound":1,"start":0,"maxScore":0.28869766,"docs":[
               {
                 "id":"MA147LL/A",
                 "name":"Apple 60 GB iPod with Video Playback Black",
                 "score":0.28869766,
                 "[elevated]":true}]
           }}]}}}
 the elevate.xml defines the following rules:
 <query text="ipod">
   <doc id="MA147LL/A" />  <!-- put the actual ipod at the top -->
   <doc id="IW-02" exclude="true" /> <!-- exclude this cable -->
 </query>
 
 </elevate>




[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742407#comment-13742407
 ] 

ASF subversion and git services commented on SOLR-4718:
---

Commit 1514800 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1514800 ]

SOLR-4718 Allow solr.xml to be stored in ZooKeeper

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place, 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1 A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2 New solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.




[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742432#comment-13742432
 ] 

Mark Miller commented on SOLR-5150:
---

I've held off on committing this because some performance tests indicate the 
upstream blur patch may have been more performant for merging/flushing while 
the current patch is *much* more performant for queries.

We might be able to use one or the other based on the IOContext.

I'm waiting until I can get some more results and testing done though - I've 
seen lots of random deadlock situations in some of my testing with the upstream 
blur fix (synchronization around two calls).

 HdfsIndexInput may not fully read requested bytes.
 --

 Key: SOLR-5150
 URL: https://issues.apache.org/jira/browse/SOLR-5150
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.5, 5.0

 Attachments: SOLR-5150.patch


 Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - 
 the read call we are using may not read all of the requested bytes - it 
 returns the number of bytes actually read - which we ignore.
 Blur moved to using a seek and then readFully call - synchronizing across the 
 two calls to deal with clones.
 We have seen that really kills performance, and using the readFully call that 
 lets you pass the position rather than first doing a seek, performs much 
 better and does not require the synchronization.
 I also noticed that the seekInternal impl should not seek but be a no op 
 since we are seeking on the read.
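 The short-read problem and the readFully-style fix described above can be 
 sketched like this; DribblingStream is a stand-in for a network/HDFS stream, 
 not the actual HdfsDirectory code:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hedged sketch: InputStream.read(b, off, len) may legally return fewer than
// len bytes; ignoring its return value silently truncates the read. A
// readFully-style loop keeps reading until the requested range is filled.
public class ReadFullyDemo {
    // Simulates a stream that returns at most 3 bytes per read() call.
    static class DribblingStream extends ByteArrayInputStream {
        DribblingStream(byte[] buf) { super(buf); }
        @Override
        public int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 3)); // short reads
        }
    }

    // Loop until len bytes have been read, or fail on EOF.
    static void readFully(InputStream in, byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(b, off, len);
            if (n < 0) throw new EOFException();
            off += n;
            len -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes();
        byte[] buf = new byte[10];
        int n = new DribblingStream(data).read(buf, 0, 10);
        System.out.println("single read returned " + n + " of 10 bytes");
        readFully(new DribblingStream(data), buf, 0, 10);
        System.out.println(new String(buf)); // full contents
    }
}
```

 The positioned readFully variant the patch prefers additionally takes the 
 file position as an argument, avoiding the seek-then-read pair and the 
 synchronization it needs.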




[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742433#comment-13742433
 ] 

Mark Miller commented on SOLR-5150:
---

@phunt was on vacation, but is now back and may have some thoughts on this 
issue as well.

 HdfsIndexInput may not fully read requested bytes.
 --

 Key: SOLR-5150
 URL: https://issues.apache.org/jira/browse/SOLR-5150
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.5, 5.0

 Attachments: SOLR-5150.patch


 Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - 
 the read call we are using may not read all of the requested bytes - it 
 returns the number of bytes actually read - which we ignore.
 Blur moved to using a seek and then readFully call - synchronizing across the 
 two calls to deal with clones.
 We have seen that really kills performance, and using the readFully call that 
 lets you pass the position rather than first doing a seek, performs much 
 better and does not require the synchronization.
 I also noticed that the seekInternal impl should not seek but be a no op 
 since we are seeking on the read.




[jira] [Comment Edited] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742433#comment-13742433
 ] 

Mark Miller edited comment on SOLR-5150 at 8/16/13 5:37 PM:


[~phunt] was on vacation, but is now back and may have some thoughts on this 
issue as well.

  was (Author: markrmil...@gmail.com):
@phunt was on vacation, but is now back and may have some thoughts on this 
issue as well.
  




[jira] [Comment Edited] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742432#comment-13742432
 ] 

Mark Miller edited comment on SOLR-5150 at 8/16/13 5:38 PM:


I've held off on committing this because some performance tests indicate the 
upstream blur patch may have been more performant for merging/flushing while 
the current patch is *much* more performant for queries.

We might be able to use one or the other based on the IOContext.

I'm waiting until I can get some more results and testing done though - I've 
seen lots of random deadlock situations in some of my testing with the upstream 
blur fix (synchronization around two calls).

  was (Author: markrmil...@gmail.com):
I've held off on committing this because some performance tests indicate 
the upstream blur patch may have been more performant for merging/flushing 
while the current patch is *much* more performant for queries.

We might be able to use one or the other based on the IOContext.

I'm waiting until I can get some more results and testing done though - I've 
seen lots of random deadlock situations in some of my testing with the upstream 
blue fix (synchronization around two calls).
  




[jira] [Commented] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742434#comment-13742434
 ] 

ASF subversion and git services commented on SOLR-5159:
---

Commit 1514813 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1514813 ]

SOLR-5159: Manifest includes non-parsed maven variables

 Manifest includes non-parsed maven variables
 

 Key: SOLR-5159
 URL: https://issues.apache.org/jira/browse/SOLR-5159
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.4, 4.5, 5.0
 Environment: Apache Maven 3.0.5
Reporter: Artem Karpenko
Assignee: Steve Rowe
Priority: Minor
  Labels: maven-bundle-plugin, maven3,
 Attachments: SOLR-5159.patch


 When building Lucene/Solr with Apache Maven 3, all MANIFEST.MF files included 
 into JAR artifacts contain non-parsed POM variables: namely, there are 
 entries like
 Specification-Version: 5.0.0.$\{now.version}
 In the end, Solr displays these values on admin page in Versions section.
 This is caused by unresolved bug in maven-bundle-plugin (FELIX-3392). 




[jira] [Commented] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742435#comment-13742435
 ] 

ASF subversion and git services commented on SOLR-5159:
---

Commit 1514814 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1514814 ]

SOLR-5159: fix typo in CHANGES entry





[jira] [Commented] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742439#comment-13742439
 ] 

ASF subversion and git services commented on SOLR-5159:
---

Commit 1514816 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514816 ]

SOLR-5159: Manifest includes non-parsed maven variables (merged trunk r1514813 
and r1514814)





[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742441#comment-13742441
 ] 

Mark Miller commented on SOLR-5150:
---

To describe that more fully: not a deadlock - just really long pauses - no CPU or 
hard-drive usage by either the HDFS processes or Solr for a *long* time - the 
threads seemed to be hanging in socket waits of some kind.

That is how I first saw the slowdown with the blur fix - I was running one of 
the HdfsDirectory tests on my mac and it took 10 min instead of 14 seconds. On 
linux, the test was still fast. Some other perf tests around querying took a 
nose dive on linux as well though. Meanwhile, some tests involving indexing 
sped up.

The current patch sped that test back up on my mac and fixed the query perf 
test.

We might be able to get the best of both worlds, or the synchronized version 
might not be worth it.





[jira] [Commented] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742446#comment-13742446
 ] 

ASF subversion and git services commented on SOLR-5156:
---

Commit 1514821 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514821 ]

SOLR-5156 Provide a way to move the contents of a file to ZooKeeper with ZkCLI

 Provide a way to move the contents of a file to ZooKeeper with ZkCLI
 

 Key: SOLR-5156
 URL: https://issues.apache.org/jira/browse/SOLR-5156
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5156.patch, SOLR-5156.patch, SOLR-5156.patch


 Spinoff from SOLR-4718. We don't have any good way of putting solr.xml up in 
 Zookeeper in the first place. So while we can fake getting the file up there 
 we need a way consistent with ZkCLI




[jira] [Resolved] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5156.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5





[jira] [Created] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5168:
--

 Summary: BJQParserTest reproducible failures
 Key: SOLR-5168
 URL: https://issues.apache.org/jira/browse/SOLR-5168
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Yonik Seeley


two recent Jenkins builds have uncovered some test seeds that cause failures in 
multiple test methods in BJQParserTest.  These seeds reproduce reliably (as of 
trunk r1514815) ...

{noformat}
ant test  -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B 
-Dtests.multiplier=3 -Dtests.slow=true

ant test  -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E 
-Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true 
{noformat}





[jira] [Resolved] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved SOLR-5159.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5

bq. I want to verify Maven2 locally, and I also want to compare all manifest 
entries with the Ant-produced ones - the solr entries were changed recently, 
and I want to keep them in sync.

Maven 2.2.1 works fine.

I compared the Ant-built and Maven-built manifests, and the Maven-built ones of 
course have lots of Bnd-produced entries not in the Ant-built ones.  There are 
two other differences:

# The Maven-built manifest contains Implementation-Vendor-Id (with Maven 
coordinate groupId as the value: org.apache.lucene or org.apache.solr).  I 
think this is fine to leave in, and maybe the Ant-built manifests should get it 
too?
# The Maven-built manifests have the old style {{Specification-Version}}, 
including a timestamp, e.g. {{5.0.0.2013.08.16.12.36.14}}, where the Ant-built 
manifests just have the version, e.g. {{5.0-SNAPSHOT}}.  The latter is actually 
syntactically incorrect, since the value should only have digits and periods.  
I've left it as the old style in the Maven version, since it's not a syntax 
error, and since the Maven versions will only ever be produced by end-users - 
all snapshot and release Maven artifacts are produced by Ant.

I've committed to trunk and branch_4x.

Thanks Artem!





[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742465#comment-13742465
 ] 

Uwe Schindler commented on SOLR-5150:
-

Hi Mark,
I think your version should be preferred in both cases. The Apache Blur 
upstream version looks like SimpleFSIndexInput (which has synchronization on 
the RandomAccessFile). The difference is here, that reading from a real file 
has no network involved (at least not for local filesystems) so the time spent 
in the locked code block is shorter. Still SimpleFSDir is bad for queries.
When merging the whole stuff works single-threaded per file so you would see no 
difference in both approaches. If the positional readFully approach would be 
slower, then this would be clearly a bug in Hdfs.
Another alternative would be: When cloning a file also clone the underlying 
Hdfs connection. With RandomAccessFile we cannot do this in the JDK (we have no 
dup() for file descriptors), but if Hdfs supports some dup() like approach with 
delete on-last close semantics (the file could already be deleted when you dup 
the file descriptor) you could create 2 different connections for each thread.
The backside: Lucene never closes clones - one reason why I gave up on 
implementing a Windows-optimized directory that would clone the underlying file 
descriptor: the clone would never close the dup :(
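The positional read Uwe favors has a direct JDK analogue: FileChannel.read(ByteBuffer, long position) reads at an absolute offset without touching the channel's own cursor, so clones sharing one channel need no common lock (unlike RandomAccessFile's seek-then-read). A minimal sketch of the pattern (illustrative only, not Solr's actual HdfsIndexInput):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pread", ".bin");
        Files.write(tmp, "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // Two "clones" reading at different offsets, no seek(), no lock:
            // the absolute-position read never moves the channel's cursor.
            // (A real implementation would still loop, since a positional
            // read may also be partial.)
            ByteBuffer a = ByteBuffer.allocate(4);
            ByteBuffer b = ByteBuffer.allocate(4);
            ch.read(a, 0);   // bytes 0..3
            ch.read(b, 10);  // bytes 10..13
            System.out.println(new String(a.array(), StandardCharsets.US_ASCII));
            System.out.println(new String(b.array(), StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```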





[jira] [Comment Edited] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742465#comment-13742465
 ] 

Uwe Schindler edited comment on SOLR-5150 at 8/16/13 6:00 PM:
--

Hi Mark,
I think your version should be preferred in both cases. The Apache Blur 
upstream version looks like SimpleFSIndexInput (which has synchronization on 
the RandomAccessFile). The difference is here, that reading from a real file 
has no network involved (at least not for local filesystems) so the time spent 
in the locked code block is shorter. Still SimpleFSDir is bad for queries.
When merging the whole stuff works single-threaded per file so you would see no 
difference in both approaches. If the positional readFully approach would be 
slower, then this would be clearly a bug in Hdfs.
Another alternative would be: When cloning a file also clone the underlying 
Hdfs connection. With RandomAccessFile we cannot do this in the JDK (we have no 
dup() for file descriptors), but if Hdfs supports some dup() like approach with 
delete on-last close semantics (the file could already be deleted when you dup 
the file descriptor) you could create 2 different connections for each thread.
The backside: Lucene never closes clones - one reason why I gave up on 
implementing a Windows-optimized directory that would clone the underlying file 
descriptor: the clone would never close the dup :(

  was (Author: thetaphi):
Hi Mark,
I think your version should be preferred in both cases. The Apache Blur 
upstream version looks like SimpleFSIndexInput (which has synchronization on 
the RandomAccessFile). The difference is here, that reading from a real file 
has no network involved (at least not for local filesystems) so the time spent 
in the locked code block is shorter. Still SimpleFSDir is bad for queries.
When merging the whole stuff works single-threaded per file so you would see so 
difference in both approaches. If the positional readFully approach would be 
slower, then this would be clearly a bug in Hdfs.
Another alternative would be: When cloning a file also clone the underlying 
Hdfs connection. With RandomAccessFile we cannot do this in the JDK (we have no 
dup() for file descriptors), but if Hdfs supports some dup() like approach with 
delete on-last close semantics (the file could already be deleted when you dup 
the file descriptor) you could create 2 different connection for each thread.
The backside: Lucene never closes clones - one reason why I gave up on 
implementig a Windows-Optimized directory that would clone underlying file 
descriptor: The clone would never close the dup :(
  




[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742467#comment-13742467
 ] 

Hoss Man commented on SOLR-5168:


One of those seeds (1DC8055F837E437E) causes MockRandomMergePolicy to be picked 
-- but a cursory review of the test (and my cursory understanding of the block 
join queries) doesn't suggest any reason why that should cause a problem for 
this test -- the only time a commit *might* happen in the test is at the end of 
an entire block.

The other seed (7A613F321CE87F5B) just uses LogDocMergePolicy, so even if my 
cursory understandings above are incorrect, there really seems to be a bug when 
this seed is used.





[jira] [Commented] (SOLR-3280) to many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

2013-08-16 Thread David Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742478#comment-13742478
 ] 

David Fu commented on SOLR-3280:


I am still on 3.4. I noticed that Solr 4 pretty much reimplemented the 
SnapPuller, and I am thinking about upgrading to v4. Just out of curiosity, 
what issues did you face in the process of upgrading from 3.6 to 4.2.1?

 to many / sometimes stale CLOSE_WAIT connections from SnapPuller during / 
 after replication
 ---

 Key: SOLR-3280
 URL: https://issues.apache.org/jira/browse/SOLR-3280
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5, 3.6, 4.0-ALPHA
Reporter: Bernd Fehling
Assignee: Robert Muir
Priority: Minor
 Attachments: SOLR-3280.patch


 There are sometimes too many, and also stale, CLOSE_WAIT connections 
 during/after replication left over on the SLAVE server.
 Normally GC should clean this up, but that is not always the case.
 Also, if a CLOSE_WAIT is hanging, the new replication won't load.
 The dirty workaround so far is to fake a TCP connection as root to that 
 connection and close it. 
 After that the new replication will load, the old index and searcher are 
 released, and the system
 returns to normal operation.
 Background:
 The SnapPuller is using Apache httpclient 3.x and uses the 
 MultiThreadedHttpConnectionManager.
 The manager holds a connection in CLOSE_WAIT after its use for further 
 requests.
 This is done by calling releaseConnection. But if a connection is stuck it is 
 not available any more and a new
 connection from the pool is used.
 Solution:
 After calling releaseConnection clean up with closeIdleConnections(0).




[jira] [Commented] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread Artem Karpenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742480#comment-13742480
 ] 

Artem Karpenko commented on SOLR-5159:
--

Great, thank you Steve, I was glad to help.





[jira] [Commented] (SOLR-3936) QueryElevationComponent: Wrong order when result grouping is activated

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742500#comment-13742500
 ] 

ASF subversion and git services commented on SOLR-3936:
---

Commit 1514836 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514836 ]

SOLR-3936: Fixed QueryElevationComponent sorting when used with Grouping (merge 
r1514795)

 QueryElevationComponent: Wrong order when result grouping is activated
 --

 Key: SOLR-3936
 URL: https://issues.apache.org/jira/browse/SOLR-3936
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: Michael Berger
Assignee: Hoss Man
 Attachments: SOLR-3936.patch, SOLR-3936.patch


 When I use elevation together with grouping I do not get the expected result 
 order.
 I tried it with the standard solr example:
 http://localhost:8983/solr/elevate?enableElevation=true&fl=score%2C[elevated]%2Cid%2Cname&forceElevation=true&group.field=manu&group=on&indent=on&q=ipod&wt=json
 but the results ignored the elevation:
 {
   "responseHeader":{
     "status":0,
     "QTime":2,
     "params":{
       "enableElevation":"true",
       "fl":"score,[elevated],id,name",
       "indent":"on",
       "q":"ipod",
       "forceElevation":"true",
       "group.field":"manu",
       "group":"on",
       "wt":"json"}},
   "grouped":{
     "manu":{
       "matches":2,
       "groups":[{
           "groupValue":"belkin",
           "doclist":{"numFound":1,"start":0,"maxScore":0.7698604,"docs":[
             {
               "id":"F8V7067-APL-KIT",
               "name":"Belkin Mobile Power Cord for iPod w/ Dock",
               "score":0.7698604,
               "[elevated]":false}]
           }},
         {
           "groupValue":"inc",
           "doclist":{"numFound":1,"start":0,"maxScore":0.28869766,"docs":[
             {
               "id":"MA147LL/A",
               "name":"Apple 60 GB iPod with Video Playback Black",
               "score":0.28869766,
               "[elevated]":true}]
           }}]}}}
 the elevate.xml defines the following rules:
 <query text="ipod">
   <doc id="MA147LL/A" />  <!-- put the actual ipod at the top -->
   <doc id="IW-02" exclude="true" />  <!-- exclude this cable -->
 </query>
 </elevate>




[jira] [Resolved] (SOLR-5135) Deleting a collection should be extra aggressive in the face of failures.

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-5135.
---

Resolution: Fixed

 Deleting a collection should be extra aggressive in the face of failures.
 -

 Key: SOLR-5135
 URL: https://issues.apache.org/jira/browse/SOLR-5135
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.5, 5.0

 Attachments: SOLR-5135.patch


 Until Zk is the source of truth for the cluster, zk and local node states can 
 get out of whack in certain situations - as a result, sometimes you cannot 
 clean out all of the remnants of a collection to recreate it. For example, if 
 the collection is listed in zk under /collections, but is not in 
 clusterstate.json, you cannot remove or create the collection again due to a 
 early exception in the collection removal chain.
 I think we should probably still return the error - but also delete as much 
 as we can.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3936) QueryElevationComponent: Wrong order when result grouping is activated

2013-08-16 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3936.


   Resolution: Fixed
Fix Version/s: 5.0
   4.5

Thanks again Michael!

 QueryElevationComponent: Wrong order when result grouping is activated
 --

 Key: SOLR-3936
 URL: https://issues.apache.org/jira/browse/SOLR-3936
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: Michael Berger
Assignee: Hoss Man
 Fix For: 4.5, 5.0

 Attachments: SOLR-3936.patch, SOLR-3936.patch


 When I use elevation together with grouping I got not the expected result 
 order.
 I tried it with the standard solr example:
 http://localhost:8983/solr/elevate?enableElevation=true&fl=score%2C[elevated]%2Cid%2Cname&forceElevation=true&group.field=manu&group=on&indent=on&q=ipod&wt=json
  
 but the results ignored the elevation: 
 { 
   "responseHeader":{ 
     "status":0, 
     "QTime":2, 
     "params":{ 
       "enableElevation":"true", 
       "fl":"score,[elevated],id,name", 
       "indent":"on", 
       "q":"ipod", 
       "forceElevation":"true", 
       "group.field":"manu", 
       "group":"on", 
       "wt":"json"}}, 
   "grouped":{ 
     "manu":{ 
       "matches":2, 
       "groups":[{ 
           "groupValue":"belkin", 
           "doclist":{"numFound":1,"start":0,"maxScore":0.7698604,"docs":[ 
             { 
               "id":"F8V7067-APL-KIT", 
               "name":"Belkin Mobile Power Cord for iPod w/ Dock", 
               "score":0.7698604, 
               "[elevated]":false}] 
           }}, 
         { 
           "groupValue":"inc", 
           "doclist":{"numFound":1,"start":0,"maxScore":0.28869766,"docs":[ 
             { 
               "id":"MA147LL/A", 
               "name":"Apple 60 GB iPod with Video Playback Black", 
               "score":0.28869766, 
               "[elevated]":true}] 
           }}]}}}
 the elevate.xml defines the following rules :
 <query text="ipod">
   <doc id="MA147LL/A" />  <!-- put the actual ipod at the top -->
   <doc id="IW-02" exclude="true" /> <!-- exclude this cable -->
 </query>
  
 </elevate>




[jira] [Resolved] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4718.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 4.5, 5.0

 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place, 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1) A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2) A new solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.
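
For anyone wanting to try the "manually push solr.xml to ZK" step, a hedged
sketch follows. The ZkCLI class is real, but the putfile command name,
classpath, and paths below are assumptions that may not match every 4.x
build; check ZkCLI's own usage output for your version:

```shell
# Upload a local solr.xml into ZooKeeper at /solr.xml via Solr's ZkCLI.
# The classpath assumes the stock example layout; adjust for your install.
java -classpath "example/solr-webapp/webapp/WEB-INF/lib/*:example/lib/ext/*" \
  org.apache.solr.cloud.ZkCLI \
  -zkhost localhost:9983 \
  -cmd putfile /solr.xml /local/path/solr.xml
```

Solr could then be pointed at it with something like -DzkSolrXmlPath=/solr.xml
together with -DzkHost, per option 1 above.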




[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742512#comment-13742512
 ] 

ASF subversion and git services commented on SOLR-4718:
---

Commit 1514843 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514843 ]

SOLR-4718 Allow solr.xml to be stored in ZooKeeper

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place, 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1) A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2) A new solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.




[jira] [Commented] (LUCENE-5171) AnalyzingSuggester and FuzzySuggester should be able to share same FST

2013-08-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742513#comment-13742513
 ] 

Michael McCandless commented on LUCENE-5171:


If you use FuzzySuggester with maxEdits=0, does it work?

Or, maybe we should simply merge these two suggesters into AnalyzingSuggester 
and default maxEdits to 0?

 AnalyzingSuggester and FuzzySuggester should be able to share same FST
 --

 Key: LUCENE-5171
 URL: https://issues.apache.org/jira/browse/LUCENE-5171
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Affects Versions: 4.4, 4.3.1
Reporter: Anna Björk Nikulásdóttir
Priority: Minor

 In my code I use both suggesters for the same FST. I use 
 AnalyzingSuggester#store() to create the FST and later on 
 AnalyzingSuggester#load() and FuzzySuggester#load() to use it.
 This approach works very well but it unnecessarily creates 2 fst instances 
 resulting in 2x memory consumption.
 It seems that for the time being both suggesters use the same FST format.
 The following trivial method in AnalyzingSuggester provides the possibility 
 to share the same FST among different instances of AnalyzingSuggester. It has 
 been tested in the above scenario:
   public boolean shareFstFrom(AnalyzingSuggester instance)
   {
 if (instance.fst == null) {
   return false;
 }
 this.fst = instance.fst;
 this.maxAnalyzedPathsForOneInput = instance.maxAnalyzedPathsForOneInput;
 this.hasPayloads = instance.hasPayloads;
 return true;
   }
 One could use it like this:
   analyzingSugg = new AnalyzingSuggester(...);
   fuzzySugg = new FuzzySuggester(...);
   analyzingSugg.load(someInputStream);
   fuzzySugg.shareFstFrom(analyzingSugg);




[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742521#comment-13742521
 ] 

ASF subversion and git services commented on LUCENE-4583:
-

Commit 1514848 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514848 ]

LUCENE-4583: IndexWriter no longer places a limit on length of DV binary fields 
(individual codecs still have their limits, including the default codec)

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Assignee: Michael McCandless
Priority: Critical
 Fix For: 5.0, 4.5

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField("dvField", bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}
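
As a side note on the magic numbers in the test above, a minimal arithmetic
sketch (plain Java, not Lucene code) shows the failing allocation sits just
past a 32 KB boundary, which matches the observed ~32k limit:

```java
// Illustrative arithmetic only: 4096 chunks of (4+4) bytes land exactly
// on 32 KB (32768 bytes), while 4097 chunks go one chunk past it.
public class LimitCheck {
    public static void main(String[] args) {
        int works = (4 + 4) * 4096; // 32768 bytes: at the 32 KB boundary
        int fails = (4 + 4) * 4097; // 32776 bytes: just past it
        System.out.println(works + " " + fails);
    }
}
```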




[jira] [Resolved] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-08-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-4583.


Resolution: Fixed

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Assignee: Michael McCandless
Priority: Critical
 Fix For: 5.0, 4.5

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField("dvField", bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}




[jira] [Updated] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5164:
--

Summary: In some cases, creating collections via the Collections API due to 
core being created in the wrong directory  (was: Creating collections via the 
Collections API fails due to core being created in the wrong directory)

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Created] (SOLR-5169) Provide a way to query for zookeeper quorum state and other cloud-related info

2013-08-16 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-5169:
--

 Summary: Provide a way to query for zookeeper quorum state and 
other cloud-related info
 Key: SOLR-5169
 URL: https://issues.apache.org/jira/browse/SOLR-5169
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.4
Reporter: Shawn Heisey
Priority: Minor
 Fix For: 4.5, 5.0


There should be a way, either through an existing admin handler or a new one, 
to get an up-to-the-moment zookeeper status.  There may be other status 
information related to SolrCloud that could be included as well.





[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742534#comment-13742534
 ] 

ASF subversion and git services commented on SOLR-5099:
---

Commit 1514857 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1514857 ]

SOLR-5164: add relative solr.home testing to some tests, explicitly check for 
expected instanceDir handling with relative solr.home
SOLR-5099: explicity check for proper solrcore.properties creation
Speed up some tests by setting leaderVoteWait to 0

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create 
 a collection.
 The root issue is that CorePropertiesLocator tries to create the properties 
 file before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues later (cores are not auto-discovered after server restart).




[jira] [Commented] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742533#comment-13742533
 ] 

ASF subversion and git services commented on SOLR-5164:
---

Commit 1514857 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1514857 ]

SOLR-5164: add relative solr.home testing to some tests, explicitly check for 
expected instanceDir handling with relative solr.home
SOLR-5099: explicity check for proper solrcore.properties creation
Speed up some tests by setting leaderVoteWait to 0

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742537#comment-13742537
 ] 

ASF subversion and git services commented on SOLR-5164:
---

Commit 1514858 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514858 ]

SOLR-5164: add relative solr.home testing to some tests, explicitly check for 
expected instanceDir handling with relative solr.home
SOLR-5099: explicity check for proper solrcore.properties creation
Speed up some tests by setting leaderVoteWait to 0

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742538#comment-13742538
 ] 

ASF subversion and git services commented on SOLR-5099:
---

Commit 1514858 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514858 ]

SOLR-5164: add relative solr.home testing to some tests, explicitly check for 
expected instanceDir handling with relative solr.home
SOLR-5099: explicity check for proper solrcore.properties creation
Speed up some tests by setting leaderVoteWait to 0

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create 
 a collection.
 The root issue is that CorePropertiesLocator tries to create the properties 
 file before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues later (cores are not auto-discovered after server restart).




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_25) - Build # 7040 - Still Failing!

2013-08-16 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7040/
Java: 32bit/jdk1.7.0_25 -server -XX:+UseSerialGC

3 tests failed.
FAILED:  org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch

Error Message:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
127.0.0.1:59345 within 3 ms

Stack Trace:
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not 
connect to ZooKeeper 127.0.0.1:59345 within 3 ms
at 
__randomizedtesting.SeedInfo.seed([19D57DD7E754F289:9833F3CF900B92B5]:0)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:130)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:93)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:84)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:193)
at 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 

[jira] [Commented] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742561#comment-13742561
 ] 

Michael McCandless commented on LUCENE-5179:


So, the idea with this patch is to go back to letting the PBF encode
the metadata for the term?  Just, one term at a time, not the whole
block that we have on trunk today.

And the reason for this is back-compat?  Ie, so that in test-framework
we can have writers for the old formats?

One thing that this change precludes is having the terms dict use
different encodings than simple delta vInt to encode the long[]
metadata, e.g. Simple9/16 or something?  But that's OK ... we can
explore those later.
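For reference, a minimal sketch of what "simple delta vInt" encoding of the long[] metadata could look like (a toy illustration with made-up class names, not Lucene's actual DataOutput implementation):

```java
import java.io.ByteArrayOutputStream;

// Toy delta + vInt codec for a term's long[] metadata array.
// Hypothetical sketch; Lucene's real writers use DataOutput.writeVLong.
public class DeltaVIntSketch {
    // Write v as a variable-length long: 7 payload bits per byte, high bit = "more".
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Delta-encode the current term's metadata against the previous term's.
    // Assumes non-negative deltas, which is what makes the encoding compact.
    public static byte[] encode(long[] prev, long[] cur) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < cur.length; i++) {
            writeVLong(out, cur[i] - prev[i]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long[] prev = {100, 2000};
        long[] cur  = {160, 2003};
        byte[] bytes = encode(prev, cur);
        // Deltas 60 and 3 each fit in a single vInt byte.
        System.out.println(bytes.length); // 2
    }
}
```

Something like Simple9/16 would instead pack several deltas into fixed-width words, which is exactly the flexibility being discussed.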

It's sort of frustrating to have to compromise the design just for
back-compat ... e.g. we could instead cheat a bit and have the
writers write the newer format.  It's easy to make the readers read
either format, right?

But ... I don't understand how this change helps Pulsing, or rather
why Pulsing would have trouble w/ the API we have today?


 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch


 A further step from LUCENE-5029.
 The short story is, the previous API change brings two problems:
 * it somewhat breaks backward compatibility: although we can still read the old
   format, we can no longer reproduce it;
 * the pulsing codec has a problem with it.
 And the long story...
 With the change, the current PostingsBase API works like this:
 * the term dict tells the PBF we start a new term (via startTerm());
 * the PBF adds docs, positions and other postings data;
 * the term dict tells the PBF that all the data for the current term is complete
   (via finishTerm()), then the PBF returns the metadata for the current term
   (as long[] and byte[]);
 * the term dict might buffer all the metadata in an ArrayList; when all the terms
   are collected, it then decides how that metadata will be laid out on disk.
 So after the API change, the PBF no longer has that annoying 'flushTermBlock';
 instead the term dict maintains the term/metadata list.
 However, for each term we'll now write the long[] blob before the byte[], so the
 index format is not consistent with pre-4.5:
 in Lucene41 the metadata could be written as longA,bytesA,longB, but now
 we have to write longA,longB,bytesA.
 Another problem is that the pulsing codec cannot tell the wrapped PBF how the
 metadata is delta-encoded; after all, PulsingPostingsWriter is only a PBF.
 For example, suppose we have terms=[a, a1, a2, b, b1, b2] and
 itemsInBlock=2, so theoretically
 we'll finally have three blocks in BTTR: [a b]  [a1 a2]  [b1 b2].  With this
 approach, the metadata of term b is delta-encoded based on the metadata of a,
 but when the term dict tells
 the PBF to finishTerm(b), it might naively do the delta encoding based on term a2.
 So I think maybe we can introduce a method 'encodeTerm(long[], DataOutput
 out, FieldInfo, TermState, boolean absolute)',
 so that during metadata flush we can control how the current term is written.
 The term dict will buffer TermState, which
 implicitly holds the metadata, as we do on the PBReader side.
 For example, if we want to reproduce the old Lucene41 format, we can simply set
 longsSize==0; then the PBF
 writes the old format (longA,bytesA,longB) to DataOutput, and the compatibility
 issue is solved.
 For the pulsing codec, it will also be able to tell the lower level how to encode
 metadata.
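 To make the proposal concrete, here is a toy sketch of the encodeTerm contract.
 The boolean flag mirrors the 'absolute' parameter proposed above; the class and
 field names are hypothetical:

```java
// Hypothetical sketch of the proposed encodeTerm(..., boolean absolute) idea:
// the term dict (not the PBF) decides when delta-encoding restarts, so a block
// boundary can force an absolute (non-delta) encoding of the metadata.
public class EncodeTermSketch {
    private long[] last; // metadata of the previously encoded term

    // Returns the values actually written: absolute at block starts, deltas otherwise.
    public long[] encodeTerm(long[] longs, boolean absolute) {
        long[] written = new long[longs.length];
        for (int i = 0; i < longs.length; i++) {
            written[i] = (absolute || last == null) ? longs[i] : longs[i] - last[i];
        }
        last = longs.clone();
        return written;
    }

    public static void main(String[] args) {
        EncodeTermSketch w = new EncodeTermSketch();
        System.out.println(w.encodeTerm(new long[]{10}, true)[0]);  // 10: block start, absolute
        System.out.println(w.encodeTerm(new long[]{25}, false)[0]); // 15: delta vs previous term
        System.out.println(w.encodeTerm(new long[]{40}, true)[0]);  // 40: new block, absolute again
    }
}
```

 A wrapping codec (pulsing-like) would simply pass the flag through to the wrapped
 writer, so the delta base is always the term the term dict intends.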

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_25) - Build # 7040 - Still Failing!

2013-08-16 Thread Mark Miller
I'll look to see if this was somehow me soon - all tests passed locally. 

Mark 

Sent from my iPhone

On Aug 16, 2013, at 3:54 PM, Policeman Jenkins Server jenk...@thetaphi.de 
wrote:

 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7040/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseSerialGC
 
 3 tests failed.
 FAILED:  
 org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch
 
 Error Message:
 java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
 127.0.0.1:59345 within 3 ms
 
 Stack Trace:
 java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not 
 connect to ZooKeeper 127.0.0.1:59345 within 3 ms
 at __randomizedtesting.SeedInfo.seed([19D57DD7E754F289:9833F3CF900B92B5]:0)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:130)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:93)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:84)
 at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
 at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
 at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
 at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:193)
 at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:71)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
 at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
 at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
 

[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742569#comment-13742569
 ] 

Mikhail Khludnev commented on SOLR-5168:


I wonder how it could work (it seems I wrote it myself - my fault).

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/search/join/BJQParserTest.java#L56

The test doesn't use block add but adds docs one by one, hence a block can be 
broken by a commit:
{code}
 public static void createIndex()
...
   assertU(add(doc(idDoc)));
{code} 
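A toy model of that failure mode (plain Java, no Solr involved; all names are made up) shows why one-by-one adds can split a parent/child block across segments, while an atomic block add cannot:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: block join requires a parent and its children to be adjacent in
// one segment. Adding docs one at a time lets a flush/commit land in the middle
// of a block; an atomic addDocuments(block)-style call cannot be split.
public class BlockFlushSketch {
    // Docs added individually; a flush every `flushEvery` docs can cut a block.
    public static List<List<String>> indexOneByOne(List<String> docs, int flushEvery) {
        List<List<String>> segments = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String d : docs) {
            current.add(d);
            if (current.size() == flushEvery) { // commit may fall mid-block
                segments.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) segments.add(current);
        return segments;
    }

    // Each block is added atomically: it always lands whole in one segment.
    public static List<List<String>> indexAsBlocks(List<List<String>> blocks) {
        return new ArrayList<>(blocks);
    }

    public static void main(String[] args) {
        List<String> block = List.of("child1", "child2", "parent");
        List<List<String>> segs = indexOneByOne(block, 2);
        // The parent ended up in a different segment than its children:
        System.out.println(segs.size()); // 2
    }
}
```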

 BJQParserTest reproducible failures
 ---

 Key: SOLR-5168
 URL: https://issues.apache.org/jira/browse/SOLR-5168
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Yonik Seeley

 two recent Jenkins builds have uncovered some test seeds that cause failures 
 in multiple test methods in BJQParserTest.  These seeds reproduce reliably 
 (as of trunk r1514815) ...
 {noformat}
 ant test  -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B 
 -Dtests.multiplier=3 -Dtests.slow=true
 ant test  -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E 
 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true 
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742575#comment-13742575
 ] 

Robert Muir commented on LUCENE-5179:
-

Is it for real back-compat or for impersonation?

 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5125) Distributed MoreLikeThis fails with NullPointerException, shard query gives EarlyTerminatingCollectorException

2013-08-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742600#comment-13742600
 ] 

Shawn Heisey commented on SOLR-5125:


Does anyone have any ideas here? The same thing happens with a 4x snapshot:

4.5-SNAPSHOT 1514424 - ncindex - 2013-08-15 12:56:50

 Distributed MoreLikeThis fails with NullPointerException, shard query gives 
 EarlyTerminatingCollectorException
 --

 Key: SOLR-5125
 URL: https://issues.apache.org/jira/browse/SOLR-5125
 Project: Solr
  Issue Type: Bug
  Components: MoreLikeThis
Affects Versions: 4.4
Reporter: Shawn Heisey
 Fix For: 4.5, 5.0


 A distributed MoreLikeThis query that works perfectly on 4.2.1 is failing on 
 4.4.0.  The original query returns a NullPointerException.  The Solr log 
 shows that the shard queries are throwing EarlyTerminatingCollectorException. 
  Full details to follow in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742605#comment-13742605
 ] 

Michael McCandless commented on LUCENE-5179:


I believe it's for impersonation.  Real back-compat (reader can read the old 
index format using the new APIs) should work fine, I think?

 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742607#comment-13742607
 ] 

Yonik Seeley commented on SOLR-5168:


bq. test doesn't use block add

Yeah, I thought that was on purpose to test the query separately from any block 
indexing.
Simplest fix would be to disable the random IW stuff for this test (it would 
always work if the buffering in IW is enough such that the docs are flushed to 
a single segment).  Optimizing after the fact, in conjunction with the log 
merge policy, would also work.

 BJQParserTest reproducible failures
 ---

 Key: SOLR-5168
 URL: https://issues.apache.org/jira/browse/SOLR-5168
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Yonik Seeley


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742615#comment-13742615
 ] 

Mark Miller commented on SOLR-5099:
---

For bug fixes to unreleased issues that a non-committer contributes to, we 
should add credit to the issue that caused the bug. If it's minor in comparison 
to the original issue, we tend to create sub-entries in CHANGES - see some 
previous examples in CHANGES. I'll make an update here.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create a 
 collection.
 The root issue is that CorePropertiesLocator tries to create the properties 
 file before the instanceDir is created. 
 The collection creation process will complete and look fine at runtime, but it 
 will cause issues (cores are not auto-discovered after a server restart).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-16 Thread David Smiley (JIRA)
David Smiley created SOLR-5170:
--

 Summary: Spatial multi-value distance sort via DocValues
 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley


The attached patch implements spatial multi-value distance sorting.  In other 
words, a document can have more than one point per field, and using a provided 
function query, it will return the distance to the closest point.  The data 
goes into binary DocValues, and as such it's pretty friendly to realtime search 
requirements, and it only uses 8 bytes per point.
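As an illustration of the 8-bytes-per-point claim, one plausible encoding (a hypothetical sketch, not necessarily the patch's actual layout) packs lat/lon as two 32-bit floats into a single long, with the sort key being the distance to the closest of a document's points:

```java
// Hypothetical sketch: one point fits in 8 bytes by packing lat/lon as two
// 32-bit floats into a long; a doc's points become a long[] in binary DocValues.
public class PackedPointSketch {
    public static long pack(float lat, float lon) {
        return ((long) Float.floatToIntBits(lat) << 32)
             | (Float.floatToIntBits(lon) & 0xFFFFFFFFL);
    }

    public static float lat(long packed) { return Float.intBitsToFloat((int) (packed >>> 32)); }
    public static float lon(long packed) { return Float.intBitsToFloat((int) packed); }

    // Sort key: squared Euclidean distance to the closest of a doc's points.
    // (A real implementation would use a proper geodesic distance function.)
    public static double closestSq(long[] points, float qLat, float qLon) {
        double best = Double.POSITIVE_INFINITY;
        for (long p : points) {
            double dLat = lat(p) - qLat, dLon = lon(p) - qLon;
            best = Math.min(best, dLat * dLat + dLon * dLon);
        }
        return best;
    }

    public static void main(String[] args) {
        long p = pack(40.7f, -74.0f);
        System.out.println(lat(p) + "," + lon(p)); // float bits round-trip exactly
    }
}
```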

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-5168:
---

Attachment: BJQTest.patch

first patch. it solves most of tests but testGrandChildren() still fails on 
broken block. 

 BJQParserTest reproducible failures
 ---

 Key: SOLR-5168
 URL: https://issues.apache.org/jira/browse/SOLR-5168
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Yonik Seeley
 Attachments: BJQTest.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-16 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-5170:
---

Attachment: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch


*The first patch is not committable*.
* The biggest reason is that there's an awkward hack to work around the fact 
that a Solr FieldType can't aggregate multiple values into a single 
BinaryDocValuesField. So I've got this UpdateRequestProcessor that works in 
concert with the field.  SOLR-4329
* Secondly it needs more tests. It's been working in quasi-production for many 
months, though.
* And thirdly, I'd prefer to see this mechanism integrated into the lucene 
spatial framework somehow.

If you want to know how to use it, look at the tests.  I'm providing this 
because I got permission to open-source it and people want this capability.  
Once SOLR-4329 is addressed then I'll work on this code more to make it 
commit-worthy.

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742654#comment-13742654
 ] 

Robert Muir commented on SOLR-5170:
---

why use BINARY vs SORTED_SET? that has a much easier fit in Solr to boot. it's 
designed for multiple values...

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Luceneutil high variability between runs

2013-08-16 Thread Tom Burton-West
Hello,

I'm trying to benchmark a change to BM25Similarity (LUCENE-5175) using
luceneutil

I'm running this on a lightly loaded machine with a load average (top) of
about 0.01 when the benchmark is not running.

I made the following changes:
1) localrun.py changed Competition(debug=True) to Competition(debug=False)
2) made the following changes to localconstants.py per Robert Muir's
suggestion:
JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'
SEARCH_NUM_THREADS = 1
3) for the BM25 tests set SIMILARITY_DEFAULT='BM25Similarity'
4) for the BM25 tests uncommented the following line from searchBench.py
#verifyScores = False

Attached is output from iter 19 of several runs

The first 4 runs show consistently that the modified version is somewhere
between 6% and 8% slower on the tasks with the highest difference between
trunk and patch.
However, if you look at the baseline TaskQPS for HighTerm, for example,
run 3 is about 55 and run 1 is about 88.  So the difference for this task
between different runs of the bench program is much higher than the
difference between trunk and modified/patch within a run.

Is this to be expected?  Is there a reason I should believe the
differences shown within a run reflect the true differences?

Seeing this variability, I then switched DEFAULT_SIMILARITY back to
DefaultSimilarity.  In this case trunk and my_modified, should be
exercising exactly the same code, since the only changes in the patch are
the addition of a test case for BM25Similarity and a change to
BM25Similarity.

In this case the modified version varies from -6.2% difference from the
base to +4.4% difference from the base for LowTerm.
Comparing  QPS for the base case for HighTerm between different runs we can
see it varies from about 21 for run 1 to 76 for run 3.

Is this kind of variation between runs of the benchmark to be expected?

Any suggestions about where to look to reduce the variations between runs?

Tom

BM25Similarity runs where my_modified_version is LUCENE-


 tail -33 BM25SimRun1 |head -5
Report after iter 19:
TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
HighTerm   87.91 (13.2%)   81.02  (8.5%)   
-7.8% ( -26% -   16%)
 MedTerm  111.81 (13.2%)  103.11  (8.4%)   
-7.8% ( -25% -   15%)
 LowTerm  411.44 (17.7%)  382.47 (14.5%)   
-7.0% ( -33% -   30%)
[tburtonw@alamo runs]$ tail -33 BM25SimRun2 |head -5
Report after iter 19:
TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
HighTerm   62.15  (6.4%)   58.10  (7.1%)   
-6.5% ( -18% -7%)
 MedTerm  139.11  (4.5%)  130.22  (7.5%)   
-6.4% ( -17% -5%)
 LowTerm  391.93 (10.5%)  373.71 (13.1%)   
-4.6% ( -25% -   21%)
[tburtonw@alamo runs]$ tail -33 BM25SimRun3 |head -5
Report after iter 19:
TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
HighTerm   54.85  (6.5%)   50.18  (1.6%)   
-8.5% ( -15% -0%)
 MedTerm  146.04  (8.6%)  137.31  (4.7%)   
-6.0% ( -17% -8%)
OrNotHighLow   45.85 (11.1%)   43.37 (10.6%)   
-5.4% ( -24% -   18%)
[tburtonw@alamo runs]$ tail -33 BM25SimRun4 |head -5
Report after iter 19:
TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
OrNotHighMed   49.40  (8.7%)   45.37  (8.8%)   
-8.2% ( -23% -   10%)
OrNotHighLow   65.48  (8.7%)   60.19  (9.0%)   
-8.1% ( -23% -   10%)
   OrNotHighHigh   37.06  (8.2%)   34.18  (8.2%)   
-7.8% ( -22% -9%)

==
Default similarity, which is not modified by the BM25 patch

DefaultSimRun1
 LowTerm  398.97 (17.9%)  398.94 (18.1%)   
-0.0% ( -30% -   43%)
HighTerm   21.13 (12.1%)   21.45 (12.2%)
1.5% ( -20% -   29%)
DefaultSimRun2
 LowTerm  406.93 (17.1%)  381.51 (15.8%)   
-6.2% ( -33% -   32%)
HighTerm   59.21  (2.5%)   59.70  (3.5%)
0.8% (  -5% -7%)
DefaultSimRun3
 LowTerm  431.59 (18.5%)  450.55 (16.8%)
4.4% ( -26% -   48%)
HighTerm   76.45  (2.0%)   76.45  (1.7%)
0.0% (  -3% -3%)
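One way to quantify the cross-run spread described above is the coefficient of variation of the baseline QPS across independent runs. Using the HighTerm baseline values quoted from BM25 runs 1-3 gives roughly a 20% CV, far larger than the ~7% within-run difference being measured:

```java
// Coefficient of variation of baseline HighTerm QPS across independent runs
// (values taken from the run outputs quoted above). A cross-run CV this large
// dwarfs the ~7% within-run delta, so comparisons are only meaningful within a run.
public class RunVariance {
    public static double cv(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double var = 0;
        for (double x : xs) var += (x - mean) * (x - mean);
        var /= xs.length; // population variance
        return Math.sqrt(var) / mean;
    }

    public static void main(String[] args) {
        double[] highTermBaseline = {87.91, 62.15, 54.85}; // runs 1-3
        System.out.printf("CV = %.0f%%%n", 100 * cv(highTermBaseline));
    }
}
```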



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742665#comment-13742665
 ] 

Robert Muir commented on LUCENE-5179:
-

we have had imperfect impersonation before (for example, 
PreFlexRWFieldInfosReader).

But the idea was to exercise to the best extent possible: e.g. if somehow we 
can make a Reader in the RW package (impersonator) that subclasses the real 
reader and overrides the term metadata piece, at least we are still testing the 
postings lists and term bytes and so on.

and the real reader in lucene/core still gets some basic tests from 
TestBackwardsCompatibility.
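The subclass-and-override pattern being described can be sketched generically (class and method names here are hypothetical; the real ones would live in the Lucene test-framework RW packages):

```java
// Hypothetical sketch of the impersonation pattern: the test-only RW reader
// subclasses the real reader and overrides only the term-metadata piece, so the
// shared code paths (postings lists, term bytes, ...) are still exercised by
// the real class.
public class ImpersonationSketch {
    static class RealTermsReader {
        String decodeMetadata(byte[] blob) { return "new-format:" + blob.length; }
        // Shared path: everything except metadata decoding stays in the real reader.
        final String readTerm(byte[] blob) { return decodeMetadata(blob); }
    }

    // Test-only reader: inherits everything, swaps only the metadata decoding.
    static class OldFormatRWReader extends RealTermsReader {
        @Override
        String decodeMetadata(byte[] blob) { return "old-format:" + blob.length; }
    }

    public static void main(String[] args) {
        System.out.println(new RealTermsReader().readTerm(new byte[4]));   // new-format:4
        System.out.println(new OldFormatRWReader().readTerm(new byte[4])); // old-format:4
    }
}
```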

 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch


 A further step from LUCENE-5029.
 The short story is, the previous API change brings two problems:
 * it somewhat breaks backward compatibility: although we can still read the old 
 format,
   we can no longer reproduce it;
 * the pulsing codec has a problem with it.
 And the long story...
 And long story...
 With the change, the current PostingsBase API will be like this:
 * the term dict tells the PBF we start a new term (via startTerm());
 * the PBF adds docs, positions and other postings data;
 * the term dict tells the PBF all the data for the current term is complete (via 
 finishTerm()),
   then the PBF returns the metadata for the current term (as long[] and byte[]);
 * the term dict might buffer all the metadata in an ArrayList. When all the terms 
 are collected,
   it then decides how that metadata will be located on disk.
 So after the API change, the PBF no longer has that annoying 'flushTermBlock', 
 and instead
 the term dict maintains the term/metadata list.
 However, for each term we'll now write the long[] blob before the byte[], so the 
 index format is not consistent with pre-4.5.
 For example, in Lucene41 the metadata could be written as longA,bytesA,longB, but now 
 we have to write it as longA,longB,bytesA.
 Another problem is, the pulsing codec cannot tell the wrapped PBF how the metadata is 
 delta-encoded, since after all
 PulsingPostingsWriter is only a PBF.
 For example, if we have terms=[a, a1, a2, b, b1, b2] and 
 itemsInBlock=2, then theoretically
 we'll finally have three blocks in BTTR: [a b]  [a1 a2]  [b1 b2]. 
 With this
 approach, the metadata of term b is delta-encoded based on the metadata of a, 
 but when the term dict tells the
 PBF to finishTerm(b), it might naively do the delta encoding based on term a2.
 So I think maybe we can introduce a method 'encodeTerm(long[], DataOutput 
 out, FieldInfo, TermState, boolean absolute)',
 so that during metadata flush we can control how the current term is written. 
 And the term dict will buffer TermState, which
 implicitly holds the metadata like we do on the PBReader side.
 For example, if we want to reproduce the old Lucene41 format, we can simply set 
 longsSize==0; then the PBF
 writes the old format (longA,bytesA,longB) to the DataOutput, and the compatibility 
 issue is solved.
 For the pulsing codec, it will also be able to tell the lower level how to encode 
 its metadata.
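
 As a rough illustration of the idea (a sketch only — the class and method names
 here are hypothetical stand-ins, not the real PostingsWriterBase API), the
 'absolute' flag lets the term dict decide per term whether metadata is written
 as-is or as a delta against the previous term:

 ```java
 import java.io.ByteArrayOutputStream;
 import java.io.DataOutputStream;
 import java.io.IOException;
 import java.util.Arrays;

 // Sketch of the proposed encodeTerm(...) hook. When 'absolute' is true the
 // metadata longs are written as-is (e.g. the first term in a block);
 // otherwise only the delta against the previously written term is emitted.
 class TermMetadataEncoder {
     private long[] lastLongs;

     long[] encodeTerm(long[] longs, DataOutputStream out, boolean absolute) throws IOException {
         long[] written = new long[longs.length];
         for (int i = 0; i < longs.length; i++) {
             written[i] = absolute ? longs[i] : longs[i] - lastLongs[i];
             out.writeLong(written[i]);
         }
         lastLongs = longs.clone();  // base for the next delta
         return written;
     }
 }

 public class EncodeTermDemo {
     public static void main(String[] args) throws IOException {
         TermMetadataEncoder enc = new TermMetadataEncoder();
         DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
         // Term "a": first term of its block, so the term dict requests absolute encoding.
         System.out.println(Arrays.toString(enc.encodeTerm(new long[]{100, 7}, out, true)));
         // Term "b": the term dict (not the wrapped writer) chooses the base,
         // so it can request a delta against "a" rather than against "a2".
         System.out.println(Arrays.toString(enc.encodeTerm(new long[]{130, 9}, out, false)));
     }
 }
 ```

 The point of the sketch is that the caller controls the 'absolute' flag, which
 is exactly what a wrapping codec like pulsing cannot do with finishTerm() alone.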

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Luceneutil high variability between runs

2013-08-16 Thread Robert Muir
I think the raw values don't matter so much because there is some
randomization involved? And the same random seed is used...

Your DefaultSimilarity runs look pretty stable: it's between 0.0% and
1.5% variation, which is about as good as it gets for HighTerm.

LowTerm, I am guessing, is always noisy because those queries are so fast. A few
of these measures at least are; I know IntNRQ in particular is :)
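
A quick back-of-the-envelope way to put a number on the cross-run noise (a
sketch, not part of luceneutil) is the coefficient of variation of the baseline
QPS across runs — here using the three HighTerm DefaultSimilarity baselines Tom
quotes below (21.13, 59.21, 76.45):

```java
import java.util.Arrays;

// Quantify run-to-run benchmark noise as a coefficient of variation (CV):
// population stddev divided by the mean of the baseline QPS across runs.
public class QpsVariation {
    static double coefficientOfVariation(double[] qps) {
        double mean = Arrays.stream(qps).average().orElse(0);
        double var = Arrays.stream(qps).map(x -> (x - mean) * (x - mean)).average().orElse(0);
        return Math.sqrt(var) / mean;
    }

    public static void main(String[] args) {
        // HighTerm baseline QPS from the three DefaultSimilarity runs quoted below.
        double[] highTerm = {21.13, 59.21, 76.45};
        System.out.printf("CV = %.0f%%%n", 100 * coefficientOfVariation(highTerm));
    }
}
```

That works out to roughly 44% across runs, versus at most a few percent of
within-run difference — consistent with the advice that only the within-run
comparison (same seed, same tasks) is meaningful.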

On Fri, Aug 16, 2013 at 6:20 PM, Tom Burton-West tburt...@umich.edu wrote:
 Hello,

 I'm trying to benchmark a change to BM25Similarity (LUCENE-5175) using
 luceneutil.

 I'm running this on a lightly loaded machine with a load average (top) of
 about 0.01 when the benchmark is not running.

 I made the following changes:
 1) localrun.py changed Competition(debug=True) to Competition(debug=False)
 2) made the following changes to localconstants.py per Robert Muir's
 suggestion:
 JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'
 SEARCH_NUM_THREADS = 1
 3) for the BM25 tests set SIMILARITY_DEFAULT='BM25Similarity'
 4) for the BM25 tests uncommented the following line from searchBench.py
 #verifyScores = False

 Attached is output from iter 19 of several runs

 The first 4 runs show consistently that the modified version is somewhere
 between 6% and 8% slower on the tasks with the highest difference between
 trunk and patch.
 However, if you look at the baseline QPS for HighTerm, for example, run
 3 is about 55 and run 1 is about 88. So the difference for this task
 between different runs of the bench program is much higher than the
 differences between trunk and modified/patch within a run.

 Is this to be expected? Is there a reason I should believe the
 differences shown within a run reflect the true differences?

 Seeing this variability, I then switched DEFAULT_SIMILARITY back to
 DefaultSimilarity. In this case trunk and my_modified should be
 exercising exactly the same code, since the only changes in the patch are
 the addition of a test case for BM25Similarity and a change to
 BM25Similarity.

 In this case the modified version varies from -6.2% to +4.4% difference
 from the base for LowTerm.
 Comparing QPS for the base case for HighTerm between different runs, we can
 see it varies from about 21 for run 1 to 76 for run 3.

 Is this kind of  variation between runs of the benchmark to be expected?

 Any suggestions about where to look to reduce the variations between runs?

 Tom











[jira] [Commented] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742719#comment-13742719
 ] 

Mark Miller commented on SOLR-5164:
---

I added some important new testing - we were not really testing with a relative 
solr.home at all; now the test randomly uses one. I also added explicit 
testing to make sure the instance dir for Collections API-created cores is 
correct.

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the paths for all the 
 "lib" directives in solrconfig.xml, as they're all relative.





[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742720#comment-13742720
 ] 

Mark Miller commented on SOLR-5099:
---

I added an explicit test to make sure the core.properties file is created.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties is not created when I dynamically create a 
 collection.
 The root issue is that CorePropertiesLocator tries to create the properties file 
 before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 causes issues later (cores are not auto-discovered after server restart).





[jira] [Resolved] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-5164.
---

Resolution: Fixed

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the paths for all the 
 "lib" directives in solrconfig.xml, as they're all relative.





  1   2   >