[JENKINS] Lucene-Solr-trunk-Windows ([[ Exception while replacing ENV. Please report this as a bug. ]] {{ java.lang.NullPointerException }}) - Build # 2421 - Failure!

2013-01-18 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/2421/
Java: [[ Exception while replacing ENV. Please report this as a bug. ]] {{ java.lang.NullPointerException }}

No tests ran.

Build Log:
[...truncated 10 lines...]
FATAL: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.FilePath.act(FilePath.java:841)
at hudson.FilePath.act(FilePath.java:825)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:771)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:713)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1325)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:682)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:587)
at hudson.model.Run.execute(Run.java:1543)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:236)
Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:732)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)






Re: [DISCUSS] Enable javadoc check on Solr too

2013-01-18 Thread Tommaso Teofili
...and surely working in a branch, as suggested by Alan, would be a good idea :-)
Thanks Alan,
Tommaso


2013/1/18 Tommaso Teofili tommaso.teof...@gmail.com

 I see Yonik's and Jack's points, which look reasonable, but, at least in my
 experience, even if Solr is meant to be a server it often happens that
 developers (not necessarily plugin developers) have to go deep into the
 code in order to understand how things actually work under the hood / fix
 bugs / etc., and I think this would really help.
 Also, it should help our users feel more comfortable while browsing the
 Solr code, which I think is important.
 Wrapping up, I think introducing such a check couldn't hurt and would only
 improve the overall quality of the project, so I think it'd be worth the effort.

 My 2 cents,
 Tommaso


 2013/1/18 Jack Krupansky j...@basetechnology.com

 To the degree that people are using Solr merely as a server, that's fine.
 I think the main issues are the touch points of Solr that relate to
 user-developed plugins. The parts of Solr that invoke user plugins, and that
 user plugins invoke, should have Grade A Prime Javadoc, if for no other
 reason than that Eclipse is a friendly environment for developing and
 testing plugins.

 -- Jack Krupansky

 -Original Message- From: Yonik Seeley
 Sent: Thursday, January 17, 2013 12:42 PM
 To: dev@lucene.apache.org
 Subject: Re: [DISCUSS] Enable javadoc check on Solr too


 Solr is in a different scenario though - the primary use case is to
 run as a server.   The majority of the java code is implementation to
 support that.  I personally don't refer to javadoc (by itself) during
 development - so normal comments work just as well.  Documentation of
 methods should be on an as-needed basis, not mandated everywhere.

 -Yonik
 http://lucidworks.com

 On Thu, Jan 17, 2013 at 11:44 AM, Tommaso Teofili
 tommaso.teof...@gmail.com wrote:

 Hi all,

 What do you think about (re-)enabling the javadoc check for the Solr build too?
 At the start it may be a little annoying (since a lot of Solr code misses
 proper javadoc, so we may have lots of failing builds), but it should turn
 into a very useful thing for devs once that's fixed and we keep adding
 javadocs along with checked-in code.

 So basically it should just use the current Lucene task for checking javadoc
 and make the build fail if there's any missing javadoc.
 We could add that as soon as 4.1 is out.

 What do you think?
 Regards,
 Tommaso







[jira] [Created] (LUCENE-4696) Allow SpanNearQuery to take a BooleanQuery.

2013-01-18 Thread Michel Conrad (JIRA)
Michel Conrad created LUCENE-4696:
-

 Summary: Allow SpanNearQuery to take a BooleanQuery.
 Key: LUCENE-4696
 URL: https://issues.apache.org/jira/browse/LUCENE-4696
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Affects Versions: 4.0
Reporter: Michel Conrad


Currently SpanNearQuery can only take other SpanQuery objects, which include
spans, span terms, and span-wrapped multi-term queries, but not Boolean queries.

By allowing a Boolean query to be added to a SpanNearQuery, we can add, for
instance, queries that come from a QueryParser and which cannot easily be
transformed into the corresponding span objects.

The main use case here is to find the intersection between two sets of results
with the additional restriction that the matched terms from the different
queries should be near one another.
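
For context, a minimal sketch of the current API this issue would extend (Lucene 4.x; the field name and terms are illustrative). Every clause must already be a SpanQuery; a BooleanQuery cannot appear in the clause array today:

{noformat}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanNearSketch {
  public static void main(String[] args) {
    SpanQuery[] clauses = new SpanQuery[] {
        new SpanTermQuery(new Term("body", "red")),
        new SpanTermQuery(new Term("body", "hood"))
    };
    // matches docs where the clauses occur within 5 positions, in order;
    // allowing a BooleanQuery-derived clause here is what this issue proposes
    SpanNearQuery near = new SpanNearQuery(clauses, 5, true);
    System.out.println(near);
  }
}
{noformat}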







[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2013-01-18 Thread Dmitry Kan (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557053#comment-13557053 ]

Dmitry Kan commented on SOLR-1604:
--

Hello! Great work!

I have two questions:

1) What would it take to incorporate phrase searches into this extended query
parser?
"a b" c~100
that is, "a b" (a phrase search) is found in that order, exactly side by side,
<= 100 tokens away from c.

2) Does this implementation support the Boolean operators, like AND, OR, NOT
(at least OR and NOT are supported as far as I can see)? Can they be nested?
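
For reference, a minimal sketch of how the underlying Lucene ComplexPhraseQueryParser (LUCENE-1486) is typically invoked (Lucene 4.x; field name and query string are illustrative). A phrase nested inside a proximity clause, as in question 1, is exactly what is not shown here:

{noformat}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class ComplexPhraseSketch {
  public static void main(String[] args) throws Exception {
    ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser(
        Version.LUCENE_40, "text", new StandardAnalyzer(Version.LUCENE_40));
    // wildcards and ORs are allowed inside the quoted phrase; ~3 is the slop
    Query q = parser.parse("\"(little OR big) r* hood\"~3");
    System.out.println(q);
  }
}
{noformat}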

 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
 ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, 
 SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.




[jira] [Comment Edited] (LUCENE-4043) Add scoring support for query time join

2013-01-18 Thread David vandendriessche (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556033#comment-13556033 ]

David vandendriessche edited comment on LUCENE-4043 at 1/18/13 9:47 AM:


Hi, is it possible to use this in Solr? I tried setting {!scoreMode=Avg}, but it
doesn't seem to have any effect.

  was (Author: davidvdd):
Hi, is there any chance that this might work with multiple cores using the
fromIndex?
  
 Add scoring support for query time join
 ---

 Key: LUCENE-4043
 URL: https://issues.apache.org/jira/browse/LUCENE-4043
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Reporter: Martijn van Groningen
 Fix For: 4.0-ALPHA

 Attachments: LUCENE-4043.patch, LUCENE-4043.patch, LUCENE-4043.patch, 
 LUCENE-4043.patch


 Have similar scoring for query time joining just like the index time block 
 join (with the score mode).




[jira] [Updated] (SOLR-4307) Solr join scoring

2013-01-18 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche updated SOLR-4307:


Summary: Solr join scoring  (was: eDismax cross-core query support (and scoring))

 Solr join scoring
 -

 Key: SOLR-4307
 URL: https://issues.apache.org/jira/browse/SOLR-4307
 Project: Solr
  Issue Type: Wish
  Components: multicore, query parsers
 Environment: I'm using Solr 4.0.0
Reporter: David vandendriessche
  Labels: java, solr

 I would like to have cross-core eDismax query support. (for the fromIndex 
 query)
 Example:
 q=   {!join fromIndex=PageCore from=docId to=fileId}pageTxt: little red 
 riding hood
 defType= edismax
 qf=  pageTxt
 When this query is entered it only queries pageTxt:little
 Even when I set the defType to edismax.
 I know I could change the query to:
 (pageTxt: little) AND (pageTxt:red) AND (pageTxt:riding) AND (pageTxt:hood)
 But as far as I know this doesn't score documents, etc.
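
A commonly suggested workaround, shown here only as a sketch (core names and URL are assumptions), is to nest the edismax sub-query via a local-params parameter reference, so the whole phrase is parsed by edismax before the join is applied. Note the join itself still discards scores, which is what this issue asks to change:

{noformat}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class JoinEdismaxSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/FileCore");
    SolrQuery q = new SolrQuery();
    // v=$pq makes the join parse its sub-query from the pq parameter with edismax
    q.set("q", "{!join fromIndex=PageCore from=docId to=fileId v=$pq}");
    q.set("pq", "{!edismax qf=pageTxt}little red riding hood");
    System.out.println(server.query(q).getResults().getNumFound());
  }
}
{noformat}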




[jira] [Updated] (SOLR-4307) Solr join scoring

2013-01-18 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche updated SOLR-4307:


Description: 
I would like to have cross-core eDismax query support. (for the fromIndex query)

Example:

q=   {!join  from=docId to=fileId}pageTxt:test123

defType= edismax
qf=  pageTxt

  was:
I would like to have cross-core eDismax query support. (for the fromIndex query)

Example:

q=   {!join fromIndex=PageCore from=docId to=fileId}pageTxt: little red 
riding hood

defType= edismax
qf=  pageTxt


When this query is entered it only queries pageTxt:little

Even when I set the defType to edismax.

I know I could change the query to:

(pageTxt: little) AND (pageTxt:red) AND (pageTxt:riding) AND (pageTxt:hood)

But as far as I know this doesn't score documents, etc.

 Solr join scoring
 -

 Key: SOLR-4307
 URL: https://issues.apache.org/jira/browse/SOLR-4307
 Project: Solr
  Issue Type: Wish
  Components: multicore, query parsers
 Environment: I'm using Solr 4.0.0
Reporter: David vandendriessche
  Labels: java, solr

 I would like to have cross-core eDismax query support. (for the fromIndex 
 query)
 Example:
 q=   {!join  from=docId to=fileId}pageTxt:test123
 defType= edismax
 qf=  pageTxt




[jira] [Updated] (SOLR-4307) Solr join scoring

2013-01-18 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche updated SOLR-4307:


Description: 
I would like to have cross-core eDismax query support. (for the fromIndex query)

Example:

q={!join  from=docId to=fileId}pageTxt:test-123

Add queryTimeJoining to solr.

  was:
I would like to have cross-core eDismax query support. (for the fromIndex query)

Example:

q=   {!join  from=docId to=fileId}pageTxt:test123

defType= edismax
qf=  pageTxt

 Solr join scoring
 -

 Key: SOLR-4307
 URL: https://issues.apache.org/jira/browse/SOLR-4307
 Project: Solr
  Issue Type: Wish
  Components: multicore, query parsers
 Environment: I'm using Solr 4.0.0
Reporter: David vandendriessche
  Labels: java, solr

 I would like to have cross-core eDismax query support. (for the fromIndex 
 query)
 Example:
 q={!join  from=docId to=fileId}pageTxt:test-123
 Add queryTimeJoining to solr.




[jira] [Updated] (SOLR-4307) Solr join scoring

2013-01-18 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche updated SOLR-4307:


Component/s: (was: multicore)

 Solr join scoring
 -

 Key: SOLR-4307
 URL: https://issues.apache.org/jira/browse/SOLR-4307
 Project: Solr
  Issue Type: Wish
  Components: query parsers
 Environment: I'm using Solr 4.0.0
Reporter: David vandendriessche
  Labels: java, solr

 I would like to have cross-core eDismax query support. (for the fromIndex 
 query)
 Example:
 q={!join  from=docId to=fileId}pageTxt:test-123
 Add queryTimeJoining to solr.




[jira] [Updated] (SOLR-4307) Solr join scoring

2013-01-18 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche updated SOLR-4307:


Description: 
Add queryTimeJoining to solr.

Example:

q={!join  from=docId to=fileId}pageTxt:test-123

  was:
I would like to have cross-core eDismax query support. (for the fromIndex query)

Example:

q={!join  from=docId to=fileId}pageTxt:test-123

Add queryTimeJoining to solr.

 Solr join scoring
 -

 Key: SOLR-4307
 URL: https://issues.apache.org/jira/browse/SOLR-4307
 Project: Solr
  Issue Type: Wish
  Components: query parsers
 Environment: I'm using Solr 4.0.0
Reporter: David vandendriessche
  Labels: java, solr

 Add queryTimeJoining to solr.
 Example:
 q={!join  from=docId to=fileId}pageTxt:test-123




[jira] [Issue Comment Deleted] (SOLR-4307) Solr join scoring

2013-01-18 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche updated SOLR-4307:


Comment: was deleted

(was: {!join fromIndex=PageCore from=docId to=fileId}{!edismax qf=pageTxt}little red 

Seems to get me better results. Is this the correct way to query with join and
use edismax?)

 Solr join scoring
 -

 Key: SOLR-4307
 URL: https://issues.apache.org/jira/browse/SOLR-4307
 Project: Solr
  Issue Type: Wish
  Components: query parsers
 Environment: I'm using Solr 4.0.0
Reporter: David vandendriessche
  Labels: java, solr

 Add queryTimeJoining to solr.
 Example:
 q={!join  from=docId to=fileId}pageTxt:test-123




[jira] [Updated] (SOLR-4307) Solr join scoring

2013-01-18 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche updated SOLR-4307:


Description: 
Add queryTimeJoining to solr.

Example:

q={!join  from=docId to=fileId}pageTxt:test-123


No scoring on the result, just a list of documents that have a match.

  was:
Add queryTimeJoining to solr.

Example:

q={!join  from=docId to=fileId}pageTxt:test-123

 Solr join scoring
 -

 Key: SOLR-4307
 URL: https://issues.apache.org/jira/browse/SOLR-4307
 Project: Solr
  Issue Type: Wish
  Components: query parsers
 Environment: I'm using Solr 4.0.0
Reporter: David vandendriessche
  Labels: java, solr

 Add queryTimeJoining to solr.
 Example:
 q={!join  from=docId to=fileId}pageTxt:test-123
 No scoring on the result, just a list of documents that have a match.




[jira] [Issue Comment Deleted] (SOLR-4307) Solr join scoring

2013-01-18 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche updated SOLR-4307:


Comment: was deleted

(was: This is what I want in Solr.)

 Solr join scoring
 -

 Key: SOLR-4307
 URL: https://issues.apache.org/jira/browse/SOLR-4307
 Project: Solr
  Issue Type: Wish
  Components: query parsers
 Environment: I'm using Solr 4.0.0
Reporter: David vandendriessche
  Labels: java, solr

 Add queryTimeJoining to solr.
 Example:
 q={!join  from=docId to=fileId}pageTxt:test-123
 No scoring on the result, just a list of documents that have a match.




[jira] [Commented] (LUCENE-4043) Add scoring support for query time join

2013-01-18 Thread Martijn van Groningen (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557151#comment-13557151 ]

Martijn van Groningen commented on LUCENE-4043:
---

Solr uses a different joining implementation, which doesn't support mapping the
scores from the `from` side to the `to` side. If you want to use the Lucene
joining implementation, you could wrap it in a Solr QParserPlugin extension.
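
A minimal sketch of the Lucene-side call such a QParserPlugin could delegate to (field names taken from the SOLR-4307 example; the plugin wiring itself is omitted):

{noformat}
import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;

public class ScoringJoinSketch {
  static Query scoringJoin(IndexSearcher fromSearcher) throws IOException {
    Query fromQuery = new TermQuery(new Term("pageTxt", "red"));
    // join matches on docId over to documents whose fileId holds the same value,
    // carrying scores across according to the requested ScoreMode
    return JoinUtil.createJoinQuery("docId", false, "fileId", fromQuery,
        fromSearcher, ScoreMode.Avg);
  }
}
{noformat}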

 Add scoring support for query time join
 ---

 Key: LUCENE-4043
 URL: https://issues.apache.org/jira/browse/LUCENE-4043
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Reporter: Martijn van Groningen
 Fix For: 4.0-ALPHA

 Attachments: LUCENE-4043.patch, LUCENE-4043.patch, LUCENE-4043.patch, 
 LUCENE-4043.patch


 Have similar scoring for query time joining just like the index time block 
 join (with the score mode).




[jira] [Created] (SOLR-4315) Remove useless shardId param in DistributedUpdateProcessor#defensiveChecks

2013-01-18 Thread Tommaso Teofili (JIRA)
Tommaso Teofili created SOLR-4315:
-

 Summary: Remove useless shardId param in DistributedUpdateProcessor#defensiveChecks
 Key: SOLR-4315
 URL: https://issues.apache.org/jira/browse/SOLR-4315
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0, 4.1
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Trivial
 Fix For: 4.2, 5.0


DistributedUpdateProcessor#doDefensiveChecks takes the shardId parameter while
not using it, so it should be removed.




[jira] [Commented] (SOLR-4315) Remove useless shardId param in DistributedUpdateProcessor#defensiveChecks

2013-01-18 Thread Commit Tag Bot (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557157#comment-13557157 ]

Commit Tag Bot commented on SOLR-4315:
--

[trunk commit] Tommaso Teofili
http://svn.apache.org/viewvc?view=revision&revision=1435097

[SOLR-4315] - removed useless shardId param from doDefensiveChecks method


 Remove useless shardId param in DistributedUpdateProcessor#defensiveChecks
 --

 Key: SOLR-4315
 URL: https://issues.apache.org/jira/browse/SOLR-4315
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0, 4.1
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Trivial
 Fix For: 4.2, 5.0


 DistributedUpdateProcessor#doDefensiveChecks takes the shardId parameter while
 not using it, so it should be removed.




[jira] [Updated] (LUCENE-4570) release policeman tools?

2013-01-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4570:
--

Attachment: LUCENE-4570.patch

The forbidden-API checker is now available on sonatype-snapshots with the
Maven coordinates:
{noformat}
groupId=de.thetaphi
artifactId=forbiddenapis
version=1.0-SNAPSHOT
{noformat}

Attached is a patch for Lucene trunk, removing the forbidden-API checker from
the checkout and using the snapshot version. To enable the download of
snapshots, I added the sonatype-snapshots repo to ivy-settings.xml for now
(until it is released).

There is some cleanup needed in the patch:
- It somehow relies on tools being compiled, otherwise some properties used to
locate the .txt files are not defined. This can be solved by placing the
not-bundled, Lucene-specific signature files outside tools (where they no
longer need to be): just place the Solr ones in solr and the Lucene ones in
lucene.
- I have to review the API files and also move e.g. commons-io.txt into the
checker JAR file, so we have more bundled signatures and don't need to maintain
them inside Lucene. This of course does not apply to the Solr/Lucene-specific
ones that prevent specific test patterns.


 release policeman tools?
 

 Key: LUCENE-4570
 URL: https://issues.apache.org/jira/browse/LUCENE-4570
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Robert Muir
Assignee: Uwe Schindler
 Attachments: LUCENE-4570.patch


 Currently there is source code in lucene/tools/src (e.g. Forbidden APIs 
 checker ant task).
 It would be convenient if you could download this thing in your ant build 
 from ivy (especially if maybe it included our definitions .txt files as 
 resources).
 In general checking for locale/charset violations in this way is a pretty 
 general useful thing for a server-side app.
 Can we either release lucene-tools.jar as an artifact, or maybe alternatively 
 move this somewhere else as a standalone project and suck it in ourselves?




[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #219: POMs out of sync

2013-01-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/219/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
shard1 should have just been set up to be inconsistent - but it's still consistent

Stack Trace:
java.lang.AssertionError: shard1 should have just been set up to be inconsistent - but it's still consistent
at __randomizedtesting.SeedInfo.seed([B854E2C42793B547:39B26CDC50CCD57B]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:214)
at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:794)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)

[jira] [Commented] (LUCENE-4570) release policeman tools?

2013-01-18 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557210#comment-13557210 ]

Uwe Schindler commented on LUCENE-4570:
---

By the way: the new checker finds use of a deprecated API that was missing
from the hand-made jdk-deprecated.txt: File.toURL(). It's used in three places
in analyzers - which is a bummer, because it will prevent using those analyzers
on configs where the Lucene files are in a directory with e.g. umlauts or other
special symbols (see the deprecation message).
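
For reference, the standard replacement goes through the URI, which percent-escapes special characters correctly (the path below is illustrative):

{noformat}
import java.io.File;
import java.net.URL;

public class FileToUrlFix {
  public static void main(String[] args) throws Exception {
    File f = new File("/tmp/wörterbuch/hyphenation.xml"); // note the umlaut
    // Deprecated and broken for special characters:
    // URL bad = f.toURL();
    URL good = f.toURI().toURL(); // escapes the umlaut properly
    System.out.println(good);
  }
}
{noformat}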

 release policeman tools?
 

 Key: LUCENE-4570
 URL: https://issues.apache.org/jira/browse/LUCENE-4570
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Robert Muir
Assignee: Uwe Schindler
 Attachments: LUCENE-4570.patch


 Currently there is source code in lucene/tools/src (e.g. Forbidden APIs 
 checker ant task).
 It would be convenient if you could download this thing in your ant build 
 from ivy (especially if maybe it included our definitions .txt files as 
 resources).
 In general checking for locale/charset violations in this way is a pretty 
 general useful thing for a server-side app.
 Can we either release lucene-tools.jar as an artifact, or maybe alternatively 
 move this somewhere else as a standalone project and suck it in ourselves?




[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader/replica

2013-01-18 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557211#comment-13557211 ]

Markus Jelsma commented on SOLR-4260:
-

I've removed domain_b from the index and, as I expected, the numDocs is now
indeed inconsistent. By coincidence, what was missing in one replica from
domain_a was replaced by an extra doc from domain_b, and vice versa.

The collection of a couple of million records has one replica that's missing
one document.

 Inconsistent numDocs between leader/replica
 ---

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 5.0


 After wiping all cores and reindexing some 3.3 million docs from Nutch using 
 CloudSolrServer we see inconsistencies between the leader and replica for 
 some shards.
 Each core holds about 3.3k documents. For some reason 5 out of 10 shards have
 a small deviation in the number of documents. The leader and slave deviate
 by roughly 10-20 documents, not more.
 Results hopping ranks in the result set for identical queries got my
 attention; there were small IDF differences for exactly the same record,
 causing a record to shift positions in the result set. During those tests no
 records were indexed. Consecutive catch-all queries also return different
 numbers for numDocs.
 We're running a 10 node test cluster with 10 shards and a replication factor 
 of two and frequently reindex using a fresh build from trunk. I've not seen 
 this issue for quite some time until a few days ago.




[jira] [Comment Edited] (LUCENE-4570) release policeman tools?

2013-01-18 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557210#comment-13557210 ]

Uwe Schindler edited comment on LUCENE-4570 at 1/18/13 1:54 PM:


By the way: the new checker finds use of a deprecated API that was missing
from the hand-made jdk-deprecated.txt: File.toURL(). It's used in three places
in analyzers - which is a bummer, because it will prevent using those analyzers
on configs where the Lucene files are in a directory with e.g. umlauts or other
special symbols (see the deprecation message).

Here is the message:
{noformat}
-check-forbidden-jdk-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6
[forbidden-apis] Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\executors.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] Forbidden method invocation: java.io.File#toURL() [Deprecated in Java 1.6]
[forbidden-apis]   in org.apache.lucene.analysis.compound.hyphenation.PatternParser (PatternParser.java:101)
[forbidden-apis] Forbidden method invocation: java.io.File#toURL() [Deprecated in Java 1.6]
[forbidden-apis]   in org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter (HyphenationCompoundWordTokenFilter.java:151)
[forbidden-apis] Forbidden method invocation: java.io.File#toURL() [Deprecated in Java 1.6]
[forbidden-apis]   in org.apache.lucene.analysis.compound.hyphenation.HyphenationTree (HyphenationTree.java:114)
[forbidden-apis] Scanned 5468 (and 432 related) class file(s) for forbidden API invocations (in 2.29s), 3 error(s).
{noformat}

  was (Author: thetaphi):
By the way: the new checker finds use of a deprecated API that was missing
from the hand-made jdk-deprecated.txt: File.toURL(). It's used in three places
in analyzers - which is a bummer, because it will prevent using those analyzers
on configs where the Lucene files are in a directory with e.g. umlauts or other
special symbols (see the deprecation message).
  
 release policeman tools?
 

 Key: LUCENE-4570
 URL: https://issues.apache.org/jira/browse/LUCENE-4570
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Robert Muir
Assignee: Uwe Schindler
 Attachments: LUCENE-4570.patch


 Currently there is source code in lucene/tools/src (e.g. Forbidden APIs 
 checker ant task).
 It would be convenient if you could download this thing in your ant build 
 from ivy (especially if maybe it included our definitions .txt files as 
 resources).
 In general checking for locale/charset violations in this way is a pretty 
 general useful thing for a server-side app.
 Can we either release lucene-tools.jar as an artifact, or maybe alternatively 
 move this somewhere else as a standalone project and suck it in ourselves?




[jira] [Commented] (SOLR-4315) Remove useless shardId param in DistributedUpdateProcessor#defensiveChecks

2013-01-18 Thread Commit Tag Bot (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557215#comment-13557215 ]

Commit Tag Bot commented on SOLR-4315:
--

[branch_4x commit] Tommaso Teofili
http://svn.apache.org/viewvc?view=revision&revision=1435137

[SOLR-4315] - merged back to branch_4x


 Remove useless shardId param in DistributedUpdateProcessor#defensiveChecks
 --

 Key: SOLR-4315
 URL: https://issues.apache.org/jira/browse/SOLR-4315
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0, 4.1
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Trivial
 Fix For: 4.2, 5.0


 DistributedUpdateProcessor#doDefensiveChecks takes the shardId parameter while
 not using it, so it should be removed.




[jira] [Resolved] (SOLR-4315) Remove useless shardId param in DistributedUpdateProcessor#defensiveChecks

2013-01-18 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili resolved SOLR-4315.
---

Resolution: Fixed

 Remove useless shardId param in DistributedUpdateProcessor#defensiveChecks
 --

 Key: SOLR-4315
 URL: https://issues.apache.org/jira/browse/SOLR-4315
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0, 4.1
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Trivial
 Fix For: 4.2, 5.0


 DistributedUpdateProcessor#doDefensiveChecks takes the shardId parameter while
 not using it, so it should be removed.




[jira] [Commented] (LUCENE-4570) release policeman tools?

2013-01-18 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557227#comment-13557227 ]

Uwe Schindler commented on LUCENE-4570:
---

I fixed the violations for now...

 release policeman tools?
 

 Key: LUCENE-4570
 URL: https://issues.apache.org/jira/browse/LUCENE-4570
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Robert Muir
Assignee: Uwe Schindler
 Attachments: LUCENE-4570.patch


 Currently there is source code in lucene/tools/src (e.g. Forbidden APIs 
 checker ant task).
 It would be convenient if you could download this thing in your ant build 
 from ivy (especially if maybe it included our definitions .txt files as 
 resources).
 In general checking for locale/charset violations in this way is a pretty 
 general useful thing for a server-side app.
 Can we either release lucene-tools.jar as an artifact, or maybe alternatively 
 move this somewhere else as a standalone project and suck it in ourselves?




[jira] [Commented] (LUCENE-4677) Use vInt to encode node addresses inside FST

2013-01-18 Thread Commit Tag Bot (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557228#comment-13557228 ]

Commit Tag Bot commented on LUCENE-4677:


[branch_4x commit] Robert Muir
http://svn.apache.org/viewvc?view=revision&revision=1435141

LUCENE-4677, LUCENE-4682, LUCENE-4678, LUCENE-3298: Merged 
/lucene/dev/trunk:r1432459,1432466,1432472,1432474,1432522,1432646,1433026,1433109


 Use vInt to encode node addresses inside FST
 

 Key: LUCENE-4677
 URL: https://issues.apache.org/jira/browse/LUCENE-4677
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4677.patch, LUCENE-4677.patch, LUCENE-4677.patch


 Today we use int, but towards enabling > 2.1 GB sized FSTs, I'd like to make
 this vInt instead.
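
For readers unfamiliar with the encoding: a vInt stores an int in 1-5 bytes, 7 bits per byte, with the high bit flagging that another byte follows, so small node addresses stay small. A minimal sketch of the write side, mirroring what Lucene's DataOutput.writeVInt does:

{noformat}
import java.io.ByteArrayOutputStream;

public class VIntSketch {
  static void writeVInt(ByteArrayOutputStream out, int v) {
    // emit 7 bits at a time, low-order first; a set high bit means "more follows"
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  public static void main(String[] args) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVInt(out, 127);   // fits in 1 byte
    writeVInt(out, 16384); // needs 3 bytes
    System.out.println(out.size()); // prints 4
  }
}
{noformat}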




[jira] [Commented] (LUCENE-4682) Reduce wasted bytes in FST due to array arcs

2013-01-18 Thread Commit Tag Bot (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557229#comment-13557229 ]

Commit Tag Bot commented on LUCENE-4682:


[branch_4x commit] Robert Muir
http://svn.apache.org/viewvc?view=revision&revision=1435141

LUCENE-4677, LUCENE-4682, LUCENE-4678, LUCENE-3298: Merged 
/lucene/dev/trunk:r1432459,1432466,1432472,1432474,1432522,1432646,1433026,1433109


 Reduce wasted bytes in FST due to array arcs
 

 Key: LUCENE-4682
 URL: https://issues.apache.org/jira/browse/LUCENE-4682
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Priority: Minor
 Attachments: kuromoji.wasted.bytes.txt, LUCENE-4682.patch


 When a node is close to the root, or it has many outgoing arcs, the FST
 writes the arcs as an array (each arc gets N bytes), so we can e.g. binary
 search on lookup.
 The problem is N is set to max(numBytesPerArc), so if you have an outlier
 arc, e.g. with a big output, you can waste many bytes for all the other arcs
 that didn't need so many bytes.
 I generated Kuromoji's FST and found it has 271187 wasted bytes vs total size
 1535612 = ~18% wasted.
 It would be nice to reduce this.
 One thing we could do without packing is: in addNode, if we detect that the
 number of wasted bytes is above some threshold, then don't do the expansion.
 Another thing, if we are packing: we could record stats in the first pass
 about which nodes wasted the most, and then in the second pass (pack) we
 could set the threshold based on the top X% nodes that waste ...
 Another idea is maybe to deref large outputs, so that the numBytesPerArc is
 more uniform ...
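
To make the waste concrete, a toy computation with assumed per-arc sizes (numbers are illustrative, not measured from Kuromoji):

{noformat}
public class ArcArrayWasteSketch {
  public static void main(String[] args) {
    // pretend one node's outgoing arcs serialize to these byte counts;
    // the 11-byte outlier forces every slot in the array layout to 11 bytes
    int[] bytesPerArc = {3, 3, 4, 3, 11};
    int max = 0, actual = 0;
    for (int b : bytesPerArc) { max = Math.max(max, b); actual += b; }
    int allocated = max * bytesPerArc.length;
    System.out.println("allocated=" + allocated + " actual=" + actual
        + " wasted=" + (allocated - actual)); // allocated=55 actual=24 wasted=31
  }
}
{noformat}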




[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-18 Thread Commit Tag Bot (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557230#comment-13557230 ]

Commit Tag Bot commented on LUCENE-4678:


[branch_4x commit] Robert Muir
http://svn.apache.org/viewvc?view=revision&revision=1435141

LUCENE-4677, LUCENE-4682, LUCENE-4678, LUCENE-3298: Merged 
/lucene/dev/trunk:r1432459,1432466,1432472,1432474,1432522,1432646,1433026,1433109


 FST should use paged byte[] instead of single contiguous byte[]
 ---

 Key: LUCENE-4678
 URL: https://issues.apache.org/jira/browse/LUCENE-4678
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4678.patch, LUCENE-4678.patch, LUCENE-4678.patch, 
 LUCENE-4678.patch, LUCENE-4678.patch


 The single byte[] we use today has several limitations, e.g. it limits us to
 < 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and
 it causes big RAM spikes during building when the array has to grow.
 I took basically the same approach as LUCENE-3298, but I want to break out
 this patch separately from changing all int -> long for > 2.1 GB support.
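
The idea in sketch form (not the patch's actual class; block size and names are assumptions): append into fixed-size blocks so growth never copies the whole buffer, and address bytes with a long so there is no 2.1 GB ceiling:

{noformat}
import java.util.ArrayList;
import java.util.List;

public class PagedBytesSketch {
  private static final int BLOCK_BITS = 15;             // 32 KB blocks
  private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
  private static final int BLOCK_MASK = BLOCK_SIZE - 1;

  private final List<byte[]> blocks = new ArrayList<byte[]>();
  private long size; // long address space

  void writeByte(byte b) {
    if ((size & BLOCK_MASK) == 0) {
      blocks.add(new byte[BLOCK_SIZE]); // grow one block at a time, no copy spike
    }
    blocks.get((int) (size >>> BLOCK_BITS))[(int) (size & BLOCK_MASK)] = b;
    size++;
  }

  byte readByte(long pos) {
    return blocks.get((int) (pos >>> BLOCK_BITS))[(int) (pos & BLOCK_MASK)];
  }
}
{noformat}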




[jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB

2013-01-18 Thread Commit Tag Bot (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557231#comment-13557231 ]

Commit Tag Bot commented on LUCENE-3298:


[branch_4x commit] Robert Muir
http://svn.apache.org/viewvc?view=revision&revision=1435141

LUCENE-4677, LUCENE-4682, LUCENE-4678, LUCENE-3298: Merged 
/lucene/dev/trunk:r1432459,1432466,1432472,1432474,1432522,1432646,1433026,1433109


 FST has hard limit max size of 2.1 GB
 -

 Key: LUCENE-3298
 URL: https://issues.apache.org/jira/browse/LUCENE-3298
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-3298.patch, LUCENE-3298.patch, LUCENE-3298.patch, 
 LUCENE-3298.patch


 The FST uses a single contiguous byte[] under the hood, which in Java is
 indexed by int, so we cannot grow this over Integer.MAX_VALUE.  It also
 internally encodes references to this array as vInt.
 We could switch this to a paged byte[] and make the maximum size far larger.
 But I think this is low priority... I'm not going to work on it any time soon.




[jira] [Commented] (LUCENE-4570) release policeman tools?

2013-01-18 Thread Commit Tag Bot (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557232#comment-13557232 ]

Commit Tag Bot commented on LUCENE-4570:


[trunk commit] Uwe Schindler
http://svn.apache.org/viewvc?view=revision&revision=1435146

LUCENE-4570: Fix deprecated API usage (otherwise may lead to bugs if 
Hyphenation filters load files from directories with non-ascii path names)


 release policeman tools?
 

 Key: LUCENE-4570
 URL: https://issues.apache.org/jira/browse/LUCENE-4570
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Robert Muir
Assignee: Uwe Schindler
 Attachments: LUCENE-4570.patch


 Currently there is source code in lucene/tools/src (e.g. Forbidden APIs 
 checker ant task).
 It would be convenient if you could download this thing in your ant build 
 from ivy (especially if maybe it included our definitions .txt files as 
 resources).
 In general checking for locale/charset violations in this way is a pretty 
 general useful thing for a server-side app.
 Can we either release lucene-tools.jar as an artifact, or maybe alternatively 
 move this somewhere else as a standalone project and suck it in ourselves?




[jira] [Commented] (LUCENE-4570) release policeman tools?

2013-01-18 Thread Commit Tag Bot (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557234#comment-13557234 ]

Commit Tag Bot commented on LUCENE-4570:


[branch_4x commit] Uwe Schindler
http://svn.apache.org/viewvc?view=revision&revision=1435148

Merged revision(s) 1435146 from lucene/dev/trunk:
LUCENE-4570: Fix deprecated API usage (otherwise may lead to bugs if 
Hyphenation filters load files from directories with non-ascii path names)


 release policeman tools?
 

 Key: LUCENE-4570
 URL: https://issues.apache.org/jira/browse/LUCENE-4570
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Robert Muir
Assignee: Uwe Schindler
 Attachments: LUCENE-4570.patch


 Currently there is source code in lucene/tools/src (e.g. Forbidden APIs 
 checker ant task).
 It would be convenient if you could download this thing in your ant build 
 from ivy (especially if maybe it included our definitions .txt files as 
 resources).
 In general checking for locale/charset violations in this way is a pretty 
 general useful thing for a server-side app.
 Can we either release lucene-tools.jar as an artifact, or maybe alternatively 
 move this somewhere else as a standalone project and suck it in ourselves?




[jira] [Commented] (LUCENE-4693) FixedBitset might return wrong results if words.length > actual words in the bitset

2013-01-18 Thread Simon Willnauer (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557242#comment-13557242 ]

Simon Willnauer commented on LUCENE-4693:
-

I will commit this patch if nobody objects

 FixedBitset might return wrong results if words.length > actual words in the
 bitset
 ---

 Key: LUCENE-4693
 URL: https://issues.apache.org/jira/browse/LUCENE-4693
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0, 4.1
Reporter: Simon Willnauer
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4693.patch


 Currently we allow passing in the actual words as a long[] to the FixedBitSet,
 yet if this array is oversized with respect to the actual words it needs to
 hold the bits, the FixedBitSet can return wrong results, since we use
 words.length (bits.length) as the bound when we iterate over the bits, i.e. if
 we need to find the next set bit. We should use the actual bound rather than
 the size of the array.
 As a side note, I think it would be interesting to explore passing an offset
 to this too, to enable creating bitsets from slices.
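
A small sketch of the pitfall, using the Lucene 4.x FixedBitSet(long[], int) constructor (values are illustrative):

{noformat}
import org.apache.lucene.util.FixedBitSet;

public class OversizedWordsSketch {
  public static void main(String[] args) {
    long[] words = new long[4]; // room for 256 bits...
    words[0] = 1L << 3;         // ...with only bit 3 set
    FixedBitSet bits = new FixedBitSet(words, 100); // ...but only 100 bits are valid
    // iteration must be bounded by the 100 valid bits, not words.length * 64 = 256,
    // or garbage beyond the logical end could be reported as set bits
    System.out.println(bits.nextSetBit(0)); // 3
  }
}
{noformat}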




[jira] [Commented] (LUCENE-4693) FixedBitset might return wrong results if words.length > actual words in the bitset

2013-01-18 Thread Adrien Grand (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557244#comment-13557244 ]

Adrien Grand commented on LUCENE-4693:
--

+1

 FixedBitset might return wrong results if words.length > actual words in the
 bitset
 ---

 Key: LUCENE-4693
 URL: https://issues.apache.org/jira/browse/LUCENE-4693
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0, 4.1
Reporter: Simon Willnauer
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4693.patch


 Currently we allow passing in the actual words as a long[] to the FixedBitSet,
 yet if this array is oversized with respect to the actual words it needs to
 hold the bits, the FixedBitSet can return wrong results, since we use
 words.length (bits.length) as the bound when we iterate over the bits, i.e. if
 we need to find the next set bit. We should use the actual bound rather than
 the size of the array.
 As a side note, I think it would be interesting to explore passing an offset
 to this too, to enable creating bitsets from slices.




[jira] [Updated] (LUCENE-4600) Explore facets aggregation during documents collection

2013-01-18 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4600:
---

Attachment: LUCENE-4600.patch

Patch introduces CountingFacetsCollector, very similar to Mike's version, only 
productized.

Made FacetsCollector abstract with a utility create() method which returns 
either CountingFacetsCollector or StandardFacetsCollector (previously, FC), 
given the parameters.

All tests were migrated to use FC.create and all pass (utilizing the new 
collector). Still, I wrote a dedicated test for the new Collector too.

Preliminary results that we have, show nice improvements w/ this Collector. 
Mike, can you paste them here?

There are some nocommits, which I will resolve before committing. But before 
that, I'd like to compare this Collector to ones that use different 
abstractions from the code, e.g. IntDecoder (vs hard-wiring to dgap+vint), 
CategoryListIterator etc.

Also, I want to compare this Collector to one that marks a bitset in collect() 
and does all the work in getFacetResults.

 Explore facets aggregation during documents collection
 --

 Key: LUCENE-4600
 URL: https://issues.apache.org/jira/browse/LUCENE-4600
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
 Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
 LUCENE-4600.patch, LUCENE-4600.patch


 Today the facet module simply gathers all hits (as a bitset, optionally with 
 a float[] to hold scores as well, if you will aggregate them) during 
 collection, and then at the end when you call getFacetsResults(), it makes a 
 2nd pass over all those hits doing the actual aggregation.
 We should investigate just aggregating as we collect instead, so we don't 
 have to tie up transient RAM (fairly small for the bit set but possibly big 
 for the float[]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4693) FixedBitset might return wrong results if words.length > actual words in the bitset

2013-01-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557257#comment-13557257
 ] 

Michael McCandless commented on LUCENE-4693:


+1, but could we change the assert wordLength <= bits.length; to a real if 
instead?

 FixedBitset might return wrong results if words.length > actual words in the 
 bitset
 ---

 Key: LUCENE-4693
 URL: https://issues.apache.org/jira/browse/LUCENE-4693
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0, 4.1
Reporter: Simon Willnauer
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4693.patch


 Currently we allow passing in the actual words as a long[] to the FixedBitSet, 
 yet if this array is oversized with respect to the actual words it needs to 
 hold the bits, the FixedBitSet can return wrong results, since we use 
 words.length (bits.length) as the bound when we iterate over the bits, i.e. 
 when we need to find the next set bit. We should use the actual bound rather 
 than the size of the array. 
 As a side note, I think it would be interesting to explore passing an offset 
 to this too, to enable creating bitsets from slices

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection

2013-01-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557263#comment-13557263
 ] 

Michael McCandless commented on LUCENE-4600:


Patch looks great: +1

And this is a healthy speedup, on the Wikipedia 1M / 25 ords per doc test:

{noformat}
Task        QPS base  StdDev    QPS comp  StdDev    Pct diff
PKLookup      239.18  (1.5%)      238.87  (1.1%)    -0.1% ( -2% -  2%)
 LowTerm       98.99  (3.1%)      135.95  (1.8%)    37.3% ( 31% - 43%)
HighTerm       20.95  (1.2%)       29.08  (2.4%)    38.8% ( 34% - 42%)
 MedTerm       34.55  (1.5%)       48.31  (2.0%)    39.8% ( 35% - 43%)
{noformat}

 Explore facets aggregation during documents collection
 --

 Key: LUCENE-4600
 URL: https://issues.apache.org/jira/browse/LUCENE-4600
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
 Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
 LUCENE-4600.patch, LUCENE-4600.patch


 Today the facet module simply gathers all hits (as a bitset, optionally with 
 a float[] to hold scores as well, if you will aggregate them) during 
 collection, and then at the end when you call getFacetsResults(), it makes a 
 2nd pass over all those hits doing the actual aggregation.
 We should investigate just aggregating as we collect instead, so we don't 
 have to tie up transient RAM (fairly small for the bit set but possibly big 
 for the float[]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4039) MergeIndex on multiple cores impossible with SolrJ

2013-01-18 Thread Olof Jonasson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557266#comment-13557266
 ] 

Olof Jonasson commented on SOLR-4039:
-

This happens in Solr 4, and we believe this is the problem:

In 
org.apache.solr.client.solrj.request.CoreAdminRequest.MergeIndexes.getParams()

if (srcCores != null) {
  for (String srcCore : srcCores) {
params.set(CoreAdminParams.SRC_CORE, srcCore);
  }
}


params.set overwrites the other cores, so only the last srcCore value survives
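
A minimal sketch of one possible fix, assuming the multi-valued ModifiableSolrParams overloads (set(String, String...) replaces all values at once, where the loop above replaces them one by one):

{noformat}
// Hypothetical fix: pass every source core in a single call instead of
// overwriting CoreAdminParams.SRC_CORE on each loop iteration.
if (srcCores != null) {
  params.set(CoreAdminParams.SRC_CORE, srcCores);
}
{noformat}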

 MergeIndex on multiple cores impossible with SolrJ
 --

 Key: SOLR-4039
 URL: https://issues.apache.org/jira/browse/SOLR-4039
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6.1
 Environment: Windows
Reporter: Mathieu Gond

 It is not possible to do a mergeIndexes action on multiple cores at the same 
 time with SolrJ.
 Only the last core set in the srcCores parameter is used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4693) FixedBitset might return wrong results if words.length > actual words in the bitset

2013-01-18 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557267#comment-13557267
 ] 

Simon Willnauer commented on LUCENE-4693:
-

bq. +1, but could we change the assert wordLength <= bits.length; to a real if 
instead?
yeah, I will throw an IAE instead! good point
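
Presumably something along these lines (a sketch of the guard only, not the committed change; bits2words is the existing helper that computes the word count a given numBits requires):

{noformat}
public FixedBitSet(long[] storedBits, int numBits) {
  this.numWords = bits2words(numBits);
  if (numWords > storedBits.length) {
    throw new IllegalArgumentException("The given long array is too small "
        + "to hold " + numBits + " bits");
  }
  this.numBits = numBits;
  this.bits = storedBits;
}
{noformat}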

 FixedBitset might return wrong results if words.length > actual words in the 
 bitset
 ---

 Key: LUCENE-4693
 URL: https://issues.apache.org/jira/browse/LUCENE-4693
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0, 4.1
Reporter: Simon Willnauer
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4693.patch


 Currently we allow passing in the actual words as a long[] to the FixedBitSet, 
 yet if this array is oversized with respect to the actual words it needs to 
 hold the bits, the FixedBitSet can return wrong results, since we use 
 words.length (bits.length) as the bound when we iterate over the bits, i.e. 
 when we need to find the next set bit. We should use the actual bound rather 
 than the size of the array. 
 As a side note, I think it would be interesting to explore passing an offset 
 to this too, to enable creating bitsets from slices

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4039) MergeIndex on multiple cores impossible with SolrJ

2013-01-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4039:
--

Fix Version/s: 5.0
   4.2

 MergeIndex on multiple cores impossible with SolrJ
 --

 Key: SOLR-4039
 URL: https://issues.apache.org/jira/browse/SOLR-4039
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6.1
 Environment: Windows
Reporter: Mathieu Gond
 Fix For: 4.2, 5.0


 It is not possible to do a mergeIndexes action on multiple cores at the same 
 time with SolrJ.
 Only the last core set in the srcCores parameter is used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4693) FixedBitset might return wrong results if words.length > actual words in the bitset

2013-01-18 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557275#comment-13557275
 ] 

Commit Tag Bot commented on LUCENE-4693:


[trunk commit] Simon Willnauer
http://svn.apache.org/viewvc?view=revisionrevision=1435191

LUCENE-4693: FixedBitset might return wrong results if words.length > actual 
words in the bitset


 FixedBitset might return wrong results if words.length > actual words in the 
 bitset
 ---

 Key: LUCENE-4693
 URL: https://issues.apache.org/jira/browse/LUCENE-4693
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0, 4.1
Reporter: Simon Willnauer
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4693.patch


 Currently we allow passing in the actual words as a long[] to the FixedBitSet, 
 yet if this array is oversized with respect to the actual words it needs to 
 hold the bits, the FixedBitSet can return wrong results, since we use 
 words.length (bits.length) as the bound when we iterate over the bits, i.e. 
 when we need to find the next set bit. We should use the actual bound rather 
 than the size of the array. 
 As a side note, I think it would be interesting to explore passing an offset 
 to this too, to enable creating bitsets from slices

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4693) FixedBitset might return wrong results if words.length > actual words in the bitset

2013-01-18 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557282#comment-13557282
 ] 

Commit Tag Bot commented on LUCENE-4693:


[branch_4x commit] Simon Willnauer
http://svn.apache.org/viewvc?view=revisionrevision=1435197

LUCENE-4693: FixedBitset might return wrong results if words.length > actual 
words in the bitset


 FixedBitset might return wrong results if words.length > actual words in the 
 bitset
 ---

 Key: LUCENE-4693
 URL: https://issues.apache.org/jira/browse/LUCENE-4693
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0, 4.1
Reporter: Simon Willnauer
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4693.patch


 Currently we allow passing in the actual words as a long[] to the FixedBitSet, 
 yet if this array is oversized with respect to the actual words it needs to 
 hold the bits, the FixedBitSet can return wrong results, since we use 
 words.length (bits.length) as the bound when we iterate over the bits, i.e. 
 when we need to find the next set bit. We should use the actual bound rather 
 than the size of the array. 
 As a side note, I think it would be interesting to explore passing an offset 
 to this too, to enable creating bitsets from slices

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-4039) MergeIndex on multiple cores impossible with SolrJ

2013-01-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-4039:
-

Assignee: Mark Miller

 MergeIndex on multiple cores impossible with SolrJ
 --

 Key: SOLR-4039
 URL: https://issues.apache.org/jira/browse/SOLR-4039
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6.1
 Environment: Windows
Reporter: Mathieu Gond
Assignee: Mark Miller
 Fix For: 4.2, 5.0


 It is not possible to do a mergeIndexes action on multiple cores at the same 
 time with SolrJ.
 Only the last core set in the srcCores parameter is used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4693) FixedBitset might return wrong results if words.length > actual words in the bitset

2013-01-18 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-4693.
-

Resolution: Fixed

 FixedBitset might return wrong results if words.length > actual words in the 
 bitset
 ---

 Key: LUCENE-4693
 URL: https://issues.apache.org/jira/browse/LUCENE-4693
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.0, 4.1
Reporter: Simon Willnauer
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4693.patch


 Currently we allow passing in the actual words as a long[] to the FixedBitSet, 
 yet if this array is oversized with respect to the actual words it needs to 
 hold the bits, the FixedBitSet can return wrong results, since we use 
 words.length (bits.length) as the bound when we iterate over the bits, i.e. 
 when we need to find the next set bit. We should use the actual bound rather 
 than the size of the array. 
 As a side note, I think it would be interesting to explore passing an offset 
 to this too, to enable creating bitsets from slices

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4312) SolrCloud upgrade path

2013-01-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557348#comment-13557348
 ] 

Yonik Seeley commented on SOLR-4312:


Looking into this a little more, I guess the node naming change was needed for 
SOLR-4088.

 SolrCloud upgrade path
 --

 Key: SOLR-4312
 URL: https://issues.apache.org/jira/browse/SOLR-4312
 Project: Solr
  Issue Type: Task
  Components: SolrCloud
Affects Versions: 4.0, 4.1
Reporter: Steve Rowe

 Upgrading from one SolrCloud version to another needs to be figured out and 
 documented.  
 Mark Miller wrote on the 4.1 VOTE email on dev@l.a.o:
 {quote}
 One issue that is probably still a problem is that you can't easily upgrade 
 from a 4.0 to a 4.1 SolrCloud setup in some cases - at least to my knowledge. I 
 don't know all the details, but at a minimum, we should probably add an entry 
 to CHANGES about what you should do. It may require blowing away your own 
 clusterstate.json and redoing your numShards settings, or starting over, 
 or… I don't really know. I don't think anyone has tested.
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4312) SolrCloud upgrade path

2013-01-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557361#comment-13557361
 ] 

Mark Miller commented on SOLR-4312:
---

I think that points out a change we should probably make - we should not 
re-guess the address on every startup - we should guess it once and keep using 
that unless someone then overrides it? Or sets a flag to force a re-guess? 

 SolrCloud upgrade path
 --

 Key: SOLR-4312
 URL: https://issues.apache.org/jira/browse/SOLR-4312
 Project: Solr
  Issue Type: Task
  Components: SolrCloud
Affects Versions: 4.0, 4.1
Reporter: Steve Rowe

 Upgrading from one SolrCloud version to another needs to be figured out and 
 documented.  
 Mark Miller wrote on the 4.1 VOTE email on dev@l.a.o:
 {quote}
 One issue that is probably still a problem is that you can't easily upgrade 
 from a 4.0 to a 4.1 SolrCloud setup in some cases - at least to my knowledge. I 
 don't know all the details, but at a minimum, we should probably add an entry 
 to CHANGES about what you should do. It may require blowing away your own 
 clusterstate.json and redoing your numShards settings, or starting over, 
 or… I don't really know. I don't think anyone has tested.
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #742: POMs out of sync

2013-01-18 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/742/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
shard1 should have just been set up to be inconsistent - but it's still 
consistent

Stack Trace:
java.lang.AssertionError: shard1 should have just been set up to be 
inconsistent - but it's still consistent
at 
__randomizedtesting.SeedInfo.seed([8D8CF1695801F063:C6A7F712F5E905F]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:214)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:794)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
  

[jira] [Closed] (SOLR-4307) Solr join scoring

2013-01-18 Thread David vandendriessche (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David vandendriessche closed SOLR-4307.
---

Resolution: Invalid

As stated here:
https://issues.apache.org/jira/browse/LUCENE-4043?focusedCommentId=13557151page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13557151

It's possible to write a custom query parser.

 Solr join scoring
 -

 Key: SOLR-4307
 URL: https://issues.apache.org/jira/browse/SOLR-4307
 Project: Solr
  Issue Type: Wish
  Components: query parsers
 Environment: I'm using Solr 4.0.0
Reporter: David vandendriessche
  Labels: java, solr

 Add queryTimeJoining to solr.
 Example:
 q={!join from=docId to=fileId}pageTxt:test-123
 No scoring on the result, just a list of documents that have a match.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4316) Admin UI - SolrCloud - extend core options to collections

2013-01-18 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-4316:
--

 Summary: Admin UI - SolrCloud - extend core options to collections
 Key: SOLR-4316
 URL: https://issues.apache.org/jira/browse/SOLR-4316
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.1
Reporter: Shawn Heisey
 Fix For: 4.2, 5.0


There are a number of sections available when you are looking at a core in the 
UI - Ping, Query, Schema, Config, Replication, Analysis, Schema Browser, 
Plugins / Stats, and Dataimport are the ones that I can see.

A list of collections should be available, with as many of those options as 
can apply to a collection. If options specific to collections/SolrCloud can be 
implemented, those should be there too.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4600) Explore facets aggregation during documents collection

2013-01-18 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4600:
---

Attachment: LUCENE-4600.patch

Handles some nocommits. Now there's no translation from OrdinalValue to FRNImpl 
in getFacetResults (the latter is used directly in the queue). I wonder if this 
buys us anything.

 Explore facets aggregation during documents collection
 --

 Key: LUCENE-4600
 URL: https://issues.apache.org/jira/browse/LUCENE-4600
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
 Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
 LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch


 Today the facet module simply gathers all hits (as a bitset, optionally with 
 a float[] to hold scores as well, if you will aggregate them) during 
 collection, and then at the end when you call getFacetsResults(), it makes a 
 2nd pass over all those hits doing the actual aggregation.
 We should investigate just aggregating as we collect instead, so we don't 
 have to tie up transient RAM (fairly small for the bit set but possibly big 
 for the float[]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4316) Admin UI - SolrCloud - extend core options to collections

2013-01-18 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557571#comment-13557571
 ] 

Shawn Heisey commented on SOLR-4316:


If you have SolrCloud enabled, IMHO the list should have collapsible sections 
for collections and cores, with collections open and cores collapsed.


 Admin UI - SolrCloud - extend core options to collections
 -

 Key: SOLR-4316
 URL: https://issues.apache.org/jira/browse/SOLR-4316
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.1
Reporter: Shawn Heisey
 Fix For: 4.2, 5.0


 There are a number of sections available when you are looking at a core in 
 the UI - Ping, Query, Schema, Config, Replication, Analysis, Schema Browser, 
 Plugins / Stats, and Dataimport are the ones that I can see.
 A list of collections should be available, with as many of those options as 
 can apply to a collection. If options specific to collections/SolrCloud can 
 be implemented, those should be there too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection

2013-01-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557581#comment-13557581
 ] 

Michael McCandless commented on LUCENE-4600:


It's faster!

{noformat}
Task        QPS base  StdDev    QPS comp  StdDev    Pct diff
PKLookup      239.75  (1.2%)      237.59  (1.0%)    -0.9% ( -3% -  1%)
HighTerm       21.21  (1.5%)       29.80  (2.6%)    40.5% ( 35% - 45%)
 MedTerm       34.90  (1.9%)       50.24  (1.9%)    44.0% ( 39% - 48%)
 LowTerm       99.85  (3.7%)      152.40  (1.1%)    52.6% ( 46% - 59%)
{noformat}

 Explore facets aggregation during documents collection
 --

 Key: LUCENE-4600
 URL: https://issues.apache.org/jira/browse/LUCENE-4600
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
 Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
 LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch


 Today the facet module simply gathers all hits (as a bitset, optionally with 
 a float[] to hold scores as well, if you will aggregate them) during 
 collection, and then at the end when you call getFacetsResults(), it makes a 
 2nd pass over all those hits doing the actual aggregation.
 We should investigate just aggregating as we collect instead, so we don't 
 have to tie up transient RAM (fairly small for the bit set but possibly big 
 for the float[]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-01-18 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557667#comment-13557667
 ] 

Alan Woodward commented on LUCENE-2878:
---

Since the last patch went up I've fixed a bunch of bugs (BrouwerianQuery works 
properly now, as do various nested IntervalQuery subtypes that were throwing 
NPEs), as well as adding Span-type scoring and fleshing out the explain() 
methods. The only Span functionality that's missing, I think, is payload 
queries. If we want to have *all* the span functionality in here before it can 
land on trunk, I can work on that next.

It would also be good to do some proper benchmarking. Do we already have 
something that can compare sets of queries?



 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries: the ones which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do, and at the end 
 of the day they duplicate a lot of code all over Lucene. Span*Queries are 
 also limited to other Span*Query instances, such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anything like that. 
 Besides the Span*Query limitation, other queries lack a quite interesting 
 feature, since they can not score based on term proximity: scores don't 
 expose any positional information. All those problems bugged me for a while 
 now, so I started working on that using the bulkpostings API. I would have 
 done that first cut on trunk, but TermScorer there works on a BlockReader 
 that does not expose positions, while the one in this branch does. I started 
 adding a new Positions class which users can pull from a scorer; to prevent 
 unnecessary positions enums I added ScorerContext#needsPositions and 
 eventually Scorer#needsPayloads to create the corresponding enum on demand. 
 Yet, currently only TermQuery / TermScorer implements this API and others 
 simply return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions, I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice side effect of this was that the Position 
 BulkReading implementation got some exercise, which now :) works with 
 positions, while Payloads for bulk reading are kind of experimental in the 
 patch and only work with the Standard codec. 
 So all spans now work on top of TermScorer (I truly hate spans since today), 
 including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go on with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk 
 first, but after that pain today I need a break first :). 
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails, but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: SolrTestCaseJ4: Can't avoid collection1 convention

2013-01-18 Thread P Williams
Hi folks,

I think that there is still an issue after the SOLR-3826 patch was applied
for 4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in September
2012.  This line is missing:

Index: solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
===
--- solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (revision 1435375)
+++ solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (working copy)
@@ -384,9 +384,9 @@
   public static void createCore() {
 assertNotNull(testSolrHome);
 solrConfig = TestHarness.createConfig(testSolrHome, coreName,
getSolrConfigFile());
-h = new TestHarness( dataDir.getAbsolutePath(),
+h = new TestHarness( coreName, new Initializer( coreName,
dataDir.getAbsolutePath(),
 solrConfig,
-getSchemaFile());
+getSchemaFile() ) );
 lrf = h.getRequestFactory
 ("standard",0,20,CommonParams.VERSION,"2.2");
   }


TestHarness(String dataDirectory, SolrConfig solrConfig, IndexSchema
indexSchema) sets coreName to null and opens the default core: collection1.
 I would expect that coreName is carried all the way through the test.
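
For context, the intent is for a test like this to work (a sketch, assuming the
four-argument initCore overload that SOLR-3826 added; the solr home path and
the core name "mycore" are made up):

{noformat}
public class MyCoreTest extends SolrTestCaseJ4 {
  @BeforeClass
  public static void beforeClass() throws Exception {
    // Should open a core named "mycore", not the default "collection1".
    initCore("solrconfig.xml", "schema.xml", "/path/to/solr-home", "mycore");
  }
}
{noformat}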

What's the best course of action for getting this fixed?  Should I re-open
SOLR-3826 or create a new issue?

Thanks,
Tricia

On Tue, Aug 14, 2012 at 12:32 PM, Smiley, David W. dsmi...@mitre.org wrote:

 I've got some code that extends Solr and I use the Solr test framework for
 my tests.  I upgraded from Solr 4 alpha to Solr 4 beta today, and it
 appears I am forced to put my test solr home directory in solr/collection1
 rather than just plain solr/  (relative to my test classpath).  I looked
 through the code and found that SolrTestCaseJ4.initCore() calls
 createCore() which calls TestHarness.createConfig(solrHome,confFile) which
 adds the collection1 to solr home.  This is a minor issue, but it annoys
 me and I see it as a needless change.  If it isn't fixed, we'll have to at
 least put that in the release notes and definitely the javadoc so that it
 is clear you *have* to use collection1.

 ~ David
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: SolrTestCaseJ4: Can't avoid collection1 convention

2013-01-18 Thread Mark Miller
I'd suggest creating a new issue and referencing the old issue in it.

- Mark

On Jan 18, 2013, at 5:48 PM, P Williams williams.tricia.l...@gmail.com wrote:

 Hi folks,
 
 I think that there is still an issue after the SOLR-3826 patch was applied 
 for 4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  
 This line is missing:
 
 Index: solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 ===
 --- solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (revision 1435375)
 +++ solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (working copy)
 @@ -384,9 +384,9 @@
public static void createCore() {
  assertNotNull(testSolrHome);
  solrConfig = TestHarness.createConfig(testSolrHome, coreName, 
 getSolrConfigFile());
 -h = new TestHarness( dataDir.getAbsolutePath(),
 +h = new TestHarness( coreName, new Initializer( coreName, 
 dataDir.getAbsolutePath(),
  solrConfig,
 -getSchemaFile());
 +getSchemaFile() ) );
  lrf = h.getRequestFactory
  ("standard",0,20,CommonParams.VERSION,"2.2");
}
 
 
 TestHarness(String dataDirectory, SolrConfig solrConfig, IndexSchema 
 indexSchema) sets coreName to null and opens the default core: collection1.  
 I would expect that coreName is carried all the way through the test.
 
 What's the best course of action for getting this fixed?  Should I re-open 
 SOLR-3826 or create a new issue?
 
 Thanks,
 Tricia
 
 On Tue, Aug 14, 2012 at 12:32 PM, Smiley, David W. dsmi...@mitre.org wrote:
 I've got some code that extends Solr and I use the Solr test framework for my 
 tests.  I upgraded from Solr 4 alpha to Solr 4 beta today, and it appears I 
 am forced to put my test solr home directory in solr/collection1 rather than 
 just plain solr/  (relative to my test classpath).  I looked through the code 
 and found that SolrTestCaseJ4.initCore() calls createCore() which calls 
 TestHarness.createConfig(solrHome,confFile) which adds the collection1 to 
 solr home.  This is a minor issue, but it annoys me and I see it as a 
 needless change.  If it isn't fixed, we'll have to at least put that in the 
 release notes and definitely the javadoc so that it is clear you *have* to 
 use collection1.
 
 ~ David
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4317) SolrTestCaseJ4: Can't avoid collection1 convention

2013-01-18 Thread Tricia Jenkins (JIRA)
Tricia Jenkins created SOLR-4317:


 Summary: SolrTestCaseJ4: Can't avoid collection1 convention
 Key: SOLR-4317
 URL: https://issues.apache.org/jira/browse/SOLR-4317
 Project: Solr
  Issue Type: Improvement
  Components: Tests
Affects Versions: 4.0
Reporter: Tricia Jenkins
Priority: Minor
 Fix For: 4.1


I think that there is still an issue after the SOLR-3826 patch was applied for 
4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  This 
line is missing:

Index: solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
===
--- solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
(revision 1435375)
+++ solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
(working copy)
@@ -384,9 +384,9 @@
   public static void createCore() {
 assertNotNull(testSolrHome);
 solrConfig = TestHarness.createConfig(testSolrHome, coreName, 
getSolrConfigFile());
-h = new TestHarness( dataDir.getAbsolutePath(),
+h = new TestHarness( coreName, new Initializer( coreName, 
dataDir.getAbsolutePath(),
 solrConfig,
-getSchemaFile());
+getSchemaFile() ) );
 lrf = h.getRequestFactory
 ("standard",0,20,CommonParams.VERSION,"2.2");
   }


TestHarness(String dataDirectory, SolrConfig solrConfig, IndexSchema 
indexSchema) sets coreName to null and opens the default core: collection1.  I 
would expect that coreName is carried all the way through the test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4317) SolrTestCaseJ4: Can't avoid collection1 convention

2013-01-18 Thread Tricia Jenkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tricia Jenkins updated SOLR-4317:
-

Attachment: SOLR-4317.patch

This is the patch from my description.

 SolrTestCaseJ4: Can't avoid collection1 convention
 

 Key: SOLR-4317
 URL: https://issues.apache.org/jira/browse/SOLR-4317
 Project: Solr
  Issue Type: Improvement
  Components: Tests
Affects Versions: 4.0
Reporter: Tricia Jenkins
Priority: Minor
 Fix For: 4.1

 Attachments: SOLR-4317.patch


 I think that there is still an issue after the SOLR-3826 patch was applied 
 for 4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  
 This line is missing:
 Index: solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 ===
 --- solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (revision 1435375)
 +++ solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (working copy)
 @@ -384,9 +384,9 @@
public static void createCore() {
  assertNotNull(testSolrHome);
  solrConfig = TestHarness.createConfig(testSolrHome, coreName, 
 getSolrConfigFile());
 -h = new TestHarness( dataDir.getAbsolutePath(),
 +h = new TestHarness( coreName, new Initializer( coreName, 
 dataDir.getAbsolutePath(),
  solrConfig,
 -getSchemaFile());
 +getSchemaFile() ) );
  lrf = h.getRequestFactory
  ("standard",0,20,CommonParams.VERSION,"2.2");
}
 TestHarness(String dataDirectory, SolrConfig solrConfig, IndexSchema 
 indexSchema) sets coreName to null and opens the default core: collection1.  
 I would expect that coreName is carried all the way through the test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: SolrTestCaseJ4: Can't avoid collection1 convention

2013-01-18 Thread P Williams
Done.  You can find it here: https://issues.apache.org/jira/browse/SOLR-4317

On Fri, Jan 18, 2013 at 4:01 PM, Mark Miller markrmil...@gmail.com wrote:

 I'd suggest creating a new issue and referencing the old issue in it.

 - Mark

 On Jan 18, 2013, at 5:48 PM, P Williams williams.tricia.l...@gmail.com
 wrote:

  Hi folks,
 
  I think that there is still an issue after the SOLR-3826 patch was
 applied for 4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in
 September 2012.  This line is missing:
 
  Index: solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
  ===
  --- solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
  (revision 1435375)
  +++ solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
  (working copy)
  @@ -384,9 +384,9 @@
 public static void createCore() {
   assertNotNull(testSolrHome);
   solrConfig = TestHarness.createConfig(testSolrHome, coreName,
 getSolrConfigFile());
  -h = new TestHarness( dataDir.getAbsolutePath(),
  +h = new TestHarness( coreName, new Initializer( coreName,
 dataDir.getAbsolutePath(),
   solrConfig,
  -getSchemaFile());
  +getSchemaFile() ) );
   lrf = h.getRequestFactory
   ("standard",0,20,CommonParams.VERSION,"2.2");
 }
 
 
  TestHarness(String dataDirectory, SolrConfig solrConfig, IndexSchema
 indexSchema) sets coreName to null and opens the default core: collection1.
  I would expect that coreName is carried all the way through the test.
 
  What's the best course of action for getting this fixed?  Should I
 re-open SOLR-3826 or create a new issue?
 
  Thanks,
  Tricia
 
  On Tue, Aug 14, 2012 at 12:32 PM, Smiley, David W. dsmi...@mitre.org
 wrote:
  I've got some code that extends Solr and I use the Solr test framework
 for my tests.  I upgraded from Solr 4 alpha to Solr 4 beta today, and it
 appears I am forced to put my test solr home directory in solr/collection1
 rather than just plain solr/  (relative to my test classpath).  I looked
 through the code and found that SolrTestCaseJ4.initCore() calls
 createCore() which calls TestHarness.createConfig(solrHome,confFile) which
 adds the collection1 to solr home.  This is a minor issue, but it annoys
 me and I see it as a needless change.  If it isn't fixed, we'll have to at
 least put that in the release notes and definitely the javadoc so that it
 is clear you *have* to use collection1.
 
  ~ David
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Updated] (SOLR-4317) SolrTestCaseJ4: Can't avoid collection1 convention

2013-01-18 Thread Tricia Jenkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tricia Jenkins updated SOLR-4317:
-

Fix Version/s: (was: 4.1)
   4.2

 SolrTestCaseJ4: Can't avoid collection1 convention
 

 Key: SOLR-4317
 URL: https://issues.apache.org/jira/browse/SOLR-4317
 Project: Solr
  Issue Type: Improvement
  Components: Tests
Affects Versions: 4.0
Reporter: Tricia Jenkins
Priority: Minor
 Fix For: 4.2

 Attachments: SOLR-4317.patch


 I think that there is still an issue after the SOLR-3826 patch was applied 
 for 4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  
 This line is missing:
 Index: solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 ===
 --- solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (revision 1435375)
 +++ solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (working copy)
 @@ -384,9 +384,9 @@
public static void createCore() {
  assertNotNull(testSolrHome);
  solrConfig = TestHarness.createConfig(testSolrHome, coreName, 
 getSolrConfigFile());
 -h = new TestHarness( dataDir.getAbsolutePath(),
 +h = new TestHarness( coreName, new Initializer( coreName, 
 dataDir.getAbsolutePath(),
  solrConfig,
 -getSchemaFile());
 +getSchemaFile() ) );
  lrf = h.getRequestFactory
  ("standard",0,20,CommonParams.VERSION,"2.2");
}
 TestHarness(String dataDirectory, SolrConfig solrConfig, IndexSchema 
 indexSchema) sets coreName to null and opens the default core: collection1.  
 I would expect that coreName is carried all the way through the test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4317) SolrTestCaseJ4: Can't avoid collection1 convention

2013-01-18 Thread Tricia Jenkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tricia Jenkins updated SOLR-4317:
-

Fix Version/s: 5.0

 SolrTestCaseJ4: Can't avoid collection1 convention
 

 Key: SOLR-4317
 URL: https://issues.apache.org/jira/browse/SOLR-4317
 Project: Solr
  Issue Type: Improvement
  Components: Tests
Affects Versions: 4.0
Reporter: Tricia Jenkins
Priority: Minor
 Fix For: 4.2, 5.0

 Attachments: SOLR-4317.patch


 I think that there is still an issue after the SOLR-3826 patch was applied 
 for 4.0 [https://issues.apache.org/jira/browse/SOLR-3826] in September 2012.  
 This line is missing:
 Index: solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 ===
 --- solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (revision 1435375)
 +++ solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
 (working copy)
 @@ -384,9 +384,9 @@
public static void createCore() {
  assertNotNull(testSolrHome);
  solrConfig = TestHarness.createConfig(testSolrHome, coreName, 
 getSolrConfigFile());
 -h = new TestHarness( dataDir.getAbsolutePath(),
 +h = new TestHarness( coreName, new Initializer( coreName, 
 dataDir.getAbsolutePath(),
  solrConfig,
 -getSchemaFile());
 +getSchemaFile() ) );
  lrf = h.getRequestFactory
  ("standard",0,20,CommonParams.VERSION,"2.2");
}
 TestHarness(String dataDirectory, SolrConfig solrConfig, IndexSchema 
 indexSchema) sets coreName to null and opens the default core: collection1.  
 I would expect that coreName is carried all the way through the test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4599) Compressed term vectors

2013-01-18 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4599:
-

Attachment: LUCENE-4599.patch

New patch with tests, addProx and specialized merging. I think it is ready. 
This patch is similar to the previous ones except that it uses LZ4 compression 
on top of prefix compression (similarly to Lucene40TermVectorsFormat which 
writes the common prefix length with the previous term as a VInt before each 
term) instead of the raw term bytes to improve the compression ratio and relies 
on LUCENE-4643 for most integer encoding instead of raw packed ints. Otherwise:
 - vectors are still compressed into blocks of 16 KB,
 - looking up term vectors requires at most 1 disk seek.

Here are the size reductions of the term vector files depending on the size of 
the input docs:

|| Field options / Document size || 1 KB (a few tens of docs per chunk) || 750 KB (one doc per chunk) ||
| none | 37% | 32% |
| positions | 32% | 10% |
| offsets | 41% | 31% |
| positions+offsets | 40% | 35% |

Regarding speed, indexing seems to be slightly slower, but maybe the reduction 
in the size of the vector files would make merging faster when not everything 
fits in the I/O cache. I also ran a simple benchmark that loads term vectors 
for every doc of the index and iterates over all terms and positions. This new 
format was ~5x slower for small docs (likely because it has to decode the whole 
chunk even to read a single doc) and between 1.5x and 2x faster for large docs 
that are alone in their chunk (again, results would very likely be better on a 
large index which wouldn't fully fit in the O/S cache).

If someone with very large term vector files wanted to test this new format, 
this would be great! I'll try on my side to perform more indexing/highlighting 
benchmarks.
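
For anyone who wants to try it before it becomes a default, the wiring would look roughly like this (a sketch only: the format's class name and constructor shape are my assumptions, mirroring how the compressing stored fields format is plugged in; set the codec on your IndexWriterConfig to use it):

{noformat}
// Hedged sketch: a codec that swaps in compressing term vectors with
// 16 KB chunks and delegates everything else to the current default.
public class CompressingTVCodec extends FilterCodec {
  public CompressingTVCodec() {
    super("CompressingTVCodec", new Lucene41Codec());
  }
  @Override
  public TermVectorsFormat termVectorsFormat() {
    return new CompressingTermVectorsFormat(
        "MyTermVectors", "", CompressionMode.FAST, 1 << 14);
  }
}
{noformat}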

 Compressed term vectors
 ---

 Key: LUCENE-4599
 URL: https://issues.apache.org/jira/browse/LUCENE-4599
 Project: Lucene - Core
  Issue Type: Task
  Components: core/codecs, core/termvectors
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.2

 Attachments: LUCENE-4599.patch, LUCENE-4599.patch, LUCENE-4599.patch


 We should have codec-compressed term vectors similarly to what we have with 
 stored fields.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4599) Compressed term vectors

2013-01-18 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557849#comment-13557849
 ] 

Shawn Heisey commented on LUCENE-4599:
--

bq. If someone with very large term vector files wanted to test this new 
format, this would be great! I'll try on my side to perform more 
indexing/highlighting benchmarks.

My indexes are pretty big, with termvectors taking up a lot of that. The 3.5.0 
version of each of my shards is about 21GB. The same index in 4.1 with 
compressed stored fields is a little less than 17 GB. I will give this patch a 
try on branch_4x. The full import will take 7-8 hours.

 Compressed term vectors
 ---

 Key: LUCENE-4599
 URL: https://issues.apache.org/jira/browse/LUCENE-4599
 Project: Lucene - Core
  Issue Type: Task
  Components: core/codecs, core/termvectors
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.2

 Attachments: LUCENE-4599.patch, LUCENE-4599.patch, LUCENE-4599.patch


 We should have codec-compressed term vectors similarly to what we have with 
 stored fields.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4599) Compressed term vectors

2013-01-18 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557849#comment-13557849
 ] 

Shawn Heisey edited comment on LUCENE-4599 at 1/19/13 1:41 AM:
---

bq. If someone with very large term vector files wanted to test this new 
format, this would be great! I'll try on my side to perform more 
indexing/highlighting
benchmarks..

My indexes are pretty big, with termvectors taking up a lot of that. The 3.5.0 
version of each of my shards is about 21GB. The same index in 4.1 with 
compressed stored fields is a little less than 17 GB. I will give this patch a 
try on branch_4x. The full import will take 7-8 hours.

  was (Author: elyograg):
bq. If someone with very large term vector files wanted to test this new 
format, this
would be great! I'll try on my side to perform more indexing/highlighting
benchmarks..

My indexes are pretty big, with termvectors taking up a lot of that. The 3.5.0 
version of each of my shards is about 21GB. The same index in 4.1 with 
compressed stored fields is a little less than 17 GB. I will give this patch a 
try on branch_4x. The full import will take 7-8 hours.
  
 Compressed term vectors
 ---

 Key: LUCENE-4599
 URL: https://issues.apache.org/jira/browse/LUCENE-4599
 Project: Lucene - Core
  Issue Type: Task
  Components: core/codecs, core/termvectors
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.2

 Attachments: LUCENE-4599.patch, LUCENE-4599.patch, LUCENE-4599.patch


 We should have codec-compressed term vectors similarly to what we have with 
 stored fields.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4599) Compressed term vectors

2013-01-18 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557849#comment-13557849
 ] 

Shawn Heisey edited comment on LUCENE-4599 at 1/19/13 1:42 AM:
---

bq. If someone with very large term vector files wanted to test this new 
format, this would be great! I'll try on my side to perform more 
indexing/highlighting benchmarks..

My indexes are pretty big, with termvectors taking up a lot of that. The 3.5.0 
version of each of my shards is about 21GB. The same index in 4.1 with 
compressed stored fields is a little lres than 17 GB. I will give this patch a 
try on branch_4x. The full import will take 7-8 hours.

  was (Author: elyograg):
bq. If someone with very large term vector files wanted to test this new 
format, this would be great! I'll try on my side to perform more 
indexing/highlighting
benchmarks..

My indexes are pretty big, with termvectors taking up a lot of that. The 3.5.0 
version of each of my shards is about 21GB. The same index in 4.1 with 
compressed stored fields is a little lres than 17 GB. I will give this patch a 
try on branch_4x. The full import will take 7-8 hours.
  
 Compressed term vectors
 ---

 Key: LUCENE-4599
 URL: https://issues.apache.org/jira/browse/LUCENE-4599
 Project: Lucene - Core
  Issue Type: Task
  Components: core/codecs, core/termvectors
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.2

 Attachments: LUCENE-4599.patch, LUCENE-4599.patch, LUCENE-4599.patch


 We should have codec-compressed term vectors similarly to what we have with 
 stored fields.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4599) Compressed term vectors

2013-01-18 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557889#comment-13557889
 ] 

Shawn Heisey commented on LUCENE-4599:
--------------------------------------

I should ask - will this be on by default in Solr with the patch? I just 
applied the patch to 4.1, since I already had that checkout, and decided to 
try it before branch_4x. It has occurred to me that, as a LUCENE issue, it 
might not be turned on for Solr.
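
For what it's worth, Solr without an explicit codecFactory in solrconfig.xml 
picks up whatever Lucene's default codec provides, so a quick check like this 
sketch (assuming the 4.x Codec API) shows which term vectors format a given 
build would use:

{code:java}
import org.apache.lucene.codecs.Codec;

// Prints the default codec and its term vectors format; if the patched
// compressing format shows up here, Solr would get it automatically,
// unless solrconfig.xml overrides the codec.
public class DefaultCodecCheck {
  public static void main(String[] args) {
    Codec codec = Codec.getDefault();
    System.out.println("Default codec: " + codec.getName());
    System.out.println("Term vectors format: "
        + codec.termVectorsFormat().getClass().getSimpleName());
  }
}
{code}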


 Compressed term vectors
 -----------------------

         Key: LUCENE-4599
         URL: https://issues.apache.org/jira/browse/LUCENE-4599
     Project: Lucene - Core
  Issue Type: Task
  Components: core/codecs, core/termvectors
    Reporter: Adrien Grand
    Assignee: Adrien Grand
    Priority: Minor
     Fix For: 4.2

 Attachments: LUCENE-4599.patch, LUCENE-4599.patch, LUCENE-4599.patch


 We should have codec-compressed term vectors similarly to what we have with 
 stored fields.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org