[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2014-06-04 Thread vivek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017403#comment-14017403
 ] 

vivek commented on LUCENE-2899:
---

I followed this link to integrate OpenNLP: https://wiki.apache.org/solr/OpenNLP

Installation

For English language testing: until LUCENE-2899 is committed:

1. pull the latest trunk or 4.0 branch
2. apply the latest LUCENE-2899 patch
3. do 'ant compile'
cd solr/contrib/opennlp/src/test-files/training
...
I followed the first two steps, but got the following error while executing the third step:

common.compile-core:
[javac] Compiling 10 source files to 
/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java

[javac] warning: [path] bad path element 
/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar:
 no such file or directory

[javac] 
/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43:
 error: cannot find symbol

[javac] super(Version.LUCENE_44, input);

[javac]  ^
[javac]   symbol:   variable LUCENE_44
[javac]   location: class Version
[javac] 
/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56:
 error: no suitable constructor found for Tokenizer(Reader)
[javac] super(input);
[javac] ^
[javac] constructor Tokenizer.Tokenizer(AttributeFactory) is not 
applicable
[javac]   (actual argument Reader cannot be converted to 
AttributeFactory by method invocation conversion)
[javac] constructor Tokenizer.Tokenizer() is not applicable
[javac]   (actual and formal argument lists differ in length)
[javac] 2 errors
[javac] 1 warning

I'm really stuck on how to get past this step. I spent my entire day trying to fix this 
but couldn't make any progress. Could someone please help me?
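
For context, the two compile errors indicate that the patch was written against the Lucene 4.4 analysis API, while trunk has since removed Version.LUCENE_44 and the Tokenizer(Reader) constructor (the reader is now supplied separately via setReader(), as the reported Tokenizer() / Tokenizer(AttributeFactory) constructors suggest). The following is a minimal, purely illustrative sketch of what a tokenizer compiled against trunk looks like -- it is not the LUCENE-2899 patch code:

{code}
import java.io.IOException;
import org.apache.lucene.analysis.Tokenizer;

// Hypothetical example: on trunk a Tokenizer no longer receives a Reader in its
// constructor; callers hand it the reader later via setReader(Reader).
final class ExampleTokenizer extends Tokenizer {
  ExampleTokenizer() {
    super();               // was: super(input) with a Reader in 4.4-era code
  }

  @Override
  public boolean incrementToken() throws IOException {
    return false;          // placeholder; a real tokenizer emits tokens here
  }
}
{code}

The patch (and its use of Version.LUCENE_44) would likely need to be updated along these lines, or applied to the 4.x branch it was written for.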


 Add OpenNLP Analysis capabilities as a module
 -

 Key: LUCENE-2899
 URL: https://issues.apache.org/jira/browse/LUCENE-2899
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: LUCENE-2899-RJN.patch, LUCENE-2899.patch, 
 OpenNLPFilter.java, OpenNLPTokenizer.java


 Now that OpenNLP is an ASF project and has a nice license, it would be nice 
 to have a submodule (under analysis) that exposed capabilities for it. Drew 
 Farris, Tom Morton and I have code that does:
 * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
 would have to change slightly to buffer tokens)
 * NamedEntity recognition as a TokenFilter
 We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
 either payloads (PartOfSpeechAttribute?) on a token or at the same position.
 I'd propose it go under:
 modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5731:


Attachment: LUCENE-5731.patch

just some bugfixes to the mmap stuff. I need to add dedicated tests for those 
tomorrow.

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017408#comment-14017408
 ] 

ASF subversion and git services commented on SOLR-6119:
---

Commit 1599942 from [~dawidweiss] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1599942 ]

SOLR-6119: backporting some replication handler fixes from trunk.

 TestReplicationHandler attempts to remove open folders
 --

 Key: SOLR-6119
 URL: https://issues.apache.org/jira/browse/SOLR-6119
 Project: Solr
  Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor
 Attachments: SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch, 
 SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch


 TestReplicationHandler has a weird logic around the 'snapDir' variable. It 
 attempts to remove snapshot folders, even though they're not closed yet. My 
 recent patch uncovered the bug but I don't know how to fix it cleanly -- the 
 test itself seems to be very fragile (for example I don't understand the 
 'namedBackup' variable which is always set to true, yet there are 
 conditionals around it).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017412#comment-14017412
 ] 

ASF subversion and git services commented on SOLR-6119:
---

Commit 1599943 from [~dawidweiss] in branch 'dev/trunk'
[ https://svn.apache.org/r1599943 ]

SOLR-6119: refactored doTestBackup into a separate class.

 TestReplicationHandler attempts to remove open folders
 --

 Key: SOLR-6119
 URL: https://issues.apache.org/jira/browse/SOLR-6119
 Project: Solr
  Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor
 Attachments: SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch, 
 SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch


 TestReplicationHandler has a weird logic around the 'snapDir' variable. It 
 attempts to remove snapshot folders, even though they're not closed yet. My 
 recent patch uncovered the bug but I don't know how to fix it cleanly -- the 
 test itself seems to be very fragile (for example I don't understand the 
 'namedBackup' variable which is always set to true, yet there are 
 conditionals around it).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017415#comment-14017415
 ] 

ASF subversion and git services commented on SOLR-6119:
---

Commit 1599944 from [~dawidweiss] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1599944 ]

SOLR-6119: backport of test split from trunk.

 TestReplicationHandler attempts to remove open folders
 --

 Key: SOLR-6119
 URL: https://issues.apache.org/jira/browse/SOLR-6119
 Project: Solr
  Issue Type: Bug
Reporter: Dawid Weiss
Priority: Minor
 Attachments: SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch, 
 SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch


 TestReplicationHandler has a weird logic around the 'snapDir' variable. It 
 attempts to remove snapshot folders, even though they're not closed yet. My 
 recent patch uncovered the bug but I don't know how to fix it cleanly -- the 
 test itself seems to be very fragile (for example I don't understand the 
 'namedBackup' variable which is always set to true, yet there are 
 conditionals around it).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers

2014-06-04 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5703:
-

Attachment: LUCENE-5703.patch

Here is an updated patch. Sorted(Set)TermsEnum copies the supplied BytesRef 
when a match is found instead of looking up the ord.
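
As a rough illustration of that copy-on-match behavior (this is not the patch code; the helper class here is hypothetical), the supplied BytesRef gets deep-copied only when the term is actually found:

{code}
import java.io.IOException;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical sketch: remember the matched term by copying the caller's bytes,
// instead of resolving and storing its ord.
final class MatchRecorder {
  BytesRef lastMatched;

  void onCandidate(TermsEnum termsEnum, BytesRef candidate) throws IOException {
    if (termsEnum.seekExact(candidate)) {
      // deep copy: the candidate's backing byte[] may be reused by the caller
      lastMatched = BytesRef.deepCopyOf(candidate);
    }
  }
}
{code}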

 Don't allocate/copy bytes all the time in binary DV producers
 -

 Key: LUCENE-5703
 URL: https://issues.apache.org/jira/browse/LUCENE-5703
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch


 Our binary doc values producers keep on creating new {{byte[]}} arrays and 
 copying bytes when a value is requested, which likely doesn't help 
 performance. This has been done because of the way fieldcache consumers used 
 the API, but we should try to fix it in 5.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6127) Improve Solr's exampledocs data

2014-06-04 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-6127:


Attachment: freebase_film_dump.py

bq. In xml, genre is single valued and percentage-sign separated. I think this should be a multivalued field?
Fixed. Thanks!

bq. The generated film.xml does not have a license header. I thought it would have one, no?
Added the license header.

bq. The type field has a value of /film/film for all docs. Is this expected?
Yes, all docs will have type = /film/film, as that's the Freebase category type from which we are fetching the data.

 Improve Solr's exampledocs data
 ---

 Key: SOLR-6127
 URL: https://issues.apache.org/jira/browse/SOLR-6127
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Varun Thacker
Priority: Minor
 Fix For: 5.0

 Attachments: film.csv, film.json, film.xml, freebase_film_dump.py, 
 freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py


 Currently 
 - The CSV example has 10 documents.
 - The JSON example has 4 documents.
 - The XML example has 32 documents.
 1. We should have an equal number of documents, and the same documents, in all the example formats.
 2. A data set which is slightly more comprehensive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6127) Improve Solr's exampledocs data

2014-06-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017434#comment-14017434
 ] 

Uwe Schindler commented on SOLR-6127:
-

bq. Added the license header

I think this should be a CC-BY license header, not ASF.

 Improve Solr's exampledocs data
 ---

 Key: SOLR-6127
 URL: https://issues.apache.org/jira/browse/SOLR-6127
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Varun Thacker
Priority: Minor
 Fix For: 5.0

 Attachments: film.csv, film.json, film.xml, freebase_film_dump.py, 
 freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py


 Currently 
 - The CSV example has 10 documents.
 - The JSON example has 4 documents.
 - The XML example has 32 documents.
 1. We should have an equal number of documents, and the same documents, in all the example formats.
 2. A data set which is slightly more comprehensive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6133) More robust collection-delete

2014-06-04 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017439#comment-14017439
 ] 

Per Steffensen commented on SOLR-6133:
--

In general zk=truth sounds like a great idea :-) But shouldn't zk=truth be 
implicit when either zkRun or zkHost is set?

I am not sure about the terminology, but I believe "unloaded" does not include 
deleting the data from disk? My main concern with the scenario I show is that 
data is not being deleted from disk. We would really like some way to make 
(fairly) sure that data is deleted when we fire a collection-delete request 
(and its info disappears from ZK). We have enormous amounts of data and will run 
out of disk space if we do not have our data folders deleted.

I am also a little bit concerned about the "on startup" part of "will be 
unloaded on startup". In the scenario I show above, the shards that were 
deleted from zk but not from disk will pop up in zk again on restart of the Solrs 
(because the folders still contain core.properties, I believe), and then we get 
a second chance to delete them, because we can re-detect that an unwanted 
collection (partly) exists. So if zk=truth means that data will not be deleted, 
and that the shards will not re-appear in zk after a restart of the Solrs, it is 
actually a step back wrt my main concern. But back to my concern with "on 
startup": we actually very rarely restart Solrs (because they run fairly stably 
- that is a good thing), so I am concerned about a solution that only cleans up 
or recovers on restart.

I am keen on improving collection-delete to do whatever it can to be all-or-nothing. 
Will you consider adding, on the Solr server side, the "check that all nodes 
are live" and "check that all shards/replicas are active" steps from 
CollDelete.java before deleting? This would be a step in the all-or-nothing 
direction, which will be even more important for non-SolrJ clients, which really 
cannot do the trick themselves on the client side (unless they do the zk-data 
juggling on the client side in another way).

 More robust collection-delete
 -

 Key: SOLR-6133
 URL: https://issues.apache.org/jira/browse/SOLR-6133
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7.2, 4.8.1
Reporter: Per Steffensen
 Attachments: CollDelete.java, coll_delete_problem.zip


 If Solrs are not stable (completely up and running etc) a collection-delete 
 request might result in partly deleted collections. You might say that it is 
 fair that you are not able to have a collection deleted if all of its shards 
 are not actively running - even though I would like a mechanism that just 
 deleted them when/if they ever come up again. But even though all shards 
 claim to be actively running you can still end up with partly deleted 
 collections - that is not acceptable IMHO. At least clusterstate should 
 always reflect the state, so that you are able to detect that your 
 collection-delete request was only partly carried out - which parts were 
 successfully deleted and which were not (including information about 
 data-folder-deletion)
 The text above sounds like an epic-sized task, with potentially numerous 
 problems to fix, so in order not to leave this ticket open forever I will 
 point out a particular scenario where I see problems. When this problem is 
 corrected we can close this ticket. Other tickets will have to deal with 
 other collection-delete issues.
 Here is what I did and saw
 * Logged into one of my Linux machines with IP 192.168.78.239
 * Prepared for Solr install
 {code}
 mkdir -p /xXX/solr
 cd /xXX/solr
 {code}
 * downloaded solr-4.7.2.tgz
 * Installed Solr 4.7.2 and prepared for three nodes
 {code}
 tar zxvf solr-4.7.2.tgz
 cd solr-4.7.2/
 cp -r example node1
 cp -r example node2
 cp -r example node3
 {code}
 * Initialized Solr config into Solr
 {code}
 cd node1
 java -DzkRun -Dhost=192.168.78.239 
 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf 
 -jar start.jar
 CTRL-C to stop solr (node1) again after it started completely
 {code}
 * Started all three Solr nodes
 {code}
 nohup java -Djetty.port=8983 -Dhost=192.168.78.239 -DzkRun -jar start.jar > node1_stdouterr.log &
 cd ../node2
 nohup java -Djetty.port=8984 -Dhost=192.168.78.239 -DzkHost=localhost:9983 -jar start.jar > node2_stdouterr.log &
 cd ../node3
 nohup java -Djetty.port=8985 -Dhost=192.168.78.239 -DzkHost=localhost:9983 -jar start.jar > node3_stdouterr.log &
 {code}
 * Created a collection mycoll
 {code}
 curl 'http://192.168.78.239:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=6&replicationFactor=1&maxShardsPerNode=2&collection.configName=myconf'
 {code}
 * Collected Cloud Graph image, clusterstate.json and info about data 
 folders (see attached coll_delete_problem.zip | 
 after_create_all_solrs_still_running). 

[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017446#comment-14017446
 ] 

Adrien Grand commented on LUCENE-5731:
--

+1 I like the new directory API and how direct packed ints use it. One minor 
note: the javadoc of Lucene49Codec refers to the lucene46 package instead of 
lucene49.

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6127) Improve Solr's exampledocs data

2014-06-04 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-6127:


Attachment: freebase_film_dump.py

The XML output adds the Creative Commons Attribution 2.5 header instead of the 
ASF license.

 Improve Solr's exampledocs data
 ---

 Key: SOLR-6127
 URL: https://issues.apache.org/jira/browse/SOLR-6127
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Varun Thacker
Priority: Minor
 Fix For: 5.0

 Attachments: film.csv, film.json, film.xml, freebase_film_dump.py, 
 freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, 
 freebase_film_dump.py


 Currently 
 - The CSV example has 10 documents.
 - The JSON example has 4 documents.
 - The XML example has 32 documents.
 1. We should have an equal number of documents, and the same documents, in all the example formats.
 2. A data set which is slightly more comprehensive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net

2014-06-04 Thread Rory O'Donnell Oracle, Dublin Ireland

Hi Uwe, Dawid,

Early Access builds for JDK 9 b15 https://jdk9.java.net/download/, JDK 
8u20 b16 https://jdk8.java.net/download.html  are available on java.net.


As we enter the later phases of development for JDK 8u20 , please log 
any show

stoppers as soon as possible.

JDK 7u60 is available for download [0] .

Rgds, Rory

[0] http://www.oracle.com/technetwork/java/javase/downloads/index.html

--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland



RE: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net

2014-06-04 Thread Uwe Schindler
Hi Rory,

 

thank you for the info! I installed 7u60 already (yesterday evening). I am 
happy that the MacOSX problem with Socket#accept was solved. I hope this fix 
also gets into the JDK 8 builds: https://bugs.openjdk.java.net/browse/JDK-8024045

This one prevents applications like Lucene, which use many file descriptors, from 
working correctly in web containers like Jetty or Tomcat on MacOSX server – 
causing SIGSEGV.

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Rory O'Donnell Oracle, Dublin Ireland [mailto:rory.odonn...@oracle.com] 
Sent: Wednesday, June 04, 2014 9:36 AM
To: Uwe Schindler; Dawid Weiss
Cc: rory.odonn...@oracle.com; dev@lucene.apache.org; Dalibor Topic; Balchandra 
Vaidya
Subject: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on 
java.net

 

Hi Uwe,Dawid,

Early Access builds for JDK 9 b15 https://jdk9.java.net/download/ ,  JDK 8u20 
b16 https://jdk8.java.net/download.html   are available on java.net.

As we enter the later phases of development for JDK 8u20 , please log any show
stoppers as soon as possible. 

JDK 7u60 is available for download [0] . 

Rgds, Rory

[0] http://www.oracle.com/technetwork/java/javase/downloads/index.html



-- 
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland


Re: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net

2014-06-04 Thread Rory O'Donnell Oracle, Dublin Ireland

Hi Uwe,

Let me look into this.

Rgds,Rory
On 04/06/2014 08:42, Uwe Schindler wrote:


Hi Rory,

thank you for the info! I installed 7u60 already (yesterday evening). 
I am happy that the MacOSX problem with Socket#accept was solved. I 
hope this fix gets also in the JDK 8 builds: 
https://bugs.openjdk.java.net/browse/JDK-8024045


This one prevents applications like Lucene, that use many file 
descriptors, to work correctly in web containers like Jetty or Tomcat 
on MacOSX server – causing SIGSEGV.


Uwe

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de http://www.thetaphi.de/

eMail: u...@thetaphi.de

*From:*Rory O'Donnell Oracle, Dublin Ireland 
[mailto:rory.odonn...@oracle.com]

*Sent:* Wednesday, June 04, 2014 9:36 AM
*To:* Uwe Schindler; Dawid Weiss
*Cc:* rory.odonn...@oracle.com; dev@lucene.apache.org; Dalibor Topic; 
Balchandra Vaidya
*Subject:* Early Access builds for JDK 9 b15, JDK 8u20 b16 are 
available on java.net


Hi Uwe,Dawid,

Early Access builds for JDK 9 b15 https://jdk9.java.net/download/, 
JDK 8u20 b16 https://jdk8.java.net/download.html are available on 
java.net.


As we enter the later phases of development for JDK 8u20 , please log 
any show

stoppers as soon as possible.

JDK 7u60 is available for download [0] .

Rgds, Rory

[0] http://www.oracle.com/technetwork/java/javase/downloads/index.html

--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland


--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland



[jira] [Created] (LUCENE-5733) Minor PackedInts API cleanups

2014-06-04 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-5733:


 Summary: Minor PackedInts API cleanups
 Key: LUCENE-5733
 URL: https://issues.apache.org/jira/browse/LUCENE-5733
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Trivial
 Fix For: 4.9, 5.0


The PackedInts API has quite some history now and some of its methods are not 
used anymore, eg. PackedInts.Reader.hasArray. I'd like to remove them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5733) Minor PackedInts API cleanups

2014-06-04 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5733:
-

Attachment: LUCENE-5733.patch

Here is a patch that:
 - removes Reader.hasArray and Reader.getArray
 - moves getBitsPerValue from Reader (unused there) to Mutable

 Minor PackedInts API cleanups
 -

 Key: LUCENE-5733
 URL: https://issues.apache.org/jira/browse/LUCENE-5733
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Trivial
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5733.patch


 The PackedInts API has quite some history now and some of its methods are not 
 used anymore, eg. PackedInts.Reader.hasArray. I'd like to remove them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6131) Remove deprecated Token class from solr.spelling package

2014-06-04 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-6131:
---

Attachment: NukeToken.patch

I mean deleting Token.java from the source tree, as in this patch, and fixing the 
remaining compile errors.

 Remove deprecated Token class from solr.spelling package
 

 Key: SOLR-6131
 URL: https://issues.apache.org/jira/browse/SOLR-6131
 Project: Solr
  Issue Type: Improvement
  Components: spellchecker
Affects Versions: 4.8.1
Reporter: Spyros Kapnissis
Priority: Minor
  Labels: spellchecker
 Attachments: NukeToken.patch, SOLR-6131.patch


 The deprecated Token class is used everywhere in the spelling package. I am 
 attaching a patch that refactors/replaces all occurrences with the 
 AttributeSource class. The tests are passing.
 Note: the AttributeSource class also replaces Token as a hash key in many 
 places. Having stricter equals/hashCode requirements than Token, I am a bit 
 concerned that it could produce some duplicate suggestions, especially in the 
 case of ConjunctionSolrSpellChecker where merging of the different spell 
 checking suggestions takes place. If this initial approach is fine, I can 
 create some extra checks/unit tests for this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0_20-ea-b15) - Build # 10349 - Failure!

2014-06-04 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10349/
Java: 64bit/jdk1.8.0_20-ea-b15 -XX:-UseCompressedOops -XX:+UseSerialGC

1 tests failed.
REGRESSION:  
org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings

Error Message:
startOffset must be non-negative, and endOffset must be >= startOffset, 
startOffset=44,endOffset=32

Stack Trace:
java.lang.IllegalArgumentException: startOffset must be non-negative, and 
endOffset must be >= startOffset, startOffset=44,endOffset=32
at 
__randomizedtesting.SeedInfo.seed([CF12B5B0721D62C6:A5490AA12B534235]:0)
at 
org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45)
at 
org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345)
at 
org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:68)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:703)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:614)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:513)
at 
org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:946)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
  

Re: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net

2014-06-04 Thread Rory O'Donnell Oracle, Dublin Ireland

Hi Uwe,

I understand the fix is already in :
8u20/b05: https://bugs.openjdk.java.net/browse/JDK-8036554
9/b04 : https://bugs.openjdk.java.net/browse/JDK-8035897

Can you confirm all is ok ?

Rgds,Rory

On 04/06/2014 08:42, Uwe Schindler wrote:


Hi Rory,

thank you for the info! I installed 7u60 already (yesterday evening). 
I am happy that the MacOSX problem with Socket#accept was solved. I 
hope this fix gets also in the JDK 8 builds: 
https://bugs.openjdk.java.net/browse/JDK-8024045


This one prevents applications like Lucene, that use many file 
descriptors, to work correctly in web containers like Jetty or Tomcat 
on MacOSX server – causing SIGSEGV.


Uwe

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de http://www.thetaphi.de/

eMail: u...@thetaphi.de

*From:*Rory O'Donnell Oracle, Dublin Ireland 
[mailto:rory.odonn...@oracle.com]

*Sent:* Wednesday, June 04, 2014 9:36 AM
*To:* Uwe Schindler; Dawid Weiss
*Cc:* rory.odonn...@oracle.com; dev@lucene.apache.org; Dalibor Topic; 
Balchandra Vaidya
*Subject:* Early Access builds for JDK 9 b15, JDK 8u20 b16 are 
available on java.net


Hi Uwe,Dawid,

Early Access builds for JDK 9 b15 https://jdk9.java.net/download/, 
JDK 8u20 b16 https://jdk8.java.net/download.html are available on 
java.net.


As we enter the later phases of development for JDK 8u20 , please log 
any show

stoppers as soon as possible.

JDK 7u60 is available for download [0] .

Rgds, Rory

[0] http://www.oracle.com/technetwork/java/javase/downloads/index.html

--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland


--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland



RE: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net

2014-06-04 Thread Uwe Schindler
Hi,

 

I checked the backlog for „JNU_NewStringPlatform” (which is part of the crash 
error message). We don't test 8u20 on MacOSX at the moment (only on Windows and 
Linux), and I have seen no failures in recent 7u60 builds on OSX, but many of 
them with u55, u51 and u45. I would say: this is fixed unless we hit it again.

 

Based on the documentation on the issue, it seems that this might be easier to 
reproduce if we run tests on OSX and raise something like the number of 
concurrent HTTP transfers between Apache Solr nodes. I might give it a try 
(spawn something like 300 Jetty webservers with Solr and let them execute 
searches against each other). If I find something, I will report back.

 

In any case, thanks for the information, I trust you that it is fixed :-)

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Rory O'Donnell Oracle, Dublin Ireland [mailto:rory.odonn...@oracle.com] 
Sent: Wednesday, June 04, 2014 11:59 AM
To: Uwe Schindler; 'Dawid Weiss'
Cc: dev@lucene.apache.org; 'Dalibor Topic'; 'Balchandra Vaidya'
Subject: Re: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on 
java.net

 

Hi Uwe,

I understand the fix is already in :
8u20/b05: https://bugs.openjdk.java.net/browse/JDK-8036554
9/b04 : https://bugs.openjdk.java.net/browse/JDK-8035897

Can you confirm all is ok ?

Rgds,Rory

On 04/06/2014 08:42, Uwe Schindler wrote:

Hi Rory,

 

thank you for the info! I installed 7u60 already (yesterday evening). I am 
happy that the MacOSX problem with Socket#accept was solved. I hope this fix 
gets also in the JDK 8 builds: https://bugs.openjdk.java.net/browse/JDK-8024045

This one prevents applications like Lucene, that use many file descriptors, to 
work correctly in web containers like Jetty or Tomcat on MacOSX server – 
causing SIGSEGV.

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Rory O'Donnell Oracle, Dublin Ireland [mailto:rory.odonn...@oracle.com] 
Sent: Wednesday, June 04, 2014 9:36 AM
To: Uwe Schindler; Dawid Weiss
Cc: rory.odonn...@oracle.com; dev@lucene.apache.org; Dalibor Topic; Balchandra 
Vaidya
Subject: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on 
java.net

 

Hi Uwe,Dawid,

Early Access builds for JDK 9 b15 https://jdk9.java.net/download/ ,  JDK 8u20 
b16 https://jdk8.java.net/download.html   are available on java.net.

As we enter the later phases of development for JDK 8u20 , please log any show
stoppers as soon as possible. 

JDK 7u60 is available for download [0] . 

Rgds, Rory

[0] http://www.oracle.com/technetwork/java/javase/downloads/index.html




-- 
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland





-- 
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland


[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-06-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017605#comment-14017605
 ] 

Michael McCandless commented on LUCENE-4396:


Thanks Da!

When you say BNS (without bitset) vs. BS2 that means baseline=BS2
and my_version=BNS (without bitset)?  I just want to make sure I have
the direction right!

With the added bitset, couldn't you not use a linked list anymore?
Ie, just use prev/nextSetBit.  I wonder if the bitset (instead of the
linked list) could also help BooleanScorer?  Maybe test this change
separately (e.g. just modify BS we have today on trunk) to see if it
helps or hurts... if it does help, it seems like BNS could be
used (or BS could be a Scorer not a BulkScorer) even when there are no
MUST clauses?  Ie, the bitset lets us easily keep the order.  Then we
can merge BS/BNS into one?
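
A tiny sketch of the iteration pattern suggested here, using java.util.BitSet purely for illustration (the scorers themselves would use Lucene's own bitset classes): docs marked in a bitset come back in doc-id order via nextSetBit(), so no separate linked list is needed to keep ordering.

{code}
import java.util.BitSet;

// Illustration only: a bitset replaces the linked list of pending docs.
// Docs can be set in any order; nextSetBit() visits them in increasing doc id.
public class BitsetOrderDemo {
  public static void main(String[] args) {
    BitSet pending = new BitSet(2048);
    pending.set(7);
    pending.set(3);
    pending.set(100);
    for (int doc = pending.nextSetBit(0); doc >= 0; doc = pending.nextSetBit(doc + 1)) {
      System.out.println("collect doc " + doc);   // prints 3, 7, 100
    }
  }
}
{code}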

Could you attach all new tasks as a single file in general?  Note that
when you set up a luceneutil test, you can add a task filter using
addTaskPattern, so you run just a subset of the tasks for that one
test.

Strange that the scores are still different between BS/BS2 and BNS/BS2
when using double.

If there's only 1 required clause sent to BS/BNS can't we use its scorer
instead?

Have you explored having BS interact directly with all the MUST
clauses, rather than using ConjunctionScorer?

Because we have wildly divergent results (sometimes one is much
faster, other times it's much slower) we will somehow need to add
logic to pick the right scorer for each query.  But we can defer this
until we're doneish iterating the changes to each scorer... it can
come later on.


 BooleanScorer should sometimes be used for MUST clauses
 ---

 Key: LUCENE-4396
 URL: https://issues.apache.org/jira/browse/LUCENE-4396
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
 LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
 LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch


 Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
 If there is one or more MUST clauses we always use BooleanScorer2.
 But I suspect that unless the MUST clauses have very low hit count compared 
 to the other clauses, that BooleanScorer would perform better than 
 BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
 handle MUST so it shouldn't be hard to bring back this capability ... I think 
 the challenging part might be the heuristics on when to use which (likely we 
 would have to use firstDocID as proxy for total hit count).
 Likely we should also have BooleanScorer sometimes use .advance() on the subs 
 in this case, eg if suddenly the MUST clause skips 100 docs then you want 
 to .advance() all the SHOULD clauses.
 I won't have near term time to work on this so feel free to take it if you 
 are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4763) Performance issue when using group.facet=true

2014-06-04 Thread Hua Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017623#comment-14017623
 ] 

Hua Jiang commented on SOLR-4763:
-

Hello, Varun. Thanks for your feedback.

I rebuilt lucene_solr on my laptop, and all the tests pass. I made this 
patch based on revision 1553089. If you are using a different revision, you may 
have to do some modification yourself. I will explain the patch a little more, 
and hope it helps.

In the unpatched code, groupedFacetHits is a list of GroupedFacetHit 
objects, which stores the unique combinations of values of the group field and the 
facet field seen in the previous segments. When a new segment is opened, this list 
is traversed first to recalculate the segmentGroupedFacetsIndex, because that 
value may differ from segment to segment. That's what the loop you mentioned in 
setNextReader() is doing.

During the recalculation, the lookupTerm() method is invoked on 
facetFieldTermsIndex and groupFieldTermsIndex. This method uses binary search 
to look up values among all the values that appear in the group/facet field in 
the current segment.

Let's assume that we have D documents distributed over S segments, that the 
documents are distributed evenly, so that we have G and F unique values in each 
segment for the group and facet field, and that the length of the 
groupedFacetHits list after the nth segment is processed is n*L. Then the 
complexity of the recalculation is (log G + log F) * (L + 2L + ... + (S-1)L) ~ 
O((log G + log F) * L * S^2). It is proportional to S squared, so as S grows, 
performance drops rapidly.

In the patched version, I changed groupedFacetHits from a list to a set, so the 
recalculation can be avoided: when you get a GroupedFacetHit, you just 
add it to the set without worrying about whether some other GroupedFacetHit with the 
same group and facet field values has been added before, because it is a set. 
The add() method on a set simply returns false when the same value has already been 
added.
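
As a minimal, self-contained sketch of that set-based idea (not the actual SOLR-4763 patch; the fields here are simplified to Strings, whereas the real code works on term values per segment):

{code}
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration: with value-based equals/hashCode, a HashSet silently
// ignores duplicate (group, facet) combinations, so no per-segment recalculation
// pass over a list is needed.
final class GroupedFacetHit {
  final String groupValue;
  final String facetValue;

  GroupedFacetHit(String groupValue, String facetValue) {
    this.groupValue = groupValue;
    this.facetValue = facetValue;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof GroupedFacetHit)) return false;
    GroupedFacetHit other = (GroupedFacetHit) o;
    return Objects.equals(groupValue, other.groupValue)
        && Objects.equals(facetValue, other.facetValue);
  }

  @Override
  public int hashCode() {
    return Objects.hash(groupValue, facetValue);
  }
}

class GroupedFacetHitDemo {
  public static void main(String[] args) {
    Set<GroupedFacetHit> groupedFacetHits = new HashSet<>();
    // add() returns false when the same combination was already seen
    System.out.println(groupedFacetHits.add(new GroupedFacetHit("g1", "f1"))); // true
    System.out.println(groupedFacetHits.add(new GroupedFacetHit("g1", "f1"))); // false
  }
}
{code}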


 Performance issue when using group.facet=true
 -

 Key: SOLR-4763
 URL: https://issues.apache.org/jira/browse/SOLR-4763
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Alexander Koval
 Attachments: SOLR-4763.patch, SOLR-4763.patch


 I do not know whether this is bug or not. But calculating facets with 
 {{group.facet=true}} is too slow.
 I have query that:
 {code}
 matches: 730597,
 ngroups: 24024,
 {code}
 1. All queries with {{group.facet=true}}:
 {code}
 QTime: 5171
 facet: {
 time: 4716
 {code}
 2. Without {{group.facet}}:
 * First query:
 {code}
 QTime: 3284
 facet: {
 time: 3104
 {code}
 * Next queries:
 {code}
 QTime: 230,
 facet: {
 time: 76
 {code}
 So I think with {{group.facet=true}} Solr doesn't use cache to calculate 
 facets.
 Is it possible to improve performance of facets when {{group.facet=true}}?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017639#comment-14017639
 ] 

Michael McCandless commented on LUCENE-5731:


+1, this looks really nice.

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6131) Remove deprecated Token class from solr.spelling package

2014-06-04 Thread Spyros Kapnissis (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017663#comment-14017663
 ] 

Spyros Kapnissis commented on SOLR-6131:


Not sure it's that easy. There are a lot of places where it is still being used, 
even though it has been obsolete since version 2.9. Any refactoring has to happen 
incrementally, IMO. This patch is specifically for the solr.spelling package.

 Remove deprecated Token class from solr.spelling package
 

 Key: SOLR-6131
 URL: https://issues.apache.org/jira/browse/SOLR-6131
 Project: Solr
  Issue Type: Improvement
  Components: spellchecker
Affects Versions: 4.8.1
Reporter: Spyros Kapnissis
Priority: Minor
  Labels: spellchecker
 Attachments: NukeToken.patch, SOLR-6131.patch


 The deprecated Token class is used everywhere in the spelling package. I am 
 attaching a patch that refactors/replaces all occurrences with the 
 AttributeSource class. The tests are passing.
 Note: the AttributeSource class also replaces Token as a hash key in many 
 places. Having stricter equals/hashCode requirements than Token, I am a bit 
 concerned that it could produce some duplicate suggestions, especially in the 
 case of ConjunctionSolrSpellChecker where merging of the different spell 
 checking suggestions takes place. If this initial approach is fine, I can 
 create some extra checks/unit tests for this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers

2014-06-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017668#comment-14017668
 ] 

Robert Muir commented on LUCENE-5703:
-

+1 to commit, thank you for taking care of this!

 Don't allocate/copy bytes all the time in binary DV producers
 -

 Key: LUCENE-5703
 URL: https://issues.apache.org/jira/browse/LUCENE-5703
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch


 Our binary doc values producers keep on creating new {{byte[]}} arrays and 
 copying bytes when a value is requested, which likely doesn't help 
 performance. This has been done because of the way fieldcache consumers used 
 the API, but we should try to fix it in 5.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5393) remove codec byte[] cloning in BinaryDocValues api

2014-06-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5393.
-

Resolution: Duplicate

see LUCENE-5703

 remove codec byte[] cloning in BinaryDocValues api
 --

 Key: LUCENE-5393
 URL: https://issues.apache.org/jira/browse/LUCENE-5393
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir

 I can attack this (at least in trunk/5.0, we can discuss if/when it should 
 happen for 4.x).
 See the mailing list for more discussion. this was done intentionally, to 
 prevent lots of reuse bugs.
 The issue is very simple, lots of old fieldcache-type logic has it because 
 things used to be immutable Strings or because they rely on things being in a 
 large array:
 {code}
 byte[] b1 = get(doc1);
 byte[] b2 = get(doc2);
 // some code that expects b1 to be unchanged.
 {code}
 Currently each get() internally is cloning the bytes, for safety. but this is 
 really bad for code like faceting (which is going to decompress integers and 
 never needs to save bytes), and its even stupid for things like 
 fieldcomparator (where in general its doing comparisons, and only rarely 
 needs to save a copy of the bytes for later).
 I can address it with lots of tests (i added a lot in general anyway since 
 the time of adding this TODO, but more would make me feel safer).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-06-04 Thread Da Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017708#comment-14017708
 ] 

Da Huang commented on LUCENE-4396:
--

Thanks for your suggestions, Mike!

{quote}
When you say BNS (without bitset) vs. BS2 that means baseline=BS2
and my_version=BNS (without bitset)?
{quote}
Yes, this is just what I mean.

{quote}
With the added bitset, couldn't you not use a linked list anymore?
Ie, just use prev/nextSetBit. I wonder if the bitset (instead of the
linked list) could also help BooleanScorer? Maybe test this change
separately (e.g. just modify BS we have today on trunk) to see if it
helps or hurts... if it does help, it seems like BNS could be
used (or BS could be a Scorer not a BulkScorer) even when there are no
MUST clauses? Ie, the bitset lets us easily keep the order. Then we
can merge BS/BNS into one?
{quote}
Oh, that's a good idea! I will try that. However, a linked list can be helpful 
when the required docs are extremely sparse.

{quote}
Could you attach all new tasks as a single file in general? Note that
when you set up a luceneutil test, you can add a task filter using
addTaskPattern, so you run just a subset of the tasks for that one
test.
{quote}
Do you mean merging And.tasks and AndOr.tasks? If so, there's no need to
do that, because And.tasks contains all the tasks in AndOr.tasks, although the tasks'
names are changed.
Anyway, thanks for the advice on using addTaskPattern; I hadn't noticed that.

{quote}
Strange that the scores are still different between BS/BS2 and BNS/BS2
when using double.
{quote}
I don't think it's strange, because the difference is due to the order in which 
the scores are calculated.
Suppose that a doc hits +a b c. Then
SCORE_BS = (float)((float)(double)score_a + (float)score_b) + (float)score_c, 
while 
SCORE_BS2 = (float)(double)score_a + ((float)score_b + (float)score_c). 
Here, (float) means that we can only get the score via .score(), whose return 
type is float.
The modification in this patch only gives score_a a temporary double value.
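
A tiny, self-contained illustration of this point (the values below are hypothetical, not taken from the benchmark): float addition is not associative, so the two groupings can give different sums for the same clause scores.

{code}
// Hypothetical clause scores; shows that the grouping of float additions matters.
public class FloatOrderDemo {
  public static void main(String[] args) {
    float a = 16777216f;   // 2^24: at this magnitude adjacent floats are 2 apart
    float b = 1f, c = 1f;
    float bsLike  = (a + b) + c;   // grouping similar to SCORE_BS above
    float bs2Like = a + (b + c);   // grouping similar to SCORE_BS2 above
    System.out.println(bsLike);    // 1.6777216E7
    System.out.println(bs2Like);   // 1.6777218E7
  }
}
{code}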

{quote}
If there's only 1 required clause sent to BS/BNS can't we use its scorer
instead?
Have you explored having BS interact directly with all the MUST
clauses, rather than using ConjunctionScorer?
{quote}
Hmm. I don't think that would be helpful. The reason is just the same as above.

{quote}
Because we have wildly divergent results (sometimes one is much
faster, other times it's much slower) we will somehow need to add
logic to pick the right scorer for each query. But we can defer this
until we're doneish iterating the changes to each scorer... it can
come later on.
{quote}
Yes, I agree.

 BooleanScorer should sometimes be used for MUST clauses
 ---

 Key: LUCENE-4396
 URL: https://issues.apache.org/jira/browse/LUCENE-4396
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
 LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
 LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch


 Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
 If there is one or more MUST clauses we always use BooleanScorer2.
 But I suspect that unless the MUST clauses have very low hit count compared 
 to the other clauses, that BooleanScorer would perform better than 
 BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
 handle MUST so it shouldn't be hard to bring back this capability ... I think 
 the challenging part might be the heuristics on when to use which (likely we 
 would have to use firstDocID as proxy for total hit count).
 Likely we should also have BooleanScorer sometimes use .advance() on the subs 
 in this case, eg if suddenly the MUST clause skips 100 docs then you want 
 to .advance() all the SHOULD clauses.
 I won't have near term time to work on this so feel free to take it if you 
 are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers

2014-06-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017737#comment-14017737
 ] 

Robert Muir commented on LUCENE-5703:
-

I kicked this around all I could with nightly tests, running tests over and 
over, etc. I'm seeing this reproducible failure:
{noformat}
ant test  -Dtestcase=TestDistributedMissingSort 
-Dtests.method=testDistribSearch -Dtests.seed=6B475C36C0EF9CD5 
-Dtests.nightly=true -Dtests.slow=true -Dtests.locale=ar 
-Dtests.timezone=Africa/Windhoek -Dtests.file.encoding=UTF-8
{noformat}

 Don't allocate/copy bytes all the time in binary DV producers
 -

 Key: LUCENE-5703
 URL: https://issues.apache.org/jira/browse/LUCENE-5703
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch


 Our binary doc values producers keep on creating new {{byte[]}} arrays and 
 copying bytes when a value is requested, which likely doesn't help 
 performance. This has been done because of the way fieldcache consumers used 
 the API, but we should try to fix it in 5.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5650) Enforce read-only access to any path outside the temporary folder via security manager

2014-06-04 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017787#comment-14017787
 ] 

Steve Rowe commented on LUCENE-5650:


bq. But feel free to just use your patch if you want and I'll clean it up when 
I resolve that issue.

Thanks, I'll do that.


 Enforce read-only access to any path outside the temporary folder via 
 security manager
 --

 Key: LUCENE-5650
 URL: https://issues.apache.org/jira/browse/LUCENE-5650
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/test
Reporter: Ryan Ernst
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, 
 LUCENE-5650.patch, dih.patch


 The recent refactoring to all the create temp file/dir functions (which is 
 great!) has a minor regression from what existed before.  With the old 
 {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist.  
 So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that 
 dir within the per jvm working dir.  However, {{getBaseTempDirForClass()}} 
 now does asserts that check the dir exists, is a dir, and is writeable.
 Lucene uses {{.}} as {{java.io.tmpdir}}.  Then in the test security 
 manager, the per jvm cwd has read/write/execute permissions.  However, this 
 allows tests to write to their cwd, which I'm trying to protect against (by 
 setting cwd to read/execute in my test security manager).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5650) Enforce read-only access to any path outside the temporary folder via security manager

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017788#comment-14017788
 ] 

ASF subversion and git services commented on LUCENE-5650:
-

Commit 1600310 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1600310 ]

LUCENE-5650: Reset solr.hdfs.home correctly to allow TestRecoveryHdfs tests to 
pass

 Enforce read-only access to any path outside the temporary folder via 
 security manager
 --

 Key: LUCENE-5650
 URL: https://issues.apache.org/jira/browse/LUCENE-5650
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/test
Reporter: Ryan Ernst
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, 
 LUCENE-5650.patch, dih.patch


 The recent refactoring to all the create temp file/dir functions (which is 
 great!) has a minor regression from what existed before.  With the old 
 {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist.  
 So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that 
 dir within the per jvm working dir.  However, {{getBaseTempDirForClass()}} 
 now does asserts that check the dir exists, is a dir, and is writeable.
 Lucene uses {{.}} as {{java.io.tmpdir}}.  Then in the test security 
 manager, the per jvm cwd has read/write/execute permissions.  However, this 
 allows tests to write to their cwd, which I'm trying to protect against (by 
 setting cwd to read/execute in my test security manager).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags

2014-06-04 Thread David Smiley (JIRA)
David Smiley created LUCENE-5734:


 Summary: HTMLStripCharFilter end offset should be left of closing 
tags
 Key: LUCENE-5734
 URL: https://issues.apache.org/jira/browse/LUCENE-5734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: David Smiley
Priority: Minor


Consider this simple input:
{noformat}
<em>hello</em>
{noformat}
to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
You get back one token for hello.  Good.  The start offset of this token is 
at the position of 'h' -- good.  But the end offset is surprisingly plus one to 
the adjacent </em>.  I argue that it should be plus one to the last character 
of the token (following 'o').

FYI it behaves as I expect if after hello is an &nbsp; -- the end offset 
immediately follows the 'o'.
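
For anyone wanting to reproduce this, here is a small sketch of the analysis
chain described above (assuming the Lucene 4.x analysis APIs, where
WhitespaceTokenizer still takes a Version and a Reader; adjust the Version
constant to whatever your build provides):

{code:java}
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.Version;

public class HtmlStripOffsetDemo {
  public static void main(String[] args) throws Exception {
    String html = "<em>hello</em>";
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_48, // adjust to your build
        new HTMLStripCharFilter(new StringReader(html)));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      // Current behavior per this issue: hello start=4 end=14 (end is past "</em>");
      // the report argues end should be 9, i.e. just after the 'o'.
      System.out.println(term + " start=" + offset.startOffset()
          + " end=" + offset.endOffset());
    }
    ts.end();
    ts.close();
  }
}
{code}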



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags

2014-06-04 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5734:
-

Description: 
Consider this simple input:
{noformat}
<em>hello</em>
{noformat}
to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
You get back one token for hello.  Good.  The start offset of this token is 
at the position of 'h' -- good.  But the end offset is surprisingly plus one to 
the adjacent </em>.  I argue that it should be plus one to the last character 
of the token (following 'o').

FYI it behaves as I expect if after hello is an XML entity such as in this 
example: {noformat}hello&nbsp;{noformat} The end offset immediately follows the 
'o'.

  was:
Consider this simple input:
{noformat}
<em>hello</em>
{noformat}
to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
You get back one token for hello.  Good.  The start offset of this token is 
at the position of 'h' -- good.  But the end offset is surprisingly plus one to 
the adjacent </em>.  I argue that it should be plus one to the last character 
of the token (following 'o').

FYI it behaves as I expect if after hello is an &nbsp; -- the end offset 
immediately follows the 'o'.


 HTMLStripCharFilter end offset should be left of closing tags
 -

 Key: LUCENE-5734
 URL: https://issues.apache.org/jira/browse/LUCENE-5734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: David Smiley
Priority: Minor

 Consider this simple input:
 {noformat}
 <em>hello</em>
 {noformat}
 to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
 You get back one token for hello.  Good.  The start offset of this token is 
 at the position of 'h' -- good.  But the end offset is surprisingly plus one 
 to the adjacent </em>.  I argue that it should be plus one to the last 
 character of the token (following 'o').
 FYI it behaves as I expect if after hello is an XML entity such as in this 
 example: {noformat}hello&nbsp;{noformat} The end offset immediately follows 
 the 'o'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags

2014-06-04 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017822#comment-14017822
 ] 

Alan Woodward commented on LUCENE-5734:
---

Steve Rowe and I discussed this a while back - there are good use cases for 
offsets to be both before and after the trailing tag.  I have a separate 
CharFilter somewhere that reports offsets the way you want here, will try and 
dig it out and attach it as a patch.

 HTMLStripCharFilter end offset should be left of closing tags
 -

 Key: LUCENE-5734
 URL: https://issues.apache.org/jira/browse/LUCENE-5734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: David Smiley
Priority: Minor

 Consider this simple input:
 {noformat}
 <em>hello</em>
 {noformat}
 to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
 You get back one token for hello.  Good.  The start offset of this token is 
 at the position of 'h' -- good.  But the end offset is surprisingly plus one 
 to the adjacent </em>.  I argue that it should be plus one to the last 
 character of the token (following 'o').
 FYI it behaves as I expect if after hello is an XML entity such as in this 
 example: {noformat}hello&nbsp;{noformat} The end offset immediately follows 
 the 'o'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags

2014-06-04 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017826#comment-14017826
 ] 

David Smiley commented on LUCENE-5734:
--

FYI this triggered my interest because I'm trying to highlight XML.  
Technically I'm not using Lucene/Solr's highlighter as I'm doing something 
custom.  I'm going to insert special demarcation markup into the source text at 
the offsets that I find.  My current work-around is to detect that the source 
text has a closing element at the end offset, and then adjust for it if found.  
Not too hard for me.

 HTMLStripCharFilter end offset should be left of closing tags
 -

 Key: LUCENE-5734
 URL: https://issues.apache.org/jira/browse/LUCENE-5734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: David Smiley
Priority: Minor

 Consider this simple input:
 {noformat}
 <em>hello</em>
 {noformat}
 to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
 You get back one token for hello.  Good.  The start offset of this token is 
 at the position of 'h' -- good.  But the end offset is surprisingly plus one 
 to the adjacent </em>.  I argue that it should be plus one to the last 
 character of the token (following 'o').
 FYI it behaves as I expect if after hello is an XML entity such as in this 
 example: {noformat}hello&nbsp;{noformat} The end offset immediately follows 
 the 'o'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags

2014-06-04 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017831#comment-14017831
 ] 

David Smiley commented on LUCENE-5734:
--

[~romseygeek] ok then it should be configurable, and _consistent_ too.  *if* 
the user wants a closing element offset to be included with the token (as it 
currently does) then an adjacent opening element should mark the start of 
the token too.  IMO it shouldn't work this way by default, though.

 HTMLStripCharFilter end offset should be left of closing tags
 -

 Key: LUCENE-5734
 URL: https://issues.apache.org/jira/browse/LUCENE-5734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: David Smiley
Priority: Minor

 Consider this simple input:
 {noformat}
 <em>hello</em>
 {noformat}
 to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
 You get back one token for hello.  Good.  The start offset of this token is 
 at the position of 'h' -- good.  But the end offset is surprisingly plus one 
 to the adjacent </em>.  I argue that it should be plus one to the last 
 character of the token (following 'o').
 FYI it behaves as I expect if after hello is an XML entity such as in this 
 example: {noformat}hello&nbsp;{noformat} The end offset immediately follows 
 the 'o'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-06-04 Thread Brandon Chapman (JIRA)
Brandon Chapman created SOLR-6136:
-

 Summary: ConcurrentUpdateSolrServer includes a Spin Lock
 Key: SOLR-6136
 URL: https://issues.apache.org/jira/browse/SOLR-6136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.8.1, 4.8, 4.7.2, 4.7.1, 4.7, 4.6.1, 4.6
Reporter: Brandon Chapman
Priority: Critical


ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
causes an extremely high amount of CPU to be used on the Cloud Leader during 
indexing.

Here is a summary of our system testing. 

Importing data on Solr4.5.0: 
Throughput gets as high as 240 documents per second.

[tomcat@solr-stg01 logs]$ uptime 
09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java

Importing data on Solr4.7.0 with no replicas: 
Throughput peaks at 350 documents per second.

[tomcat@solr-stg01 logs]$ uptime 
10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java

Importing data on Solr4.7.0 with replicas: 
Throughput peaks at 30 documents per second because the Solr machine is out of 
CPU.

[tomcat@solr-stg01 logs]$ uptime 
09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-06-04 Thread Brandon Chapman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017838#comment-14017838
 ] 

Brandon Chapman commented on SOLR-6136:
---

Applying the patch from the linked ticket to Solr 4.5 will cause the same issue 
to be present in Solr 4.5.

 ConcurrentUpdateSolrServer includes a Spin Lock
 ---

 Key: SOLR-6136
 URL: https://issues.apache.org/jira/browse/SOLR-6136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
Reporter: Brandon Chapman
Priority: Critical

 ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
 causes an extremely high amount of CPU to be used on the Cloud Leader during 
 indexing.
 Here is a summary of our system testing. 
 Importing data on Solr4.5.0: 
 Throughput gets as high as 240 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java
 Importing data on Solr4.7.0 with no replicas: 
 Throughput peaks at 350 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java
 Importing data on Solr4.7.0 with replicas: 
 Throughput peaks at 30 documents per second because the Solr machine is out 
 of CPU.
 [tomcat@solr-stg01 logs]$ uptime 
 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-06-04 Thread Brandon Chapman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Chapman updated SOLR-6136:
--

Attachment: wait___notify_all.patch

Attached patch for Solr 4.7.1 drastically improves performance. The patch is a 
workaround of the spin lock by using a simple wait / notify mechanism. It is 
not a suggestion on how to fix ConcurrentUpdateSolrServer for an official 
release.
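
For readers who don't want to open the patch, the core of the workaround is
roughly the following pattern (a minimal, self-contained sketch, not the actual
ConcurrentUpdateSolrServer code): instead of spinning on an "are all runners
done?" check and burning CPU, the waiter blocks in wait() and each finishing
runner calls notifyAll().

{code:java}
public class WaitNotifySketch {
  private final Object lock = new Object();
  private int activeRunners = 0;

  void runnerStarted() {
    synchronized (lock) { activeRunners++; }
  }

  void runnerFinished() {
    synchronized (lock) {
      activeRunners--;
      lock.notifyAll();            // wake any thread blocked in blockUntilFinished()
    }
  }

  /** Blocks without busy-waiting until no runner is active. */
  void blockUntilFinished() throws InterruptedException {
    synchronized (lock) {
      while (activeRunners > 0) {  // loop guards against spurious wakeups
        lock.wait();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final WaitNotifySketch s = new WaitNotifySketch();
    s.runnerStarted();
    new Thread(new Runnable() {
      public void run() {
        try { Thread.sleep(100); } catch (InterruptedException ignored) {}
        s.runnerFinished();
      }
    }).start();
    s.blockUntilFinished();
    System.out.println("all runners finished");
  }
}
{code}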

 ConcurrentUpdateSolrServer includes a Spin Lock
 ---

 Key: SOLR-6136
 URL: https://issues.apache.org/jira/browse/SOLR-6136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
Reporter: Brandon Chapman
Priority: Critical
 Attachments: wait___notify_all.patch


 ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
 causes an extremely high amount of CPU to be used on the Cloud Leader during 
 indexing.
 Here is a summary of our system testing. 
 Importing data on Solr4.5.0: 
 Throughput gets as high as 240 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java
 Importing data on Solr4.7.0 with no replicas: 
 Throughput peaks at 350 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java
 Importing data on Solr4.7.0 with replicas: 
 Throughput peaks at 30 documents per second because the Solr machine is out 
 of CPU.
 [tomcat@solr-stg01 logs]$ uptime 
 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5648) Index/search multi-valued time durations

2014-06-04 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017875#comment-14017875
 ] 

Ryan McKinley commented on LUCENE-5648:
---

This stuff is looking great.  Java Calendar/Date is a mess... it would be nice 
to use joda-time, but adding that as a dependency is not a great idea.

The names 'NRShape' and 'NRCell' are a little funny -- maybe 
NumericRangeShape/NumericRangeCell would be better?

I vote +1 to add it as experimental and get more eyes on it

 Index/search multi-valued time durations
 

 Key: LUCENE-5648
 URL: https://issues.apache.org/jira/browse/LUCENE-5648
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch, 
 LUCENE-5648.patch


 If you need to index a date/time duration, then the way to do that is to have 
 a pair of date fields; one for the start and one for the end -- pretty 
 straight-forward. But if you need to index a variable number of durations per 
 document, then the options aren't pretty, ranging from denormalization, to 
 joins, to using Lucene spatial with 2D as described 
 [here|http://wiki.apache.org/solr/SpatialForTimeDurations].  Ideally it would 
 be easier to index durations, and work in a more optimal way.
 This issue implements the aforementioned feature using Lucene-spatial with a 
 new single-dimensional SpatialPrefixTree implementation. Unlike the other two 
 SPT implementations, it's not based on floating point numbers. It will have a 
 Date based customization that indexes levels at meaningful quantities like 
 seconds, minutes, hours, etc.  The point of that alignment is to make it 
 faster to query across meaningful ranges (i.e. [2000 TO 2014]) and to enable 
 a follow-on issue to facet on the data in a really fast way.
 I'll expect to have a working patch up this week.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6103) Add DateRangeField

2014-06-04 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017886#comment-14017886
 ] 

Ryan McKinley commented on SOLR-6103:
-

+1

 Add DateRangeField
 --

 Key: SOLR-6103
 URL: https://issues.apache.org/jira/browse/SOLR-6103
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-6103.patch


 LUCENE-5648 introduced a date range index & search capability in the spatial 
 module. This issue is for a corresponding Solr FieldType to be named 
 DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that 
 parses a superset of Solr's strict date format.  It also parses partial dates 
 (e.g.: 2014-10  has month specificity), and the trailing 'Z' is optional, and 
 a leading +/- may be present (minus indicates BC era), and * means 
 all-time.  The proposed field type would use it to parse a string and also 
 both ends of a range query, but furthermore it will also allow an arbitrary 
 range query of the form {{calspec TO calspec}} such as:
 {noformat}2000 TO 2014-05-21T10{noformat}
 Which parses as the year 2000 thru 2014 May 21st 10am (GMT). 
 I suggest this syntax because it is aligned with Lucene's range query syntax. 
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-06-04 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017896#comment-14017896
 ] 

Timothy Potter commented on SOLR-6136:
--

Thanks for the patch Brandon! I'll start working on this issue tomorrow unless 
someone can dig into it sooner.

 ConcurrentUpdateSolrServer includes a Spin Lock
 ---

 Key: SOLR-6136
 URL: https://issues.apache.org/jira/browse/SOLR-6136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
Reporter: Brandon Chapman
Priority: Critical
 Attachments: wait___notify_all.patch


 ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
 causes an extremely high amount of CPU to be used on the Cloud Leader during 
 indexing.
 Here is a summary of our system testing. 
 Importing data on Solr4.5.0: 
 Throughput gets as high as 240 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java
 Importing data on Solr4.7.0 with no replicas: 
 Throughput peaks at 350 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java
 Importing data on Solr4.7.0 with replicas: 
 Throughput peaks at 30 documents per second because the Solr machine is out 
 of CPU.
 [tomcat@solr-stg01 logs]$ uptime 
 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags

2014-06-04 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017912#comment-14017912
 ] 

Steve Rowe commented on LUCENE-5734:


bq. Steve Rowe and I discussed this a while back

On twitter: https://twitter.com/romseygeek/status/433553268577681408

 HTMLStripCharFilter end offset should be left of closing tags
 -

 Key: LUCENE-5734
 URL: https://issues.apache.org/jira/browse/LUCENE-5734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: David Smiley
Priority: Minor

 Consider this simple input:
 {noformat}
 <em>hello</em>
 {noformat}
 to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
 You get back one token for hello.  Good.  The start offset of this token is 
 at the position of 'h' -- good.  But the end offset is surprisingly plus one 
 to the adjacent </em>.  I argue that it should be plus one to the last 
 character of the token (following 'o').
 FYI it behaves as I expect if after hello is an XML entity such as in this 
 example: {noformat}hello&nbsp;{noformat} The end offset immediately follows 
 the 'o'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017968#comment-14017968
 ] 

ASF subversion and git services commented on LUCENE-5731:
-

Commit 1600412 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1600412 ]

LUCENE-5731: split out direct packed ints from in-ram ones

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies

2014-06-04 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017998#comment-14017998
 ] 

Steve Rowe commented on LUCENE-5715:


Committing shortly.

 Upgrade direct dependencies known to be older than transitive dependencies
 --

 Key: LUCENE-5715
 URL: https://issues.apache.org/jira/browse/LUCENE-5715
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Attachments: LUCENE-5715.patch


 LUCENE-5442 added functionality to the {{check-lib-versions}} ant task to 
 fail the build if a direct dependency's version conflicts with that of a 
 transitive dependency.
 {{ivy-ignore-conflicts.properties}} contains a list of 19 transitive 
 dependencies with versions that are newer than direct dependencies' versions: 
 https://issues.apache.org/jira/browse/LUCENE-5442?focusedCommentId=14012220page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14012220
 We should try to keep that list small.  It's likely that upgrading most of 
 those dependencies will require little effort.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers

2014-06-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018004#comment-14018004
 ] 

Robert Muir commented on LUCENE-5703:
-

I have a fix, I will update the patch in a bit (also with a test).

 Don't allocate/copy bytes all the time in binary DV producers
 -

 Key: LUCENE-5703
 URL: https://issues.apache.org/jira/browse/LUCENE-5703
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch


 Our binary doc values producers keep on creating new {{byte[]}} arrays and 
 copying bytes when a value is requested, which likely doesn't help 
 performance. This has been done because of the way fieldcache consumers used 
 the API, but we should try to fix it in 5.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags

2014-06-04 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018003#comment-14018003
 ] 

David Smiley commented on LUCENE-5734:
--

The essential part of that conversation you had on Twitter, [~steve_rowe], is 
this:
{quote}
ic - i guess the only awkwardness would be embedded inline tags that produce 
single tokens: some<b>thing</b> -> something
{quote}

In that case, where the token includes an opening tag, I would expect the end 
offset to be where it is placed now, after the close tag.  But otherwise (the 
case I presented) I wouldn't expect this.

 HTMLStripCharFilter end offset should be left of closing tags
 -

 Key: LUCENE-5734
 URL: https://issues.apache.org/jira/browse/LUCENE-5734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: David Smiley
Priority: Minor

 Consider this simple input:
 {noformat}
 <em>hello</em>
 {noformat}
 to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
 You get back one token for hello.  Good.  The start offset of this token is 
 at the position of 'h' -- good.  But the end offset is surprisingly plus one 
 to the adjacent </em>.  I argue that it should be plus one to the last 
 character of the token (following 'o').
 FYI it behaves as I expect if after hello is an XML entity such as in this 
 example: {noformat}hello&nbsp;{noformat} The end offset immediately follows 
 the 'o'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5728) use slice() api in packedints decode

2014-06-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5728.
-

Resolution: Duplicate

see LUCENE-5731

 use slice() api in packedints decode
 

 Key: LUCENE-5728
 URL: https://issues.apache.org/jira/browse/LUCENE-5728
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5728.patch


 Today, for example 8-bpv decoder looks like this:
 {code}
 in.seek(startPointer + index);
 return in.readByte() & 0xFF;
 {code}
 If instead we take a slice of 'in', we can remove an addition. It's not much, 
 but it helps a little. Additionally, we already compute the number of bytes 
 (in PackedInts.java), so we could make this an actual slice of the range, which 
 would return an error on abuse instead of garbage data.
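
As a sketch of the "take a slice" idea, using the randomAccessSlice /
RandomAccessInput API that shows up in the LUCENE-5731 comments later in this
digest (treat the exact method names as an assumption borrowed from that issue,
not as what this particular patch does):

{code:java}
import java.io.IOException;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.RandomAccessInput;

public class SliceDecodeSketch {
  private final RandomAccessInput slice;

  // Take the slice once, at the start of the packed block...
  public SliceDecodeSketch(IndexInput in, long startPointer, long numBytes) throws IOException {
    this.slice = in.randomAccessSlice(startPointer, numBytes);
  }

  // ...so each lookup is a single absolute read: no per-call startPointer addition,
  // and an out-of-range index fails instead of silently returning garbage.
  public int get(long index) throws IOException {
    return slice.readByte(index) & 0xFF;
  }
}
{code}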



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5729) explore random-access methods to IndexInput

2014-06-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5729.
-

Resolution: Duplicate

See LUCENE-5731

 explore random-access methods to IndexInput
 ---

 Key: LUCENE-5729
 URL: https://issues.apache.org/jira/browse/LUCENE-5729
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir

 Traditionally Lucene access is mostly reading lists of postings and is geared 
 at that, but for random-access stuff like docvalues, it just creates overhead.
 So today we are hacking around it by doing this random access with 
 seek+readXXX, but this is inefficient (additional checks by the JDK that we 
 don't need).
 As a hack, I added the following to IndexInput, changed direct packed ints 
 decode to use them, and implemented in MMapDir:
 {code}
 byte readByte(long pos) -- ByteBuffer.get(pos)
 short readShort(long pos) -- ByteBuffer.getShort(pos)
 int readInt(long pos) -- ByteBuffer.getInt(pos)
 long readLong(long pos) -- ByteBuffer.getLong(pos)
 {code}
 This gives ~30% performance improvement for docvalues (numerics, sorting 
 strings, etc)
 We should do a few things first before working this (LUCENE-5728: use slice 
 api in decode, pad packed ints so we only have one i/o call ever, etc etc) 
 but I think we need to figure out such an API.
 It could either be on IndexInput like my hack (this is similar to the ByteBuffer 
 API with both relative and absolute methods), or we could have a separate 
 API. But I guess arguably IOContext exists to supply hints too, so I don't 
 know which is the way to go.
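
To make the relative-vs-absolute distinction concrete, here is a small,
self-contained sketch using plain java.nio.ByteBuffer (the backing of
MMapDirectory); it only illustrates the two access styles, it is not Lucene
code:

{code:java}
import java.nio.ByteBuffer;

public class AbsoluteVsRelativeRead {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    buf.putLong(0, 0x1122334455667788L);

    // Relative style, analogous to in.seek(pos); in.readByte():
    buf.position(3);
    byte viaSeek = buf.get();        // moves the buffer position as a side effect

    // Absolute style, analogous to the proposed in.readByte(pos):
    byte viaAbsolute = buf.get(3);   // leaves the buffer position untouched

    System.out.printf("viaSeek=%02x viaAbsolute=%02x position=%d%n",
        viaSeek, viaAbsolute, buf.position());
  }
}
{code}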



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies

2014-06-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018005#comment-14018005
 ] 

Uwe Schindler commented on LUCENE-5715:
---

+1, I see no problem.

Which module tried to import ASM 5.0_BETA?

 Upgrade direct dependencies known to be older than transitive dependencies
 --

 Key: LUCENE-5715
 URL: https://issues.apache.org/jira/browse/LUCENE-5715
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Attachments: LUCENE-5715.patch


 LUCENE-5442 added functionality to the {{check-lib-versions}} ant task to 
 fail the build if a direct dependency's version conflicts with that of a 
 transitive dependency.
 {{ivy-ignore-conflicts.properties}} contains a list of 19 transitive 
 dependencies with versions that are newer than direct dependencies' versions: 
 https://issues.apache.org/jira/browse/LUCENE-5442?focusedCommentId=14012220page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14012220
 We should try to keep that list small.  It's likely that upgrading most of 
 those dependencies will require little effort.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5731.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.9

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018007#comment-14018007
 ] 

ASF subversion and git services commented on LUCENE-5731:
-

Commit 1600423 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1600423 ]

LUCENE-5731: split out direct packed ints from in-ram ones

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags

2014-06-04 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018010#comment-14018010
 ] 

Steve Rowe commented on LUCENE-5734:


Right, but you can't have it both ways, though - you have to make a choice.

 HTMLStripCharFilter end offset should be left of closing tags
 -

 Key: LUCENE-5734
 URL: https://issues.apache.org/jira/browse/LUCENE-5734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: David Smiley
Priority: Minor

 Consider this simple input:
 {noformat}
 <em>hello</em>
 {noformat}
 to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer.
 You get back one token for hello.  Good.  The start offset of this token is 
 at the position of 'h' -- good.  But the end offset is surprisingly plus one 
 to the adjacent </em>.  I argue that it should be plus one to the last 
 character of the token (following 'o').
 FYI it behaves as I expect if after hello is an XML entity such as in this 
 example: {noformat}hello&nbsp;{noformat} The end offset immediately follows 
 the 'o'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies

2014-06-04 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018013#comment-14018013
 ] 

Steve Rowe commented on LUCENE-5715:


bq. Which module tried to import ASM 5.0_BETA?

{noformat}
[libversions] VERSION CONFLICT: transitive dependency in module(s) 
solr-test-framework, core-test-framework:
[libversions] /com.carrotsearch.randomizedtesting/junit4-ant=2.1.3
[libversions] +-- /org.ow2.asm/asm=5.0_BETA  Conflict (direct=4.1)
{noformat}

 Upgrade direct dependencies known to be older than transitive dependencies
 --

 Key: LUCENE-5715
 URL: https://issues.apache.org/jira/browse/LUCENE-5715
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Attachments: LUCENE-5715.patch


 LUCENE-5442 added functionality to the {{check-lib-versions}} ant task to 
 fail the build if a direct dependency's version conflicts with that of a 
 transitive dependency.
 {{ivy-ignore-conflicts.properties}} contains a list of 19 transitive 
 dependencies with versions that are newer than direct dependencies' versions: 
 https://issues.apache.org/jira/browse/LUCENE-5442?focusedCommentId=14012220page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14012220
 We should try to keep that list small.  It's likely that upgrading most of 
 those dependencies will require little effort.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018025#comment-14018025
 ] 

Uwe Schindler commented on LUCENE-5731:
---

Thanks Robert. I was very busy today, so I had no time to look into it. But 
from my first check it looks like our idea from the talk yesterday :-)

{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}

We can avoid the clone not in all cases, because we must duplicate the 
ByteBuffer, if the offset is different. But for the simple case, if you request 
the full IndexInput as slice (means offset==null, length==this.length), we 
could return this.

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018025#comment-14018025
 ] 

Uwe Schindler edited comment on LUCENE-5731 at 6/4/14 7:07 PM:
---

Thanks Robert. I was very busy today, so I had no time to look into it. But 
from my first check it looks like our idea from the talk yesterday :-)

{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}

We can avoid the clone not in all cases, because we must duplicate the 
ByteBuffer, if the offset is different. But for the simple case, if you request 
the full IndexInput as slice (means offset==0L, length==this.length), we could 
return this.


was (Author: thetaphi):
Thanks Robert. I was very busy today, so I had no time to look into it. But 
from my first check it looks like our idea from the talk yesterday :-)

{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}

We can avoid the clone not in all cases, because we must duplicate the 
ByteBuffer, if the offset is different. But for the simple case, if you request 
the full IndexInput as slice (means offset==null, length==this.length), we 
could return this.

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018036#comment-14018036
 ] 

ASF subversion and git services commented on LUCENE-5715:
-

Commit 1600444 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1600444 ]

LUCENE-5715: Upgrade direct dependencies known to be older than transitive 
dependencies

 Upgrade direct dependencies known to be older than transitive dependencies
 --

 Key: LUCENE-5715
 URL: https://issues.apache.org/jira/browse/LUCENE-5715
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Attachments: LUCENE-5715.patch


 LUCENE-5442 added functionality to the {{check-lib-versions}} ant task to 
 fail the build if a direct dependency's version conflicts with that of a 
 transitive dependency.
 {{ivy-ignore-conflicts.properties}} contains a list of 19 transitive 
 dependencies with versions that are newer than direct dependencies' versions: 
 https://issues.apache.org/jira/browse/LUCENE-5442?focusedCommentId=14012220page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14012220
 We should try to keep that list small.  It's likely that upgrading most of 
 those dependencies will require little effort.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018025#comment-14018025
 ] 

Uwe Schindler edited comment on LUCENE-5731 at 6/4/14 7:16 PM:
---

Thanks Robert. I was very busy today, so I had no time to look into it. But 
from my first check it looks like our idea from the talk yesterday :-) I was 
afraid to propose to implement this using an interface, thanks for doing it 
that way. Otherwise we would have crazyness in ByteBufferIndexInput. The 
interface hidden behind the randomAccessSlice() method just returning slice() 
is wonderful.

{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}

We can avoid the clone not in all cases, because we must duplicate the 
ByteBuffer, if the offset is different. But for the simple case, if you request 
the full IndexInput as slice (means offset==0L, length==this.length), we could 
return this.


was (Author: thetaphi):
Thanks Robert. I was very busy today, so I had no time to look into it. But 
from my first check it looks like our idea from the talk yesterday :-)

{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}

We can avoid the clone not in all cases, because we must duplicate the 
ByteBuffer, if the offset is different. But for the simple case, if you request 
the full IndexInput as slice (means offset==0L, length==this.length), we could 
return this.

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6134) MapReduce GoLive code improvements

2014-06-04 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018057#comment-14018057
 ] 

David Smiley commented on SOLR-6134:


One small change needed to satisfy ant precommit is to use a non-default 
ThreadFactory, such as by doing this:
{code:java}
final ExecutorService executor = 
Executors.newFixedThreadPool(options.goLiveThreads,
new DefaultSolrThreadFactory("goLive"));
{code}

 MapReduce GoLive code improvements
 --

 Key: SOLR-6134
 URL: https://issues.apache.org/jira/browse/SOLR-6134
 Project: Solr
  Issue Type: Improvement
  Components: contrib - MapReduce
Reporter: David Smiley
Priority: Minor
 Attachments: SOLR-6134_GoLive.patch


 I looked at the GoLive.java source quite a bit and found myself editing the 
 source to make it clearer.  It wasn't hard to understand before but I felt it 
 could be better.  Furthermore, when not in SolrCloud mode, the commit 
 messages are now submitted asynchronously using the same thread pool used for 
 merging.
 This refactoring does away with the inner class Result, the 
 CompletionService, and any keeping track of Future's/Result's in collections 
 and looping over them.  Fundamentally the code never cared about the result; 
 it just wanted to know if it all worked or not.  This refactoring uses Java's 
 Phaser concurrency utility which may seem advanced (especially with the 
 cool name :-) but I find it quite understandable how to use, and is very 
 flexible. I added an inner class implementing Runnable to avoid some 
 duplication across the merge and commit phases.
 The tests pass but I confess to not having used it for real.  I certainly 
 don't feel comfortable committing this until someone does try it; especially 
 try and break it ;-).
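
For readers unfamiliar with Phaser, the pattern described above looks roughly
like this (a minimal, self-contained sketch, not the GoLive code itself; the
shard loop and names are made up for illustration):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PhaserGoLiveSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    final Phaser phaser = new Phaser(1);          // register the coordinating thread
    final AtomicBoolean success = new AtomicBoolean(true);

    for (int i = 0; i < 10; i++) {
      final int shard = i;
      phaser.register();                          // one party per submitted task
      executor.execute(new Runnable() {
        public void run() {
          try {
            // ... merge or commit work for this shard would go here ...
            System.out.println("processed shard " + shard);
          } catch (RuntimeException e) {
            success.set(false);                   // we only care whether it all worked
          } finally {
            phaser.arriveAndDeregister();         // task done, stop blocking the phase
          }
        }
      });
    }

    phaser.arriveAndAwaitAdvance();               // wait for every registered task
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.MINUTES);
    System.out.println(success.get() ? "go-live succeeded" : "go-live failed");
  }
}
{code}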



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5731) split direct packed ints from in-ram ones

2014-06-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018025#comment-14018025
 ] 

Uwe Schindler edited comment on LUCENE-5731 at 6/4/14 7:31 PM:
---

Thanks Robert. I was very busy today, so I had no time to look into it. But 
from my first check it looks like our idea from the talk yesterday :-) I was 
afraid to propose to implement this using an interface, thanks for doing it 
that way. Otherwise we would have crazyness in ByteBufferIndexInput. The 
interface hidden behind the randomAccessSlice() method just returning slice() 
is wonderful.

{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}

We can avoid the clone not in all cases, because we must duplicate the 
ByteBuffer, if the offset is different. But for the simple case, if you request 
the full IndexInput as slice (means offset==0L, length==this.length), we could 
return this.

EDIT: we cannot do this at the moment, because in the multi-mmap case, we 
change the bytebuffers's position. So we always have to clone (otherwise the 
random access slice would have side effects on file position of master slice).


was (Author: thetaphi):
Thanks Robert. I was very busy today, so I had no time to look into it. But 
from my first check it looks like our idea from the talk yesterday :-) I was 
afraid to propose to implement this using an interface, thanks for doing it 
that way. Otherwise we would have crazyness in ByteBufferIndexInput. The 
interface hidden behind the randomAccessSlice() method just returning slice() 
is wonderful.

{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws 
IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}

We can avoid the clone not in all cases, because we must duplicate the 
ByteBuffer, if the offset is different. But for the simple case, if you request 
the full IndexInput as slice (means offset==0L, length==this.length), we could 
return this.

 split direct packed ints from in-ram ones
 -

 Key: LUCENE-5731
 URL: https://issues.apache.org/jira/browse/LUCENE-5731
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5731.patch, LUCENE-5731.patch


 Currently there is an oversharing problem in packedints that imposes too many 
 requirements on improving it:
 * every packed ints must be able to be loaded directly, or in ram, or 
 iterated with.
 * things like filepointers are expected to be adjusted (this is especially 
 stupid) in all cases
 * lots of unnecessary abstractions
 * versioning etc is complex
 None of this flexibility is needed or buys us anything, and it prevents 
 performance improvements (e.g. i just want to add 3 bytes at the end of 
 on-disk streams to reduce the number of bytebuffer calls and thats seriously 
 impossible with the current situation).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018067#comment-14018067
 ] 

ASF subversion and git services commented on LUCENE-5715:
-

Commit 1600473 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1600473 ]

LUCENE-5715: Upgrade direct dependencies known to be older than transitive 
dependencies (merged trunk r1600444)

 Upgrade direct dependencies known to be older than transitive dependencies
 --

 Key: LUCENE-5715
 URL: https://issues.apache.org/jira/browse/LUCENE-5715
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Attachments: LUCENE-5715.patch


 LUCENE-5442 added functionality to the {{check-lib-versions}} ant task to 
 fail the build if a direct dependency's version conflicts with that of a 
 transitive dependency.
 {{ivy-ignore-conflicts.properties}} contains a list of 19 transitive 
 dependencies with versions that are newer than direct dependencies' versions: 
 https://issues.apache.org/jira/browse/LUCENE-5442?focusedCommentId=14012220page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14012220
 We should try to keep that list small.  It's likely that upgrading most of 
 those dependencies will require little effort.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies

2014-06-04 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved LUCENE-5715.


   Resolution: Fixed
Fix Version/s: 4.9

Committed to trunk and branch_4x.

 Upgrade direct dependencies known to be older than transitive dependencies
 --

 Key: LUCENE-5715
 URL: https://issues.apache.org/jira/browse/LUCENE-5715
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 4.9

 Attachments: LUCENE-5715.patch


 LUCENE-5442 added functionality to the {{check-lib-versions}} ant task to 
 fail the build if a direct dependency's version conflicts with that of a 
 transitive dependency.
 {{ivy-ignore-conflicts.properties}} contains a list of 19 transitive 
 dependencies with versions that are newer than direct dependencies' versions: 
 https://issues.apache.org/jira/browse/LUCENE-5442?focusedCommentId=14012220page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14012220
 We should try to keep that list small.  It's likely that upgrading most of 
 those dependencies will require little effort.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr 5?

2014-06-04 Thread Lajos

Hi all,

Just coming back to this question of mine from the fall ... Given that 
the pace of things has accelerated quite a bit (and quite nicely) 
lately, does anyone have concrete plans for a 5.0 release yet? Would we 
be talking summer's end or (hopefully) earlier?


Cheers,

L



On 02/10/2013 16:56, Shawn Heisey wrote:

On 10/2/2013 4:34 AM, la...@protulae.com wrote:

Your pending 4.5 release reminds me I wanted to ask - what is
the expected timeframe for 5.0? Are we talking end of year? Q1 2014? Later?

I’m not asking for any commitment or firm date - I would just appreciate
an indication of what y’all are thinking of right now.


Here's the tail end of a message that I just sent to solr-user:

A 4.6.0 release will probably happen before the end of the year, but I
can't guarantee that.

The release schedule for 5.0 is *completely* undecided.  It might be a
few months from now, it might be a year from now.  Some of the things
that have been tentatively planned for that release are nowhere near
finished.

Thanks,
Shawn


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene/Solr 5?

2014-06-04 Thread Jack Krupansky
The 4.x branch seems to be doing well enough, both from a stability 
perspective and in terms of momentum with new features.


Yeah, a year ago I would have expected a 5.0 around now, but... sometimes 
reality happens.


I'll offer a prediction: 5.0 will happen when the Lucene guys at 
Elasticsearch come up with some great new ideas for how to leapfrog Solr! 
(And then we watch how the Heliosearch guys respond to that!)


-- Jack Krupansky

-Original Message- 
From: Lajos

Sent: Wednesday, June 4, 2014 3:44 PM
To: dev@lucene.apache.org
Subject: Re: Lucene/Solr 5?

Hi all,

Just coming back to this question of mine from the fall ... Given that
the pace of things has accelerated quite a bit (and quite nicely)
lately, does anyone have concrete plans for a 5.0 release yet? Would we
be talking summer's end or (hopefully) earlier?

Cheers,

L



On 02/10/2013 16:56, Shawn Heisey wrote:

On 10/2/2013 4:34 AM, la...@protulae.com wrote:

Your pending 4.5 release reminds me I wanted to ask - what is
the expected timeframe for 5.0? Are we talking end of year? Q1 2014? 
Later?


I’m not asking for any commitment or firm date - I would just appreciate
an indication of what y’all are thinking of right now.


Here's the tail end of a message that I just sent to solr-user:

A 4.6.0 release will probably happen before the end of the year, but I
can't guarantee that.

The release schedule for 5.0 is *completely* undecided.  It might be a
few months from now, it might be a year from now.  Some of the things
that have been tentatively planned for that release are nowhere near
finished.

Thanks,
Shawn


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org 



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5627) Positional joins

2014-06-04 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018114#comment-14018114
 ] 

Paul Elschot commented on LUCENE-5627:
--

The javadocs here contain some references to what was used to make this.
Meanwhile I had another look around and found two somewhat similar 
implementations:

Luxdb:  https://github.com/msokolov/lux
This uses a TaggedTokenStream for the XML tags, see 
http://www.slideshare.net/lucenerevolution/querying-rich-text-with-xquery

Fangorn: https://code.google.com/p/fangorn/
This indexes each tag by adding a payload with four position numbers (left, 
right, depth, parent).
Its target is large treebanks of linguistically parsed text.

A first impression:
Both are based on Lucene and add a tree of XML tags like the label tree here.
They have a query language implementation which is not available here.
They do not have labeled fragments in the sense of having 0..n tokens in more 
than one field that can form a single leaf in the tag tree.

 Positional joins
 

 Key: LUCENE-5627
 URL: https://issues.apache.org/jira/browse/LUCENE-5627
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Paul Elschot
Priority: Minor

 Prototype of analysis and search for labeled fragments



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1153: POMs out of sync

2014-06-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1153/

3 tests failed.
FAILED:  org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch

Error Message:
Task 3002 did not complete, final state: running

Stack Trace:
java.lang.AssertionError: Task 3002 did not complete, final state: running
at 
__randomizedtesting.SeedInfo.seed([542F77FBEDC2170E:D5C9F9E39A9D7732]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:158)
at 
org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:67)


FAILED:  
org.apache.solr.cloud.MultiThreadedOCPTest.org.apache.solr.cloud.MultiThreadedOCPTest

Error Message:
1 thread leaked from SUITE scope at org.apache.solr.cloud.MultiThreadedOCPTest: 
   1) Thread[id=7297, 
name=TEST-MultiThreadedOCPTest.testDistribSearch-seed#[542F77FBEDC2170E]-EventThread,
 state=RUNNABLE, group=TGRP-MultiThreadedOCPTest]
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:318)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.PrintStream.write(PrintStream.java:482)
at 
org.apache.maven.surefire.booter.ForkingRunListener.writeTestOutput(ForkingRunListener.java:178)
at 
org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleOutputCapture.java:64)
at 
org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleOutputCapture.java:73)
at java.io.FilterOutputStream.write(FilterOutputStream.java:77)
at 
org.apache.lucene.util.TestRuleLimitSysouts$DelegateStream.write(TestRuleLimitSysouts.java:134)
at java.io.FilterOutputStream.write(FilterOutputStream.java:125)
at 
org.apache.lucene.util.TestRuleLimitSysouts$DelegateStream.write(TestRuleLimitSysouts.java:128)
at java.io.PrintStream.write(PrintStream.java:480)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at 
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:304)
at 
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher.process(DistributedQueue.java:263)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.cloud.MultiThreadedOCPTest: 
   1) Thread[id=7297, 
name=TEST-MultiThreadedOCPTest.testDistribSearch-seed#[542F77FBEDC2170E]-EventThread,
 state=RUNNABLE, group=TGRP-MultiThreadedOCPTest]
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:318)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.PrintStream.write(PrintStream.java:482)
at 
org.apache.maven.surefire.booter.ForkingRunListener.writeTestOutput(ForkingRunListener.java:178)
at 
org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleOutputCapture.java:64)
at 
org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleOutputCapture.java:73)
at java.io.FilterOutputStream.write(FilterOutputStream.java:77)
at 
org.apache.lucene.util.TestRuleLimitSysouts$DelegateStream.write(TestRuleLimitSysouts.java:134)
at java.io.FilterOutputStream.write(FilterOutputStream.java:125)
at 
org.apache.lucene.util.TestRuleLimitSysouts$DelegateStream.write(TestRuleLimitSysouts.java:128)
at java.io.PrintStream.write(PrintStream.java:480)
at 

[jira] [Commented] (SOLR-4408) Server hanging on startup

2014-06-04 Thread simpleliving (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018168#comment-14018168
 ] 

simpleliving commented on SOLR-4408:



I am facing the same exact issue as reported by the reporter of this ticket and 
I am using Solr 4.7

 Server hanging on startup
 -

 Key: SOLR-4408
 URL: https://issues.apache.org/jira/browse/SOLR-4408
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
 Environment: OpenJDK 64-Bit Server VM (23.2-b09 mixed mode)
 Tomcat 7.0
 Eclipse Juno + WTP
Reporter: Francois-Xavier Bonnet
Assignee: Erick Erickson
 Attachments: patch-4408.txt


 While starting, the server hangs indefinitely. Everything works fine when I 
 first start the server with no index created yet but if I fill the index then 
 stop and start the server, it hangs. Could it be a lock that is never 
 released?
 Here is what I get in a full thread dump:
 2013-02-06 16:28:52
 Full thread dump OpenJDK 64-Bit Server VM (23.2-b09 mixed mode):
 searcherExecutor-4-thread-1 prio=10 tid=0x7fbdfc16a800 nid=0x42c6 in 
 Object.wait() [0x7fbe0ab1]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xc34c1c48 (a java.lang.Object)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492)
   - locked 0xc34c1c48 (a java.lang.Object)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247)
   at 
 org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:94)
   at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:213)
   at 
 org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:112)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
   at 
 org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
   at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1594)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
 coreLoadExecutor-3-thread-1 prio=10 tid=0x7fbe04194000 nid=0x42c5 in 
 Object.wait() [0x7fbe0ac11000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xc34c1c48 (a java.lang.Object)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492)
   - locked 0xc34c1c48 (a java.lang.Object)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247)
   at 
 org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:495)
   at 
 org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:518)
   at 
 org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:232)
   at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
   at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
   at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:512)
   at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140)
   at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
   at 
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:636)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:809)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:607)
   at 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1003)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
   at 

[jira] [Comment Edited] (SOLR-4408) Server hanging on startup

2014-06-04 Thread simpleliving (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018168#comment-14018168
 ] 

simpleliving edited comment on SOLR-4408 at 6/4/14 9:00 PM:


I am facing the same exact issue as reported by the reporter of this ticket, and 
I am using Solr 4.7. If there is no index, the server starts; if an index is 
present, it hangs and does not start. I am using the spellcheckers, and I can 
confirm that using spellcheckers causes this issue.


was (Author: simpleliving):

I am facing the same exact issue as reported by the reporter of this ticket and 
I am using Solr 4.7

 Server hanging on startup
 -

 Key: SOLR-4408
 URL: https://issues.apache.org/jira/browse/SOLR-4408
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
 Environment: OpenJDK 64-Bit Server VM (23.2-b09 mixed mode)
 Tomcat 7.0
 Eclipse Juno + WTP
Reporter: Francois-Xavier Bonnet
Assignee: Erick Erickson
 Attachments: patch-4408.txt


 While starting, the server hangs indefinitely. Everything works fine when I 
 first start the server with no index created yet but if I fill the index then 
 stop and start the server, it hangs. Could it be a lock that is never 
 released?
 Here is what I get in a full thread dump:
 2013-02-06 16:28:52
 Full thread dump OpenJDK 64-Bit Server VM (23.2-b09 mixed mode):
 searcherExecutor-4-thread-1 prio=10 tid=0x7fbdfc16a800 nid=0x42c6 in 
 Object.wait() [0x7fbe0ab1]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xc34c1c48 (a java.lang.Object)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492)
   - locked 0xc34c1c48 (a java.lang.Object)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247)
   at 
 org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:94)
   at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:213)
   at 
 org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:112)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
   at 
 org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
   at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1594)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
 coreLoadExecutor-3-thread-1 prio=10 tid=0x7fbe04194000 nid=0x42c5 in 
 Object.wait() [0x7fbe0ac11000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xc34c1c48 (a java.lang.Object)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492)
   - locked 0xc34c1c48 (a java.lang.Object)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247)
   at 
 org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:495)
   at 
 org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:518)
   at 
 org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:232)
   at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
   at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
   at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:512)
   at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140)
   at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
   at 
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:636)
   at 

[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 23286 - Failure!

2014-06-04 Thread builder
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/23286/

2 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.lucene.search.TestControlledRealTimeReopenThread

Error Message:
1 thread leaked from SUITE scope at 
org.apache.lucene.search.TestControlledRealTimeReopenThread: 1) 
Thread[id=119, name=Thread-53, state=TIMED_WAITING, 
group=TGRP-TestControlledRealTimeReopenThread] at 
sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
 at 
org.apache.lucene.search.ControlledRealTimeReopenThread.run(ControlledRealTimeReopenThread.java:223)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.lucene.search.TestControlledRealTimeReopenThread: 
   1) Thread[id=119, name=Thread-53, state=TIMED_WAITING, 
group=TGRP-TestControlledRealTimeReopenThread]
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at 
org.apache.lucene.search.ControlledRealTimeReopenThread.run(ControlledRealTimeReopenThread.java:223)
at __randomizedtesting.SeedInfo.seed([2651BE7982C65DFA]:0)


REGRESSION:  
org.apache.lucene.search.TestControlledRealTimeReopenThread.testCRTReopen

Error Message:
waited too long for generation 25376

Stack Trace:
java.lang.AssertionError: waited too long for generation 25376
at 
__randomizedtesting.SeedInfo.seed([2651BE7982C65DFA:7471234D5601E583]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.lucene.search.TestControlledRealTimeReopenThread.testCRTReopen(TestControlledRealTimeReopenThread.java:519)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 

[jira] [Updated] (SOLR-6123) The 'clusterstatus' API filtered by collection times out if a long running operation is in progress

2014-06-04 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated SOLR-6123:
---

Attachment: SOLR-6123.patch

Updated patch.

 The 'clusterstatus' API filtered by collection times out if a long running 
 operation is in progress
 ---

 Key: SOLR-6123
 URL: https://issues.apache.org/jira/browse/SOLR-6123
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.9
Reporter: Shalin Shekhar Mangar
Assignee: Anshum Gupta
 Fix For: 4.9

 Attachments: SOLR-6123.patch, SOLR-6123.patch


 If a long running shard split is in progress, say for collection=X, then 
 clusterstatus API with collection=X will time out.
 The OverseerCollectionProcessor should never block an operation such as 
 clusterstatus even if there are tasks for the same collection in progress.
 This bug was introduced by SOLR-5681.
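
 For reference, a minimal sketch of the call in question (the Collections API 
 CLUSTERSTATUS action; host, port and collection name below are placeholders):
 {code:java}
 import java.io.BufferedReader;
 import java.io.InputStreamReader;
 import java.net.URL;
 import java.nio.charset.StandardCharsets;
 
 public class ClusterStatusCheck {
   public static void main(String[] args) throws Exception {
     // The clusterstatus request that times out while a long-running operation
     // (e.g. a shard split) for the same collection is in progress.
     URL url = new URL("http://localhost:8983/solr/admin/collections"
         + "?action=CLUSTERSTATUS&collection=X&wt=json");
     try (BufferedReader in = new BufferedReader(
         new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
       String line;
       while ((line = in.readLine()) != null) {
         System.out.println(line);
       }
     }
   }
 }
 {code}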



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6123) The 'clusterstatus' API filtered by collection times out if a long running operation is in progress

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018309#comment-14018309
 ] 

ASF subversion and git services commented on SOLR-6123:
---

Commit 1600535 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1600535 ]

SOLR-6123: Make CLUSTERSTATE Api unblocked and non-blocking always

 The 'clusterstatus' API filtered by collection times out if a long running 
 operation is in progress
 ---

 Key: SOLR-6123
 URL: https://issues.apache.org/jira/browse/SOLR-6123
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.9
Reporter: Shalin Shekhar Mangar
Assignee: Anshum Gupta
 Fix For: 4.9

 Attachments: SOLR-6123.patch, SOLR-6123.patch


 If a long running shard split is in progress, say for collection=X, then 
 clusterstatus API with collection=X will time out.
 The OverseerCollectionProcessor should never block an operation such as 
 clusterstatus even if there are tasks for the same collection in progress.
 This bug was introduced by SOLR-5681.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Managed Schema and SolrCloud

2014-06-04 Thread Steve Rowe
Hi Greg,

Your understanding is correct, and I agree that this limits managed schema 
functionality.

Under SolrCloud, all Solr nodes participating in a collection bound to a 
configset with a managed schema keep a watch on the corresponding schema ZK 
node.  In my testing (on my laptop), when the managed schema is written to ZK, 
the other nodes are notified very quickly (single-digit milliseconds) and 
immediately download and start parsing the schema.  Incoming requests are bound 
to a snapshot of the live schema at the time they arrive, so there is a window 
of time between initial posting to ZK and swapping out the schema after 
parsing.  Different loads on, and/or different network latentcy between ZK and 
each participating node can result in varying latencies before all nodes are in 
sync.

For Schema API users, delaying a couple of seconds after adding fields before 
using them should work around this problem.  While not ideal, I think schema 
field additions are rare enough in the Solr collection lifecycle that this is 
not a huge problem.
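
A minimal client-side sketch of that delay-based workaround (hypothetical host, 
collection and field names; the POST to /schema/fields is assumed from the 4.x 
Schema API docs, not from this thread):

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AddFieldThenWait {
  public static void main(String[] args) throws Exception {
    // 1) Add a field via the managed-schema REST API (Schema API).
    URL url = new URL("http://localhost:8983/solr/collection1/schema/fields");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    String field = "[{\"name\":\"new_field_s\",\"type\":\"string\",\"stored\":true}]";
    try (OutputStream out = conn.getOutputStream()) {
      out.write(field.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("Schema API response: " + conn.getResponseCode());

    // 2) Crude workaround: give every replica's ZkIndexSchemaReader a moment to
    //    see the ZK watch fire and swap in the new schema before sending
    //    documents that use the new field.
    Thread.sleep(2000);

    // 3) ... now index documents that reference new_field_s ...
  }
}
{code}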

For schemaless users, the picture is worse, as you noted.  Immediate 
distribution of documents triggering schema field addition could easily prove 
problematic.  Maybe we need a schema update blocking mode, where after the ZK 
schema node watch is triggered, all new request processing is halted until the 
schema is finished downloading/parsing/swapping out?  Can you make an issue, 
Greg?  (Such a mode should help Schema API users too.)

Thanks,
Steve

On Jun 3, 2014, at 8:06 PM, Gregory Chanan gcha...@cloudera.com wrote:

 I'm trying to determine if the Managed Schema functionality works with 
 SolrCloud, and AFAICT the integration seems pretty limited.
 
 The issue I'm running into is variants of the issue that schema changes are 
 not pushed to all shards/replicas synchronously.  So, for example, I can make 
 the following two requests:
 1) add a field to the collection on server1 using the Schema API
 2) add a document with the new field, the document is routed to a core on 
 server2
 
 Then, there appears to be a race between when the document is processed by 
 the core on server2 and when the core on server2, via the 
 ZkIndexSchemaReader, gets the new schema.  If the document is processed 
 first, I get a 400 error because the field doesn't exist.  This is easily 
 reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
 
 I hit a similar issue with Schemaless: the distributed request handler sends 
 out the document updates, but there is no guarantee that the other 
 shards/replicas see the schema changes made by the update.chain.
 
 Is my understanding correct?  Is this expected?
 
 Thanks,
 Greg


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6123) The 'clusterstatus' API filtered by collection times out if a long running operation is in progress

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018324#comment-14018324
 ] 

ASF subversion and git services commented on SOLR-6123:
---

Commit 1600538 from [~anshumg] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1600538 ]

SOLR-6123: Make CLUSTERSTATE Api unblocked and non-blocking always (Merge from 
trunk r1600535)

 The 'clusterstatus' API filtered by collection times out if a long running 
 operation is in progress
 ---

 Key: SOLR-6123
 URL: https://issues.apache.org/jira/browse/SOLR-6123
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.9
Reporter: Shalin Shekhar Mangar
Assignee: Anshum Gupta
 Fix For: 4.9

 Attachments: SOLR-6123.patch, SOLR-6123.patch


 If a long running shard split is in progress, say for collection=X, then 
 clusterstatus API with collection=X will time out.
 The OverseerCollectionProcessor should never block an operation such as 
 clusterstatus even if there are tasks for the same collection in progress.
 This bug was introduced by SOLR-5681.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6123) The 'clusterstatus' API filtered by collection times out if a long running operation is in progress

2014-06-04 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta resolved SOLR-6123.


Resolution: Fixed

 The 'clusterstatus' API filtered by collection times out if a long running 
 operation is in progress
 ---

 Key: SOLR-6123
 URL: https://issues.apache.org/jira/browse/SOLR-6123
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.9
Reporter: Shalin Shekhar Mangar
Assignee: Anshum Gupta
 Fix For: 4.9

 Attachments: SOLR-6123.patch, SOLR-6123.patch


 If a long running shard split is in progress, say for collection=X, then 
 clusterstatus API with collection=X will time out.
 The OverseerCollectionProcessor should never block an operation such as 
 clusterstatus even if there are tasks for the same collection in progress.
 This bug was introduced by SOLR-5681.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6130) solr-cell dependencies weren't fully upgraded with the Tika 1.4-1.5 upgrade

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018358#comment-14018358
 ] 

ASF subversion and git services commented on SOLR-6130:
---

Commit 1600544 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_8'
[ https://svn.apache.org/r1600544 ]

SOLR-6130: Added com.uwyn:jhighlight dependency to, and removed asm:asm 
dependency from the extraction contrib - dependencies weren't fully upgraded 
with the Tika 1.4-1.5 upgrade (SOLR-5763) (merged trunk r1599663)

 solr-cell dependencies weren't fully upgraded with the Tika 1.4-1.5 upgrade
 

 Key: SOLR-6130
 URL: https://issues.apache.org/jira/browse/SOLR-6130
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Steve Rowe
Assignee: Steve Rowe
 Fix For: 4.9, 5.0, 4.8.2

 Attachments: SOLR-6130.patch


 There are problems with the solr-cell dependency configuration:
 # Despite the fact that the asm:asm dependency was removed in LUCENE-4263, 
 and re-addition effectively vetoed by Uwe/Robert in SOLR-4209, asm:asm:3.1 
 was re-added with no apparent discussion by SOLR-1301 in Solr 4.7.
 # The Tika 1.5 upgrade (SOLR-5763) failed to properly upgrade the asm:asm:3.1 
 dependency to org.ow2.asm:asm-debug-all:4.1 (see TIKA-1053).
 # New Tika dependency com.uwyn:jhighlight:1.0 was not added.
 [~thetaphi], do you have any opinions on the asm issues?  In particular, 
 would it make sense to have an additional asm dependency (asm-debug-all in 
 addition to asm)?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5763) Upgrade to Tika 1.5

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018359#comment-14018359
 ] 

ASF subversion and git services commented on SOLR-5763:
---

Commit 1600544 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_8'
[ https://svn.apache.org/r1600544 ]

SOLR-6130: Added com.uwyn:jhighlight dependency to, and removed asm:asm 
dependency from the extraction contrib - dependencies weren't fully upgraded 
with the Tika 1.4-1.5 upgrade (SOLR-5763) (merged trunk r1599663)

 Upgrade to Tika 1.5
 ---

 Key: SOLR-5763
 URL: https://issues.apache.org/jira/browse/SOLR-5763
 Project: Solr
  Issue Type: Task
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-5763.patch, SOLR-5763.patch, SOLR-5763.patch


 Just released: http://www.apache.org/dist/tika/CHANGES-1.5.txt



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6130) solr-cell dependencies weren't fully upgraded with the Tika 1.4-1.5 upgrade

2014-06-04 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved SOLR-6130.
--

Resolution: Fixed

Committed to trunk, branch_4x, and the lucene_solr_4_8 branch (in case there is 
a 4.8.2 release)

 solr-cell dependencies weren't fully upgraded with the Tika 1.4-1.5 upgrade
 

 Key: SOLR-6130
 URL: https://issues.apache.org/jira/browse/SOLR-6130
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Steve Rowe
Assignee: Steve Rowe
 Fix For: 4.9, 5.0, 4.8.2

 Attachments: SOLR-6130.patch


 There are problems with the solr-cell dependency configuration:
 # Despite the fact that the asm:asm dependency was removed in LUCENE-4263, 
 and re-addition effectively vetoed by Uwe/Robert in SOLR-4209, asm:asm:3.1 
 was re-added with no apparent discussion by SOLR-1301 in Solr 4.7.
 # The Tika 1.5 upgrade (SOLR-5763) failed to properly upgrade the asm:asm:3.1 
 dependency to org.ow2.asm:asm-debug-all:4.1 (see TIKA-1053).
 # New Tika dependency com.uwyn:jhighlight:1.0 was not added.
 [~thetaphi], do you have any opinions on the asm issues?  In particular, 
 would it make sense to have an additional asm dependency (asm-debug-all in 
 addition to asm)?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers

2014-06-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5703:


Attachment: LUCENE-5703.patch

Updated to trunk. Added TestFieldCacheSortRandom. Fixed a bug in the original 
patch with FC (it cannot share here), so instead getTermsIndex returns a light 
iterator over the real thing, just like docTermsOrds. Because of this, I also 
had to fix some bad test assumptions around 'same'. Also cleaned up the 
TermsEnum in the default codec a bit: the methods like doSeek/doNext are stupid 
and I removed them.

I'll beast and review some more, but this is all looking good.

 Don't allocate/copy bytes all the time in binary DV producers
 -

 Key: LUCENE-5703
 URL: https://issues.apache.org/jira/browse/LUCENE-5703
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch, 
 LUCENE-5703.patch


 Our binary doc values producers keep on creating new {{byte[]}} arrays and 
 copying bytes when a value is requested, which likely doesn't help 
 performance. This has been done because of the way fieldcache consumers used 
 the API, but we should try to fix it in 5.0.
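
 To make the allocate-per-call pattern concrete, here is a schematic sketch 
 (made-up class names, not the actual DocValues producer code) contrasting it 
 with filling a caller-supplied, reused BytesRef:
 {code:java}
 import org.apache.lucene.util.BytesRef;
 
 class AllocatingProducer {
   private final byte[] data;  // pretend this holds all values, concatenated
   AllocatingProducer(byte[] data) { this.data = data; }
 
   // Allocates and copies on every call - the behavior this issue complains about.
   BytesRef get(int start, int len) {
     byte[] copy = new byte[len];
     System.arraycopy(data, start, copy, 0, len);
     return new BytesRef(copy);
   }
 }
 
 class ReusingProducer {
   private final byte[] data;
   ReusingProducer(byte[] data) { this.data = data; }
 
   // Points the reused BytesRef at the backing array - no allocation, no copy.
   void get(int start, int len, BytesRef reuse) {
     reuse.bytes = data;
     reuse.offset = start;
     reuse.length = len;
   }
 }
 {code}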



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5648) Index/search multi-valued time durations

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018405#comment-14018405
 ] 

ASF subversion and git services commented on LUCENE-5648:
-

Commit 1600555 from [~dsmiley] in branch 'dev/trunk'
[ https://svn.apache.org/r1600555 ]

LUCENE-5648: DateRangePrefixTree and NumberRangePrefixTreeStrategy

 Index/search multi-valued time durations
 

 Key: LUCENE-5648
 URL: https://issues.apache.org/jira/browse/LUCENE-5648
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch, 
 LUCENE-5648.patch


 If you need to index a date/time duration, then the way to do that is to have 
 a pair of date fields; one for the start and one for the end -- pretty 
 straight-forward. But if you need to index a variable number of durations per 
 document, then the options aren't pretty, ranging from denormalization, to 
 joins, to using Lucene spatial with 2D as described 
 [here|http://wiki.apache.org/solr/SpatialForTimeDurations].  Ideally it would 
 be easier to index durations, and work in a more optimal way.
 This issue implements the aforementioned feature using Lucene-spatial with a 
 new single-dimensional SpatialPrefixTree implementation. Unlike the other two 
 SPT implementations, it's not based on floating point numbers. It will have a 
 Date based customization that indexes levels at meaningful quantities like 
 seconds, minutes, hours, etc.  The point of that alignment is to make it 
 faster to query across meaningful ranges (i.e. [2000 TO 2014]) and to enable 
 a follow-on issue to facet on the data in a really fast way.
 I'll expect to have a working patch up this week.
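
 For contrast, a minimal sketch of the traditional pair-of-fields approach 
 mentioned above (standard Lucene 4.x numeric fields; field names are 
 illustrative), which only handles a single duration per document:
 {code:java}
 import org.apache.lucene.document.Document;
 import org.apache.lucene.document.Field;
 import org.apache.lucene.document.LongField;
 import org.apache.lucene.search.BooleanClause.Occur;
 import org.apache.lucene.search.BooleanQuery;
 import org.apache.lucene.search.NumericRangeQuery;
 import org.apache.lucene.search.Query;
 
 class DurationAsTwoFields {
   // Index one duration per document as a start/end pair of millisecond timestamps.
   static void addDuration(Document doc, long startMillis, long endMillis) {
     doc.add(new LongField("start", startMillis, Field.Store.NO));
     doc.add(new LongField("end", endMillis, Field.Store.NO));
   }
 
   // Match documents whose duration overlaps [qStart, qEnd]:
   // start <= qEnd AND end >= qStart.
   static Query overlaps(long qStart, long qEnd) {
     BooleanQuery bq = new BooleanQuery();
     bq.add(NumericRangeQuery.newLongRange("start", null, qEnd, true, true), Occur.MUST);
     bq.add(NumericRangeQuery.newLongRange("end", qStart, null, true, true), Occur.MUST);
     return bq;
   }
 }
 {code}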



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers

2014-06-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5703:


Attachment: LUCENE-5703.patch

Updated patch: folds in an unrelated test bug fix from beasting slow+nightly... 

 Don't allocate/copy bytes all the time in binary DV producers
 -

 Key: LUCENE-5703
 URL: https://issues.apache.org/jira/browse/LUCENE-5703
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch, 
 LUCENE-5703.patch, LUCENE-5703.patch


 Our binary doc values producers keep on creating new {{byte[]}} arrays and 
 copying bytes when a value is requested, which likely doesn't help 
 performance. This has been done because of the way fieldcache consumers used 
 the API, but we should try to fix it in 5.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5648) Index/search multi-valued time durations

2014-06-04 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-5648.
--

   Resolution: Fixed
Fix Version/s: 5.0

The NR abbreviation is purely used on internal classes and it's referenced a 
lot, so I don't worry about its succinct name.

I committed against 5x.  LUCENE-5608 (spatial api refactoring) is a dependency 
which is still 5x; maybe I should back-port that to 4x now or soon.  Or wait a 
bit to see if further changes may arrive when I try to facet.

 Index/search multi-valued time durations
 

 Key: LUCENE-5648
 URL: https://issues.apache.org/jira/browse/LUCENE-5648
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 5.0

 Attachments: LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch, 
 LUCENE-5648.patch


 If you need to index a date/time duration, then the way to do that is to have 
 a pair of date fields; one for the start and one for the end -- pretty 
 straight-forward. But if you need to index a variable number of durations per 
 document, then the options aren't pretty, ranging from denormalization, to 
 joins, to using Lucene spatial with 2D as described 
 [here|http://wiki.apache.org/solr/SpatialForTimeDurations].  Ideally it would 
 be easier to index durations, and work in a more optimal way.
 This issue implements the aforementioned feature using Lucene-spatial with a 
 new single-dimensional SpatialPrefixTree implementation. Unlike the other two 
 SPT implementations, it's not based on floating point numbers. It will have a 
 Date based customization that indexes levels at meaningful quantities like 
 seconds, minutes, hours, etc.  The point of that alignment is to make it 
 faster to query across meaningful ranges (i.e. [2000 TO 2014]) and to enable 
 a follow-on issue to facet on the data in a really fast way.
 I'll expect to have a working patch up this week.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-06-04 Thread Da Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018417#comment-14018417
 ] 

Da Huang commented on LUCENE-4396:
--

About the scores diff. between BS/BS2 (the same as BNS/BS2):

Now, there is a scores diff. between BS and BS2 when executing a query like +a b c d.

I have been told that the reason is indicated by 
the TODO in ReqOptSumScorer.score(), which says:
{code}
// TODO: sum into a double and cast to float if we ever send required clauses to BS1
{code}

However, I don't think so, as the score bias is due to
different score calculation orders.

Suppose that a doc hits the query +a b c d. The score calculated by BS is 
{code}
BS.score(doc) = ((a.score() + b.score()) + c.score()) + d.score()
{code}
while the score calculated by BS2 is 
{code}
BS2.score(doc) = a.score() + (float)(b.score() + c.score() + d.score())
{code}

Notice that, in BS2, we can only get the float value of (b.score() + c.score() 
+ d.score()) by reqScorer.score().

Furthermore, I have noticed that we can actually control BS's 
score calculation order, so that 
{code}
BS.score(doc) = a.score() + ((b.score() + c.score()) + d.score())
{code}
However, for BS2, we do not know the calculation order of 
(b.score() + c.score() + d.score()), as the order is determined by each 
scorer's position in a heap. I still think this matters little.

I will rearrange the calculation order of BS.score() in the next patch 
to see whether it works.
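
To make the order-dependence concrete, here is a tiny standalone illustration 
(the values are made up, not scores from a real query):

{code:java}
public class FloatOrderDemo {
  public static void main(String[] args) {
    float a = 1.0e8f, b = 4.0f, c = 4.0f, d = 0.0f;

    // BS-style: fold every clause into the running float sum, left to right.
    float bsStyle = ((a + b) + c) + d;                   // 1.0E8 (the 4s are rounded away)

    // BS2-style: the non-required clauses are summed separately, then that sum
    // is added to the required clause's score.
    float bs2Style = a + (float) ((double) b + c + d);   // 1.00000008E8

    System.out.println(bsStyle == bs2Style);             // prints false
  }
}
{code}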


 BooleanScorer should sometimes be used for MUST clauses
 ---

 Key: LUCENE-4396
 URL: https://issues.apache.org/jira/browse/LUCENE-4396
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
 LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
 LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch


 Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
 If there is one or more MUST clauses we always use BooleanScorer2.
 But I suspect that unless the MUST clauses have very low hit count compared 
 to the other clauses, that BooleanScorer would perform better than 
 BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
 handle MUST so it shouldn't be hard to bring back this capability ... I think 
 the challenging part might be the heuristics on when to use which (likely we 
 would have to use firstDocID as proxy for total hit count).
 Likely we should also have BooleanScorer sometimes use .advance() on the subs 
 in this case, eg if suddenly the MUST clause skips 100 docs then you want 
 to .advance() all the SHOULD clauses.
 I won't have near term time to work on this so feel free to take it if you 
 are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6103) Add DateRangeField

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018416#comment-14018416
 ] 

ASF subversion and git services commented on SOLR-6103:
---

Commit 1600556 from [~dsmiley] in branch 'dev/trunk'
[ https://svn.apache.org/r1600556 ]

SOLR-6103: Add QParser arg to AbstractSpatialFieldType.parseSpatialArgs(). Make 
getQueryFromSpatialArgs protected, not private.

 Add DateRangeField
 --

 Key: SOLR-6103
 URL: https://issues.apache.org/jira/browse/SOLR-6103
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-6103.patch


 LUCENE-5648 introduced a date range index  search capability in the spatial 
 module. This issue is for a corresponding Solr FieldType to be named 
 DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that 
 parses a superset of Solr's strict date format.  It also parses partial dates 
 (e.g.: 2014-10  has month specificity), and the trailing 'Z' is optional, and 
 a leading +/- may be present (minus indicates BC era), and * means 
 all-time.  The proposed field type would use it to parse a string and also 
 both ends of a range query, but furthermore it will also allow an arbitrary 
 range query of the form {{calspec TO calspec}} such as:
 {noformat}2000 TO 2014-05-21T10{noformat}
 Which parses as the year 2000 thru 2014 May 21st 10am (GMT). 
 I suggest this syntax because it is aligned with Lucene's range query syntax. 
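
 For example, assuming a schema field named dateRange uses this new type, a 
 range query over it might look like the following SolrJ sketch (the field 
 name and URL are illustrative, not from this issue):
 {code:java}
 import org.apache.solr.client.solrj.SolrQuery;
 import org.apache.solr.client.solrj.impl.HttpSolrServer;
 import org.apache.solr.client.solrj.response.QueryResponse;
 
 public class DateRangeQueryExample {
   public static void main(String[] args) throws Exception {
     HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
     // Year 2000 through 2014-05-21 10am (GMT), using the proposed calspec syntax.
     SolrQuery q = new SolrQuery("dateRange:[2000 TO 2014-05-21T10]");
     QueryResponse rsp = solr.query(q);
     System.out.println("hits: " + rsp.getResults().getNumFound());
     solr.shutdown();
   }
 }
 {code}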
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6103) Add DateRangeField

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018418#comment-14018418
 ] 

ASF subversion and git services commented on SOLR-6103:
---

Commit 1600557 from [~dsmiley] in branch 'dev/trunk'
[ https://svn.apache.org/r1600557 ]

SOLR-6103: DateRangeField

 Add DateRangeField
 --

 Key: SOLR-6103
 URL: https://issues.apache.org/jira/browse/SOLR-6103
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-6103.patch


 LUCENE-5648 introduced a date range index  search capability in the spatial 
 module. This issue is for a corresponding Solr FieldType to be named 
 DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that 
 parses a superset of Solr's strict date format.  It also parses partial dates 
 (e.g.: 2014-10  has month specificity), and the trailing 'Z' is optional, and 
 a leading +/- may be present (minus indicates BC era), and * means 
 all-time.  The proposed field type would use it to parse a string and also 
 both ends of a range query, but furthermore it will also allow an arbitrary 
 range query of the form {{calspec TO calspec}} such as:
 {noformat}2000 TO 2014-05-21T10{noformat}
 Which parses as the year 2000 thru 2014 May 21st 10am (GMT). 
 I suggest this syntax because it is aligned with Lucene's range query syntax. 
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6103) Add DateRangeField

2014-06-04 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-6103.


   Resolution: Fixed
Fix Version/s: 5.0

Committed to 5x for now; intend to move to 4x soon-ish.

Please try it out folks!

Faceting to come...

 Add DateRangeField
 --

 Key: SOLR-6103
 URL: https://issues.apache.org/jira/browse/SOLR-6103
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 5.0

 Attachments: SOLR-6103.patch


 LUCENE-5648 introduced a date range index  search capability in the spatial 
 module. This issue is for a corresponding Solr FieldType to be named 
 DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that 
 parses a superset of Solr's strict date format.  It also parses partial dates 
 (e.g.: 2014-10  has month specificity), and the trailing 'Z' is optional, and 
 a leading +/- may be present (minus indicates BC era), and * means 
 all-time.  The proposed field type would use it to parse a string and also 
 both ends of a range query, but furthermore it will also allow an arbitrary 
 range query of the form {{calspec TO calspec}} such as:
 {noformat}2000 TO 2014-05-21T10{noformat}
 Which parses as the year 2000 thru 2014 May 21st 10am (GMT). 
 I suggest this syntax because it is aligned with Lucene's range query syntax. 
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues

2014-06-04 Thread Gregory Chanan (JIRA)
Gregory Chanan created SOLR-6137:


 Summary: Managed Schema / Schemaless and SolrCloud concurrency 
issues
 Key: SOLR-6137
 URL: https://issues.apache.org/jira/browse/SOLR-6137
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, SolrCloud
Reporter: Gregory Chanan


This is a follow up to a message on the mailing list, linked here: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E

The Managed Schema integration with SolrCloud seems pretty limited.

The issue I'm running into is variants of the issue that schema changes are not 
pushed to all shards/replicas synchronously.  So, for example, I can make the 
following two requests:
1) add a field to the collection on server1 using the Schema API
2) add a document with the new field, the document is routed to a core on 
server2

Then, there appears to be a race between when the document is processed by the 
core on server2 and when the core on server2, via the ZkIndexSchemaReader, gets 
the new schema.  If the document is processed first, I get a 400 error because 
the field doesn't exist.  This is easily reproducible by adding a sleep to the 
ZkIndexSchemaReader's processing.

I hit a similar issue with Schemaless: the distributed request handler sends 
out the document updates, but there is no guarantee that the other 
shards/replicas see the schema changes made by the update.chain.

Another issue I noticed today: making multiple schema API calls concurrently 
can block; that is, one may get through and the other may infinite loop.

So, for reference, the issues include:
1) Schema API changes return success before all cores are updated; subsequent 
calls attempting to use new schema may fail
2) Schemaless changes may fail on replicas/other shards for the same reason
3) Concurrent Schema API changes may block

From Steve Rowe on the mailing list:
{quote}
For Schema API users, delaying a couple of seconds after adding fields before 
using them should workaround this problem.  While not ideal, I think schema 
field additions are rare enough in the Solr collection lifecycle that this is 
not a huge problem.

For schemaless users, the picture is worse, as you noted.  Immediate 
distribution of documents triggering schema field addition could easily prove 
problematic.  Maybe we need a schema update blocking mode, where after the ZK 
schema node watch is triggered, all new request processing is halted until the 
schema is finished downloading/parsing/swapping out? (Such a mode should help 
Schema API users too.)
{quote}
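
To make the delay workaround quoted above concrete, here is a rough sketch; the 
collection name, field name, endpoints, and the two-second pause are 
illustrative assumptions, not a recommended recipe.
{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SchemaThenIndex {
  // Assumed collection and field names; adjust for your cluster.
  private static final String BASE = "http://localhost:8983/solr/collection1";

  public static void main(String[] args) throws Exception {
    // 1) Add the field through the Schema API (managed schema must be enabled).
    send("PUT", BASE + "/schema/fields/price_d", "{\"type\":\"tdouble\",\"stored\":true}");

    // 2) Crude workaround: give every replica's ZkIndexSchemaReader time to pick up
    //    the new schema from ZooKeeper before any document uses the field.
    Thread.sleep(2000);

    // 3) Index a document that uses the new field.
    send("POST", BASE + "/update?commit=true", "[{\"id\":\"1\",\"price_d\":9.99}]");
  }

  private static void send(String method, String url, String json) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod(method);
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(json.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println(method + " " + url + " -> HTTP " + conn.getResponseCode());
  }
}
{code}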



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5648) Index/search multi-valued time durations

2014-06-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018422#comment-14018422
 ] 

ASF subversion and git services commented on LUCENE-5648:
-

Commit 1600560 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1600560 ]

LUCENE-5648: unbreak ant test

 Index/search multi-valued time durations
 

 Key: LUCENE-5648
 URL: https://issues.apache.org/jira/browse/LUCENE-5648
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 5.0

 Attachments: LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch, 
 LUCENE-5648.patch


 If you need to index a single date/time duration, the way to do that is to have 
 a pair of date fields, one for the start and one for the end -- pretty 
 straightforward (sketched below). But if you need to index a variable number of 
 durations per document, the options aren't pretty, ranging from denormalization, 
 to joins, to using Lucene spatial with 2D as described 
 [here|http://wiki.apache.org/solr/SpatialForTimeDurations].  Ideally it would 
 be easier to index durations, and have them work more optimally.
 This issue implements the aforementioned feature using Lucene spatial with a 
 new single-dimensional SpatialPrefixTree implementation. Unlike the other two 
 SPT implementations, it's not based on floating-point numbers. It will have a 
 Date-based customization that indexes levels at meaningful quantities like 
 seconds, minutes, hours, etc.  The point of that alignment is to make it 
 faster to query across meaningful ranges (e.g. [2000 TO 2014]) and to enable 
 a follow-on issue to facet on the data in a really fast way.
 I expect to have a working patch up this week.
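
 For reference, a minimal sketch of that two-field baseline for a single 
 duration per document, using plain Lucene 4.x numeric fields; the Version 
 constant, field names, and epoch-millisecond encoding are illustrative 
 assumptions, and this is the simple case the issue wants to generalize, not 
 the new SpatialPrefixTree approach.
{code:java}
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class DurationPairDemo {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriterConfig cfg =
        new IndexWriterConfig(Version.LUCENE_48, new KeywordAnalyzer());
    try (IndexWriter writer = new IndexWriter(dir, cfg)) {
      // One duration per document: 2014-06-04T00:00Z .. 2014-06-05T00:00Z as epoch millis.
      Document doc = new Document();
      doc.add(new LongField("start", 1401840000000L, Field.Store.YES));
      doc.add(new LongField("end",   1401926400000L, Field.Store.YES));
      writer.addDocument(doc);
    }

    // "Intersects [qStart, qEnd]" for a stored duration means start <= qEnd AND end >= qStart.
    long qStart = 1401800000000L, qEnd = 1401900000000L;
    BooleanQuery query = new BooleanQuery();
    query.add(NumericRangeQuery.newLongRange("start", null, qEnd, true, true), Occur.MUST);
    query.add(NumericRangeQuery.newLongRange("end", qStart, null, true, true), Occur.MUST);

    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      System.out.println("hits: " + searcher.search(query, 10).totalHits);
    }
  }
}
{code}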



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues

2014-06-04 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018425#comment-14018425
 ] 

Gregory Chanan commented on SOLR-6137:
--

The Schema API blocking mode is an interesting idea; I'd want to think more 
about that.

In some sense, the schemaless issue seems easier to solve than the Schema API 
issue.  This is because if we run all (or more) of the update chain, instead of 
just skipping to the distributed update handler on the forwarded nodes, we 
could have all the cores apply the schema changes, so we are guaranteed to 
have the correct schema on each core.  We'd need to be smarter about trying 
to update the schema in ZK (as I noted above, concurrent schema changes may 
fail currently).  But that doesn't seem impossible.

The Schema API issue does seem more difficult.  A blocking mode could work in 
theory, though I guess one complication is you need to wait for all the cores 
that use the config, not just all the cores of the collection.  Although, 
perhaps we should just throw in some checks that only one collection is using a 
certain managed schema config at a time; it may make the logic easier and it 
seems very unlikely the user actually wants to use the same schema for multiple 
collections (I did that myself the first time before realizing why it didn't 
make any sense).

As Steve noted above, a blocking mode could be used by the schemaless 
functionality as well, instead of what I wrote above.

 Managed Schema / Schemaless and SolrCloud concurrency issues
 

 Key: SOLR-6137
 URL: https://issues.apache.org/jira/browse/SOLR-6137
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, SolrCloud
Reporter: Gregory Chanan

 This is a follow up to a message on the mailing list, linked here: 
 http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
 The Managed Schema integration with SolrCloud seems pretty limited.
 The issue I'm running into is variants of the issue that schema changes are 
 not pushed to all shards/replicas synchronously.  So, for example, I can make 
 the following two requests:
 1) add a field to the collection on server1 using the Schema API
 2) add a document with the new field, the document is routed to a core on 
 server2
 Then, there appears to be a race between when the document is processed by 
 the core on server2 and when the core on server2, via the 
 ZkIndexSchemaReader, gets the new schema.  If the document is processed 
 first, I get a 400 error because the field doesn't exist.  This is easily 
 reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
 I hit a similar issue with Schemaless: the distributed request handler sends 
 out the document updates, but there is no guarantee that the other 
 shards/replicas see the schema changes made by the update.chain.
 Another issue I noticed today: making multiple schema API calls concurrently 
 can block; that is, one may get through and the other may infinite loop.
 So, for reference, the issues include:
 1) Schema API changes return success before all cores are updated; subsequent 
 calls attempting to use new schema may fail
 2) Schemaless changes may fail on replicas/other shards for the same reason
 3) Concurrent Schema API changes may block
 From Steve Rowe on the mailing list:
 {quote}
 For Schema API users, delaying a couple of seconds after adding fields before 
 using them should workaround this problem.  While not ideal, I think schema 
 field additions are rare enough in the Solr collection lifecycle that this is 
 not a huge problem.
 For schemaless users, the picture is worse, as you noted.  Immediate 
 distribution of documents triggering schema field addition could easily prove 
 problematic.  Maybe we need a schema update blocking mode, where after the ZK 
 schema node watch is triggered, all new request processing is halted until 
 the schema is finished downloading/parsing/swapping out? (Such a mode should 
 help Schema API users too.)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Managed Schema and SolrCloud

2014-06-04 Thread Gregory Chanan
Thanks for the reply, Steve.

I filed SOLR-6137.

Greg


On Wed, Jun 4, 2014 at 4:08 PM, Steve Rowe sar...@gmail.com wrote:

 Hi Greg,

 Your understanding is correct, and I agree that this limits managed schema
 functionality.

 Under SolrCloud, all Solr nodes participating in a collection bound to a
 configset with a managed schema keep a watch on the corresponding schema ZK
 node.  In my testing (on my laptop), when the managed schema is written to
 ZK, the other nodes are notified very quickly (single-digit milliseconds)
 and immediately download and start parsing the schema.  Incoming requests
 are bound to a snapshot of the live schema at the time they arrive, so
 there is a window of time between initial posting to ZK and swapping out
 the schema after parsing.  Different loads on, and/or different network
 latency between ZK and each participating node can result in varying
 latencies before all nodes are in sync.
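
 A bare-bones sketch of that watch-and-reload pattern, with an assumed znode 
 path and no error handling, might look like the following; this is purely 
 illustrative, not Solr's actual ZkIndexSchemaReader code.
{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class SchemaZnodeWatcher implements Watcher {
  private static final String SCHEMA_PATH = "/configs/myconf/managed-schema"; // assumed path
  private final ZooKeeper zk;

  public SchemaZnodeWatcher(ZooKeeper zk) throws Exception {
    this.zk = zk;
    reload(); // initial read also registers the watch
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeDataChanged) {
      try {
        reload(); // download, parse, then swap the live schema snapshot
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
  }

  private void reload() throws Exception {
    Stat stat = new Stat();
    byte[] bytes = zk.getData(SCHEMA_PATH, this, stat); // passing 'this' re-arms the watch
    System.out.println("schema version " + stat.getVersion() + ", " + bytes.length + " bytes");
    // parse 'bytes' and atomically swap the in-memory schema here
  }
}
{code}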

 For Schema API users, delaying a couple of seconds after adding fields
 before using them should work around this problem.  While not ideal, I think
 schema field additions are rare enough in the Solr collection lifecycle
 that this is not a huge problem.

 For schemaless users, the picture is worse, as you noted.  Immediate
 distribution of documents triggering schema field addition could easily
 prove problematic.  Maybe we need a schema update blocking mode, where
 after the ZK schema node watch is triggered, all new request processing is
 halted until the schema is finished downloading/parsing/swapping out?  Can
 you make an issue, Greg?  (Such a mode should help Schema API users too.)

 Thanks,
 Steve

 On Jun 3, 2014, at 8:06 PM, Gregory Chanan gcha...@cloudera.com wrote:

  I'm trying to determine if the Managed Schema functionality works with
 SolrCloud, and AFAICT the integration seems pretty limited.
 
  The issue I'm running into is variants of the issue that schema changes
 are not pushed to all shards/replicas synchronously.  So, for example, I
 can make the following two requests:
  1) add a field to the collection on server1 using the Schema API
  2) add a document with the new field, the document is routed to a core
 on server2
 
  Then, there appears to be a race between when the document is processed
 by the core on server2 and when the core on server2, via the
 ZkIndexSchemaReader, gets the new schema.  If the document is processed
 first, I get a 400 error because the field doesn't exist.  This is easily
 reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
 
  I hit a similar issue with Schemaless: the distributed request handler
 sends out the document updates, but there is no guarantee that the other
 shards/replicas see the schema changes made by the update.chain.
 
  Is my understanding correct?  Is this expected?
 
  Thanks,
  Greg


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_20-ea-b15) - Build # 10472 - Failure!

2014-06-04 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10472/
Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC

1 tests failed.
FAILED:  
org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.initializationError

Error Message:
Suite class org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest 
should be a concrete class (not abstract).

Stack Trace:
java.lang.RuntimeException: Suite class 
org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should be a 
concrete class (not abstract).
at 
com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.validateTarget(RandomizedRunner.java:1681)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.init(RandomizedRunner.java:379)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at 
org.junit.internal.builders.AnnotatedBuilder.buildRunner(AnnotatedBuilder.java:31)
at 
org.junit.internal.builders.AnnotatedBuilder.runnerForClass(AnnotatedBuilder.java:24)
at 
org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
at 
org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29)
at 
org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
at 
org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24)
at 
com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:176)
at 
com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:276)
at 
com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:12)




Build Log:
[...truncated 9585 lines...]
   [junit4] Suite: 
org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest
   [junit4] ERROR   0.04s J1 | 
BaseNonFuzzySpatialOpStrategyTest.initializationError 
   [junit4] Throwable #1: java.lang.RuntimeException: Suite class 
org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should be a 
concrete class (not abstract).
   [junit4]at 
com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90)
   [junit4]at 
java.lang.reflect.Constructor.newInstance(Constructor.java:408)
   [junit4] Completed on J1 in 0.04s, 1 test, 1 error  FAILURES!

[...truncated 16 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:467: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:447: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:45: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:37: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:543: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:2017:
 The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/module-build.xml:60: 
The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1296:
 The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:920: 
There were test failures: 17 suites, 126 tests, 1 error, 12 ignored (2 
assumptions)

Total time: 31 minutes 49 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-6138) Solr core load limit

2014-06-04 Thread HuangTongwen (JIRA)
HuangTongwen created SOLR-6138:
--

 Summary: Solr core load limit
 Key: SOLR-6138
 URL: https://issues.apache.org/jira/browse/SOLR-6138
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 4.7.1, 4.6
 Environment: ubuntu 12.04
memory 20G
Reporter: HuangTongwen


We want to enrich our search capability with Solr. We ran an exercise to test 
how many cores one machine can support.
We found we can create more than 2000 cores without data on one machine. But 
when we create cores with data, we can only create about 1000 cores; beyond 
1000 cores we hit many errors like the following, which I append below. If you 
have met the same or a similar problem, please tell me.

I would be grateful if you could help me.

Here are some errors:

09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:43:29  WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:46:15  ERROR  ShardLeaderElectionContext  There was a problem trying to register as the leader: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/ctest.test.3521/leaders/shard1
09:46:15  WARN  ElectionContext  cancelElection did not find election node to remove
09:46:16  WARN  RecoveryStrategy  Stopping recovery for zkNodeName=core_node1core=ctest.test.3521
09:46:17  ERROR  RecoveryStrategy  Error while trying to recover. core=ctest.test.3521: org.apache.solr.common.SolrException: No registered leader was found, collection:ctest.test.3521 slice:shard1
09:46:17  ERROR  RecoveryStrategy  Recovery failed - trying again... (0) core=ctest.test.3521
09:46:17  ERROR  RecoveryStrategy  Recovery failed - interrupted. core=ctest.test.3521
09:46:17  ERROR  RecoveryStrategy  Recovery failed - I give up. core=ctest.test.3521
09:46:18  WARN  RecoveryStrategy  Stopping recovery for zkNodeName=core_node1core=ctest.test.3521
10:01:58  ERROR  SolrCore  org.apache.solr.common.SolrException: Error handling 'status' action
10:01:58  ERROR  SolrDispatchFilter  null:org.apache.solr.common.SolrException: Error handling 'status' action
10:15:59  ERROR  ZkController  Error getting leader from zk
10:15:59  ERROR  ZkController  Error registering SolrCore: org.apache.solr.common.SolrException: Error getting leader from zk for shard shard1
10:16:18  ERROR  SolrCore  org.apache.solr.common.SolrException: Error handling 'status' action
10:16:18  

Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_20-ea-b15) - Build # 10472 - Failure!

2014-06-04 Thread david.w.smi...@gmail.com
Thanks for fixing, Rob.

~ David

On Wed, Jun 4, 2014 at 10:49 PM, Policeman Jenkins Server 
jenk...@thetaphi.de wrote:

 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10472/
 Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC

 1 tests failed.
 FAILED:
  
 org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.initializationError

 Error Message:
 Suite class
 org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should
 be a concrete class (not abstract).

 Stack Trace:
 java.lang.RuntimeException: Suite class
 org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should
 be a concrete class (not abstract).
 at
 com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90)
 at
 com.carrotsearch.randomizedtesting.RandomizedRunner.validateTarget(RandomizedRunner.java:1681)
 at
 com.carrotsearch.randomizedtesting.RandomizedRunner.init(RandomizedRunner.java:379)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
 at
 org.junit.internal.builders.AnnotatedBuilder.buildRunner(AnnotatedBuilder.java:31)
 at
 org.junit.internal.builders.AnnotatedBuilder.runnerForClass(AnnotatedBuilder.java:24)
 at
 org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
 at
 org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29)
 at
 org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
 at
 org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24)
 at
 com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:176)
 at
 com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:276)
 at
 com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:12)




 Build Log:
 [...truncated 9585 lines...]
[junit4] Suite:
 org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest
[junit4] ERROR   0.04s J1 |
 BaseNonFuzzySpatialOpStrategyTest.initializationError 
[junit4] Throwable #1: java.lang.RuntimeException: Suite class
 org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should
 be a concrete class (not abstract).
[junit4]at
 com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90)
[junit4]at
 java.lang.reflect.Constructor.newInstance(Constructor.java:408)
[junit4] Completed on J1 in 0.04s, 1 test, 1 error  FAILURES!

 [...truncated 16 lines...]
 BUILD FAILED
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:467: The
 following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:447: The
 following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:45: The
 following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:37:
 The following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:543:
 The following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:2017:
 The following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/module-build.xml:60:
 The following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1296:
 The following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:920:
 There were test failures: 17 suites, 126 tests, 1 error, 12 ignored (2
 assumptions)

 Total time: 31 minutes 49 seconds
 Build step 'Invoke Ant' marked build as failure
 Description set: Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC
 Archiving artifacts
 Recording test results
 Email was triggered for: Failure - Any
 Sending email for trigger: Failure - Any




[jira] [Created] (LUCENE-5735) Faceting for DateRangePrefixTree

2014-06-04 Thread David Smiley (JIRA)
David Smiley created LUCENE-5735:


 Summary: Faceting for DateRangePrefixTree
 Key: LUCENE-5735
 URL: https://issues.apache.org/jira/browse/LUCENE-5735
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley


The newly added DateRangePrefixTree (DRPT) encodes terms in a fashion amenable 
to faceting by meaningful time buckets. The motivation for this feature is to 
efficiently populate a calendar bar chart or 
[heat-map|http://bl.ocks.org/mbostock/4063318]. It's not hard if you have date 
instances, as many do, but it's challenging for date ranges.

Internally this is going to iterate over the terms using seek/next with 
TermsEnum as appropriate.  It should be quite efficient; it won't need any 
special caches. I should be able to re-use the SPT traversal code in 
AbstractVisitingPrefixTreeFilter.  If this goes especially well, the underlying 
implementation will be re-usable for geospatial heat-map faceting.
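
A rough sketch of that kind of seek/next walk over a Lucene 4.x TermsEnum 
follows; the field handling and prefix bucketing are simplified assumptions, 
and the real patch would traverse SpatialPrefixTree cells rather than raw 
terms.
{code:java}
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.StringHelper;

public class PrefixBucketCounter {

  /** Sums docFreq over every indexed term under the given prefix (e.g. a "2014" cell). */
  public static long countUnderPrefix(AtomicReader reader, String field, BytesRef prefix)
      throws IOException {
    Terms terms = reader.terms(field);
    if (terms == null) {
      return 0;
    }
    TermsEnum te = terms.iterator(null);
    // seekCeil positions the enum at the first term >= prefix, then we walk with next().
    if (te.seekCeil(prefix) == TermsEnum.SeekStatus.END) {
      return 0;
    }
    long total = 0;
    for (BytesRef term = te.term(); term != null; term = te.next()) {
      if (!StringHelper.startsWith(term, prefix)) {
        break; // walked past the prefix range
      }
      total += te.docFreq();
    }
    return total;
  }
}
{code}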



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues

2014-06-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018455#comment-14018455
 ] 

Yonik Seeley commented on SOLR-6137:


{quote}In some sense, the schemaless issue seems easier to solve than the 
Schema API issue. This is because if we run all (or more) of the update chain, 
instead of just skipping to the distributed update handler on the forwarded 
nodes, we could have all the cores apply the schema changes, so we are 
guaranteed of having the correct schema on each core.
{quote}

Right.  Schemaless *should* be a non-issue.  The type-guessing logic should be 
run on replicas as well.  If the replica hasn't seen the change yet, then it 
will guess the same type and try to add it to the schema.  That should fail due 
to optimistic locking, because the leader already added it; the replica then 
re-reads the schema and successfully finds the field.  It's the same case as a single 
node with multiple threads both encountering the new field at around the same 
time.

Although the schema API needs a blocking mode, no blocking mode should be added 
to schemaless... that's what the optimistic concurrency is for.
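
Schematically, that retry-on-conflict behavior looks like the following sketch, 
where an in-memory compare-and-set stands in for the conditional ZooKeeper 
write; this is illustrative, not Solr's actual ManagedIndexSchema code.
{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticSchemaUpdate {

  /** Immutable schema snapshot with a ZooKeeper-style version number. */
  static final class Schema {
    final int version;
    final Set<String> fields;
    Schema(int version, Set<String> fields) { this.version = version; this.fields = fields; }
  }

  private final AtomicReference<Schema> live =
      new AtomicReference<>(new Schema(0, Collections.singleton("id")));

  /** Guess-and-add: returns once the field is present, no matter who actually added it. */
  void ensureField(String name) {
    while (true) {
      Schema current = live.get();
      if (current.fields.contains(name)) {
        return; // the leader (or another thread) already added it: nothing to do
      }
      Set<String> copy = new HashSet<>(current.fields);
      copy.add(name);
      Schema updated = new Schema(current.version + 1, copy);
      // compareAndSet stands in for a conditional ZK write at the expected version;
      // losing the race is not an error, we just loop and re-check the newer schema.
      if (live.compareAndSet(current, updated)) {
        return;
      }
    }
  }
}
{code}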

 Managed Schema / Schemaless and SolrCloud concurrency issues
 

 Key: SOLR-6137
 URL: https://issues.apache.org/jira/browse/SOLR-6137
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis, SolrCloud
Reporter: Gregory Chanan

 This is a follow up to a message on the mailing list, linked here: 
 http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
 The Managed Schema integration with SolrCloud seems pretty limited.
 The issue I'm running into is variants of the issue that schema changes are 
 not pushed to all shards/replicas synchronously.  So, for example, I can make 
 the following two requests:
 1) add a field to the collection on server1 using the Schema API
 2) add a document with the new field, the document is routed to a core on 
 server2
 Then, there appears to be a race between when the document is processed by 
 the core on server2 and when the core on server2, via the 
 ZkIndexSchemaReader, gets the new schema.  If the document is processed 
 first, I get a 400 error because the field doesn't exist.  This is easily 
 reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
 I hit a similar issue with Schemaless: the distributed request handler sends 
 out the document updates, but there is no guarantee that the other 
 shards/replicas see the schema changes made by the update.chain.
 Another issue I noticed today: making multiple schema API calls concurrently 
 can block; that is, one may get through and the other may infinite loop.
 So, for reference, the issues include:
 1) Schema API changes return success before all cores are updated; subsequent 
 calls attempting to use new schema may fail
 2) Schemaless changes may fail on replicas/other shards for the same reason
 3) Concurrent Schema API changes may block
 From Steve Rowe on the mailing list:
 {quote}
 For Schema API users, delaying a couple of seconds after adding fields before 
 using them should workaround this problem.  While not ideal, I think schema 
 field additions are rare enough in the Solr collection lifecycle that this is 
 not a huge problem.
 For schemaless users, the picture is worse, as you noted.  Immediate 
 distribution of documents triggering schema field addition could easily prove 
 problematic.  Maybe we need a schema update blocking mode, where after the ZK 
 schema node watch is triggered, all new request processing is halted until 
 the schema is finished downloading/parsing/swapping out? (Such a mode should 
 help Schema API users too.)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 23310 - Failure!

2014-06-04 Thread builder
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/23310/

2 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.lucene.search.TestControlledRealTimeReopenThread

Error Message:
1 thread leaked from SUITE scope at 
org.apache.lucene.search.TestControlledRealTimeReopenThread: 1) 
Thread[id=143, name=Thread-71, state=TIMED_WAITING, 
group=TGRP-TestControlledRealTimeReopenThread] at 
sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
 at 
org.apache.lucene.search.ControlledRealTimeReopenThread.run(ControlledRealTimeReopenThread.java:223)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.lucene.search.TestControlledRealTimeReopenThread: 
   1) Thread[id=143, name=Thread-71, state=TIMED_WAITING, 
group=TGRP-TestControlledRealTimeReopenThread]
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at 
org.apache.lucene.search.ControlledRealTimeReopenThread.run(ControlledRealTimeReopenThread.java:223)
at __randomizedtesting.SeedInfo.seed([178AC51FA789A281]:0)


REGRESSION:  
org.apache.lucene.search.TestControlledRealTimeReopenThread.testCRTReopen

Error Message:
waited too long for generation 20665

Stack Trace:
java.lang.AssertionError: waited too long for generation 20665
at 
__randomizedtesting.SeedInfo.seed([178AC51FA789A281:45AA582B734E1AF8]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.lucene.search.TestControlledRealTimeReopenThread.testCRTReopen(TestControlledRealTimeReopenThread.java:519)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 

[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers

2014-06-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018461#comment-14018461
 ] 

Robert Muir commented on LUCENE-5703:
-

Upon final review: I am unhappy about a few things with the latest patch, 
mostly to do with safety:
* DocValues.EMPTY_XXX is now unsafe: it uses a static mutable thing (BytesRef). 
We should make these methods instead of constants. This won't ever be 
performance critical, so that's OK with me.
* Memory and so on should do an array copy instead of returning singleton 
stuff. If there is a bug in someone's code, it could corrupt the data and get 
merged, turning into index corruption. 

I'm OK with someone's bug in their code corrupting their thread-local, 
code-private byte[], but not the index. We have to draw the line there.
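
Illustrating the first point with placeholder names (not the actual DocValues 
API), the shared constant versus the factory method looks like this:
{code:java}
import org.apache.lucene.util.BytesRef;

final class EmptyBinaryExample {

  // Unsafe pattern: one mutable BytesRef shared by every caller; any consumer that
  // scribbles on bytes/offset/length corrupts it for everyone, including the merger.
  static final BytesRef EMPTY_SHARED = new BytesRef();

  // Safer pattern: a method, so each caller gets its own instance to abuse in isolation.
  static BytesRef empty() {
    return new BytesRef();
  }
}
{code}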

 Don't allocate/copy bytes all the time in binary DV producers
 -

 Key: LUCENE-5703
 URL: https://issues.apache.org/jira/browse/LUCENE-5703
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch, 
 LUCENE-5703.patch, LUCENE-5703.patch


 Our binary doc values producers keep on creating new {{byte[]}} arrays and 
 copying bytes when a value is requested, which likely doesn't help 
 performance. This has been done because of the way fieldcache consumers used 
 the API, but we should try to fix it in 5.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers

2014-06-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018463#comment-14018463
 ] 

Robert Muir commented on LUCENE-5703:
-

I'll start tackling the EMPTY issue. It shouldn't be controversial at all, but 
the safety here is mandatory because this constant is used by SegmentMerger.

As far as the all-in-RAM ones exposing the ability to corrupt the same data 
that's merged, we can think of a number of compromises / solutions, but 
something must be done:
* System.arraycopy (a defensive-copy sketch follows this list)
* big fat warnings on these that they are unsafe (they are not part of the 
official index format, so maybe that's OK).
* they could keep hold of their file descriptors and override merge() to 
stream the data from the file.
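
A sketch of the first option for a 4.x-style in-RAM binary producer; the 
storage layout (concatenated byte[] plus offsets) is invented for illustration 
and is not any real codec's representation.
{code:java}
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.util.BytesRef;

final class CopyingBinaryDocValues extends BinaryDocValues {
  private final byte[] data;   // all values concatenated in RAM
  private final int[] offsets; // value for doc i lives at offsets[i] .. offsets[i+1]

  CopyingBinaryDocValues(byte[] data, int[] offsets) {
    this.data = data;
    this.offsets = offsets;
  }

  @Override
  public void get(int docID, BytesRef result) {
    int start = offsets[docID];
    int length = offsets[docID + 1] - start;
    // Defensive copy: callers get their own bytes, so a buggy consumer cannot
    // corrupt the in-RAM data that SegmentMerger will later write out.
    byte[] copy = new byte[length];
    System.arraycopy(data, start, copy, 0, length);
    result.bytes = copy;
    result.offset = 0;
    result.length = length;
  }
}
{code}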


 Don't allocate/copy bytes all the time in binary DV producers
 -

 Key: LUCENE-5703
 URL: https://issues.apache.org/jira/browse/LUCENE-5703
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch, 
 LUCENE-5703.patch, LUCENE-5703.patch


 Our binary doc values producers keep on creating new {{byte[]}} arrays and 
 copying bytes when a value is requested, which likely doesn't help 
 performance. This has been done because of the way fieldcache consumers used 
 the API, but we should try to fix it in 5.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-6138) Solr core load limit

2014-06-04 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey closed SOLR-6138.
--

Resolution: Invalid

 Solr core load limit
 

 Key: SOLR-6138
 URL: https://issues.apache.org/jira/browse/SOLR-6138
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 4.6, 4.7.1
 Environment: ubuntu 12.04
 memery 20G
Reporter: HuangTongwen
  Labels: test
   Original Estimate: 840h
  Remaining Estimate: 840h

 We want to enrich our search ability by solr.We do an exercise for test that 
 how many cores in one machine solr cores can support.
 We find we can create more than 2000 cores without datas in one machine.But 
 when we create cores with data ,we just can create about 1000 cores,after 
 more t han 1000 cores,we meet many errors like following I will apend it .If 
 you have meets the same or similar problem,please tell me.
 I would be grateful if you could help me.
 Hear are some errors:
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:46:15  ERROR   ShardLeaderElectionContext  There was a problem 
 trying to register as the 
 leader:org.apache.zookeeper.KeeperException$NodeExistsException: 
 KeeperErrorCode = NodeExists for /collections/ctest.test.3521/leaders/shard1
 09:46:15  WARNElectionContext cancelElection did not find 
 election node to remove
 09:46:16  WARNRecoveryStrategyStopping recovery for 
 zkNodeName=core_node1core=ctest.test.3521
 09:46:17  ERROR   RecoveryStrategyError while trying to recover. 
 core=ctest.test.3521:org.apache.solr.common.SolrException: No registered 
 leader was found,​ collection:ctest.test.3521 slice:shard1
 09:46:17  ERROR   RecoveryStrategyRecovery failed - trying 
 again... (0) core=ctest.test.3521
 09:46:17  ERROR   RecoveryStrategyRecovery failed - interrupted. 
 core=ctest.test.3521
 09:46:17  ERROR   RecoveryStrategyRecovery failed - I give up. 
 core=ctest.test.3521
 09:46:18  WARNRecoveryStrategyStopping recovery for 
 zkNodeName=core_node1core=ctest.test.3521
 10:01:58  ERROR   SolrCoreorg.apache.solr.common.SolrException: 
 Error handling 'status' action
 10:01:58  ERROR   SolrDispatchFilter  
 null:org.apache.solr.common.SolrException: Error handling 'status' action
 10:15:59  ERROR   ZkControllerError getting leader from zk
 10:15:59  ERROR   

[jira] [Commented] (SOLR-6138) Solr core load limit

2014-06-04 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018470#comment-14018470
 ] 

Shawn Heisey commented on SOLR-6138:


Solr itself does not have any hard limit on the number of cores that you can 
create, but you are also relying on software other than Solr itself.

In this case, I believe that you are running into a limitation in ZooKeeper, a 
dependency for SolrCloud.  ZooKeeper has a default limit of about 1MB on the 
data it will store or transfer for a single znode.  Each new collection puts 
data into ZooKeeper, and eventually you're going to run into this size limit.

Search the following page for jute.maxbuffer to find out how to increase this 
limit in ZooKeeper:

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html

I'm going to close this issue, because this is most likely not a problem in 
Solr.  If it does turn out that there is a bug in Solr that is causing this 
problem, we can re-open the issue.

Please direct any followup to the Solr mailing list:

http://lucene.apache.org/solr/discussion.html
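
Since jute.maxbuffer is read as an ordinary Java system property on both the 
ZooKeeper servers and the clients, one way to raise it from code is sketched 
below; the 4MB value is an arbitrary illustration, and the same setting must 
be applied to every ZooKeeper server and client JVM.
{code:java}
public class RaiseJuteMaxBuffer {
  public static void main(String[] args) {
    // Equivalent to passing -Djute.maxbuffer=4194304 on the java command line.
    // It must be set before any ZooKeeper classes load, and the same value has to
    // be used by the ZooKeeper servers and by every Solr/ZooKeeper client JVM.
    System.setProperty("jute.maxbuffer", Integer.toString(4 * 1024 * 1024));

    // ... start the SolrCloud node / ZooKeeper client after this point ...
  }
}
{code}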


 Solr core load limit
 

 Key: SOLR-6138
 URL: https://issues.apache.org/jira/browse/SOLR-6138
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 4.6, 4.7.1
 Environment: ubuntu 12.04
 memery 20G
Reporter: HuangTongwen
  Labels: test
   Original Estimate: 840h
  Remaining Estimate: 840h

 We want to enrich our search ability by solr.We do an exercise for test that 
 how many cores in one machine solr cores can support.
 We find we can create more than 2000 cores without datas in one machine.But 
 when we create cores with data ,we just can create about 1000 cores,after 
 more t han 1000 cores,we meet many errors like following I will apend it .If 
 you have meets the same or similar problem,please tell me.
 I would be grateful if you could help me.
 Hear are some errors:
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:43:29  WARNSolrResourceLoader  Can't find (or read) directory 
 to add to classloader: /non/existent/dir/yields/warning (resolved as: 
 /non/existent/dir/yields/warning).
 09:46:15  ERROR   ShardLeaderElectionContext  There was a problem 
 trying to register as the 
 leader:org.apache.zookeeper.KeeperException$NodeExistsException: 
 KeeperErrorCode = NodeExists for /collections/ctest.test.3521/leaders/shard1
 09:46:15  WARNElectionContext cancelElection did not find 
 election node to remove
 09:46:16  WARNRecoveryStrategyStopping recovery for 
 zkNodeName=core_node1core=ctest.test.3521
 09:46:17  ERROR   RecoveryStrategy   

[jira] [Updated] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers

2014-06-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5703:


Attachment: LUCENE-5703.patch

Updated patch fixing most EMPTY stuff.

TermOrdValComparator.MISSING_BYTESREF and other unsafe stuff like that still 
needs to be fixed.

And I did nothing with the in-RAM dv providers.

 Don't allocate/copy bytes all the time in binary DV producers
 -

 Key: LUCENE-5703
 URL: https://issues.apache.org/jira/browse/LUCENE-5703
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch, 
 LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch


 Our binary doc values producers keep on creating new {{byte[]}} arrays and 
 copying bytes when a value is requested, which likely doesn't help 
 performance. This has been done because of the way fieldcache consumers used 
 the API, but we should try to fix it in 5.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org