[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017403#comment-14017403 ] vivek commented on LUCENE-2899: ---
I followed this link to integrate OpenNLP: https://wiki.apache.org/solr/OpenNLP
Installation for English language testing, until LUCENE-2899 is committed:
1. pull the latest trunk or 4.0 branch
2. apply the latest LUCENE-2899 patch
3. do 'ant compile'
cd solr/contrib/opennlp/src/test-files/training . . .
I followed the first two steps but got the following errors while executing the third:
common.compile-core:
[javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
[javac] warning: [path] bad path element /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar: no such file or directory
[javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
[javac] super(Version.LUCENE_44, input);
[javac] ^
[javac] symbol: variable LUCENE_44
[javac] location: class Version
[javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56: error: no suitable constructor found for Tokenizer(Reader)
[javac] super(input);
[javac] ^
[javac] constructor Tokenizer.Tokenizer(AttributeFactory) is not applicable
[javac] (actual argument Reader cannot be converted to AttributeFactory by method invocation conversion)
[javac] constructor Tokenizer.Tokenizer() is not applicable
[javac] (actual and formal argument lists differ in length)
[javac] 2 errors
[javac] 1 warning
I'm really stuck on how to get past this step. I spent my entire day trying to fix this but couldn't make any progress. Could someone please help?
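For readers hitting the same wall: both errors are typical of applying an old patch to a newer trunk. The javac output itself shows that Version.LUCENE_44 no longer exists there, and that Tokenizer now only offers a no-arg and an AttributeFactory constructor, so the reader is supplied later via setReader rather than through the constructor. A minimal stand-in sketch of that constructor change (toy classes, not the real org.apache.lucene.analysis API):

```java
import java.io.Reader;
import java.io.StringReader;

// Stand-in for the newer Tokenizer shape: no Reader in the constructor,
// the Reader is injected afterwards via setReader().
abstract class NewStyleTokenizer {
    protected Reader input;

    void setReader(Reader r) { this.input = r; }
}

class MyTokenizer extends NewStyleTokenizer {
    // Old patch code did: super(input);  -- that constructor no longer exists.
    // Ported code leaves the constructor argument-free and relies on setReader().
    MyTokenizer() { super(); }
}

public class TokenizerPortSketch {
    public static void main(String[] args) {
        MyTokenizer t = new MyTokenizer();
        t.setReader(new StringReader("some text"));
        System.out.println(t.input != null); // prints "true": reader wired in after construction
    }
}
```

The actual fix for the patch would be the same shape: drop the Reader (and the removed Version constant) from the super() calls and let the framework call setReader.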
Add OpenNLP Analysis capabilities as a module - Key: LUCENE-2899 URL: https://issues.apache.org/jira/browse/LUCENE-2899 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-2899-RJN.patch, LUCENE-2899.patch, OpenNLPFilter.java, OpenNLPTokenizer.java Now that OpenNLP is an ASF project and has a nice license, it would be nice to have a submodule (under analysis) that exposed capabilities for it. Drew Farris, Tom Morton and I have code that does: * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens) * NamedEntity recognition as a TokenFilter We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position. I'd propose it go under: modules/analysis/opennlp -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5731: Attachment: LUCENE-5731.patch just some bugfixes to the mmap stuff. I need to add dedicated tests for those tomorrow. split direct packed ints from in-ram ones - Key: LUCENE-5731 URL: https://issues.apache.org/jira/browse/LUCENE-5731 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5731.patch, LUCENE-5731.patch Currently there is an oversharing problem in packedints that imposes too many requirements on improving it: * every packed ints must be able to be loaded directly, or in ram, or iterated with. * things like filepointers are expected to be adjusted (this is especially stupid) in all cases * lots of unnecessary abstractions * versioning etc is complex None of this flexibility is needed or buys us anything, and it prevents performance improvements (e.g. i just want to add 3 bytes at the end of on-disk streams to reduce the number of bytebuffer calls and thats seriously impossible with the current situation).
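As background for readers, "packed ints" here means storing n values of bitsPerValue bits each back to back in long[] blocks. The core idea can be sketched in a few lines (toy code, not Lucene's PackedInts, which additionally layers on the direct/in-RAM/iterator variants, file-pointer adjustment and versioning that this issue wants to split apart):

```java
// Toy packed-ints: n values, bitsPerValue bits each, packed into a long[].
// Assumes 0 < bitsPerValue < 64; not the real org.apache.lucene.util.packed API.
public class PackedSketch {
    static long[] pack(long[] values, int bitsPerValue) {
        long[] blocks = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bit = i * bitsPerValue;
            blocks[bit >>> 6] |= values[i] << (bit & 63);
            if ((bit & 63) + bitsPerValue > 64) { // value straddles two longs
                blocks[(bit >>> 6) + 1] |= values[i] >>> (64 - (bit & 63));
            }
        }
        return blocks;
    }

    static long get(long[] blocks, int bitsPerValue, int index) {
        int bit = index * bitsPerValue;
        long v = blocks[bit >>> 6] >>> (bit & 63);
        if ((bit & 63) + bitsPerValue > 64) { // pull the spilled high bits
            v |= blocks[(bit >>> 6) + 1] << (64 - (bit & 63));
        }
        return v & ((1L << bitsPerValue) - 1);
    }

    public static void main(String[] args) {
        long[] packed = pack(new long[]{3, 7, 1, 5}, 3);
        System.out.println(get(packed, 3, 1)); // 7
    }
}
```

An on-disk "direct" reader would do the same arithmetic against a file or mmap instead of a long[], which is why adding a few padding bytes at the end of the stream (to avoid the straddling branch) only makes sense once the two implementations stop sharing one abstraction.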
[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders
[ https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017408#comment-14017408 ] ASF subversion and git services commented on SOLR-6119: --- Commit 1599942 from [~dawidweiss] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1599942 ] SOLR-6119: backporting some replication handler fixes from trunk. TestReplicationHandler attempts to remove open folders -- Key: SOLR-6119 URL: https://issues.apache.org/jira/browse/SOLR-6119 Project: Solr Issue Type: Bug Reporter: Dawid Weiss Priority: Minor Attachments: SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch, SOLR-6119.patch TestReplicationHandler has a weird logic around the 'snapDir' variable. It attempts to remove snapshot folders, even though they're not closed yet. My recent patch uncovered the bug but I don't know how to fix it cleanly -- the test itself seems to be very fragile (for example I don't understand the 'namedBackup' variable which is always set to true, yet there are conditionals around it).
[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders
[ https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017412#comment-14017412 ] ASF subversion and git services commented on SOLR-6119: --- Commit 1599943 from [~dawidweiss] in branch 'dev/trunk' [ https://svn.apache.org/r1599943 ] SOLR-6119: refactored doTestBackup into a separate class.
[jira] [Commented] (SOLR-6119) TestReplicationHandler attempts to remove open folders
[ https://issues.apache.org/jira/browse/SOLR-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017415#comment-14017415 ] ASF subversion and git services commented on SOLR-6119: --- Commit 1599944 from [~dawidweiss] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1599944 ] SOLR-6119: backport of test split from trunk.
[jira] [Updated] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers
[ https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-5703: - Attachment: LUCENE-5703.patch Here is an updated patch. Sorted(Set)TermsEnum copies the supplied BytesRef when a match is found instead of looking up the ord. Don't allocate/copy bytes all the time in binary DV producers - Key: LUCENE-5703 URL: https://issues.apache.org/jira/browse/LUCENE-5703 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.9, 5.0 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch Our binary doc values producers keep on creating new {{byte[]}} arrays and copying bytes when a value is requested, which likely doesn't help performance. This has been done because of the way fieldcache consumers used the API, but we should try to fix it in 5.0.
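The allocation pattern being removed can be contrasted with a reuse pattern in a small sketch (toy classes and data, not the actual DocValues producer API):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy illustration of the change: instead of returning a fresh byte[] per
// lookup (allocate-and-copy), the caller supplies a reusable holder that the
// producer points at the shared backing data, with no per-call allocation.
public class BinaryDVSketch {
    static final byte[] DATA = "aaabbbbcc".getBytes(StandardCharsets.UTF_8);
    static final int[] OFFSETS = {0, 3, 7, 9}; // doc i -> DATA[OFFSETS[i]..OFFSETS[i+1])

    // Old style: a brand-new array and a copy on every call.
    static byte[] getCopy(int doc) {
        return Arrays.copyOfRange(DATA, OFFSETS[doc], OFFSETS[doc + 1]);
    }

    // New style: fill a reusable slice over the shared backing array.
    static class BytesSlice { byte[] bytes; int offset, length; }

    static void getReuse(int doc, BytesSlice reuse) {
        reuse.bytes = DATA;                         // no allocation, no copy
        reuse.offset = OFFSETS[doc];
        reuse.length = OFFSETS[doc + 1] - OFFSETS[doc];
    }

    public static void main(String[] args) {
        BytesSlice slice = new BytesSlice();
        getReuse(1, slice);
        System.out.println(new String(slice.bytes, slice.offset, slice.length,
                StandardCharsets.UTF_8)); // bbbb
    }
}
```

The trade-off, as the issue notes, is that reuse only works once callers stop assuming they own the returned bytes, which is why the fix is targeted at 5.0.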
[jira] [Updated] (SOLR-6127) Improve Solr's exampledocs data
[ https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-6127: Attachment: freebase_film_dump.py bq. In xml, genre is single values and percentage sign separated. I think this would be multivalued field? Fixed. Thanks! bq. generated film.xml does not have license header. I thought it will have, no? Added the license header bq. type field has value of /film/film for all docs. Is this expected. Yes, all docs will have type = /film/film, as that's the Freebase category type from which we are fetching the data. Improve Solr's exampledocs data --- Key: SOLR-6127 URL: https://issues.apache.org/jira/browse/SOLR-6127 Project: Solr Issue Type: Improvement Components: documentation Reporter: Varun Thacker Priority: Minor Fix For: 5.0 Attachments: film.csv, film.json, film.xml, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py Currently - The CSV example has 10 documents. - The JSON example has 4 documents. - The XML example has 32 documents. 1. We should have equal number of documents and the same documents in all the example formats 2. A data set which is slightly more comprehensive.
[jira] [Commented] (SOLR-6127) Improve Solr's exampledocs data
[ https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017434#comment-14017434 ] Uwe Schindler commented on SOLR-6127: - bq. Added the license header I think this should be a CC-BY license header, not ASF.
[jira] [Commented] (SOLR-6133) More robust collection-delete
[ https://issues.apache.org/jira/browse/SOLR-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017439#comment-14017439 ] Per Steffensen commented on SOLR-6133: -- In general zk=truth sounds like a great idea :-) But shouldn't zk=truth be implicit when either zkRun or zkHost is set? I am not sure about the terminology, but I believe 'unloaded' does not include deleting the data from disk? My main concern with the scenario I show is that data is not being deleted from disk. We would really like some way to make (fairly) sure that data is deleted when we fire a collection-delete request (and the info disappears from ZK). We have enormous amounts of data and will run out of disk space if our data folders are not deleted. I am also a little bit concerned about the 'on startup' part of 'will be unloaded on startup'. In the scenario I show above, the shards that were deleted from ZK but not from disk will pop up in ZK again on restart of the Solrs (because the folders still contain core.properties, I believe), and then we get a second chance at deleting them, because we can re-detect that an unwanted collection (partly) exists. So if zk=truth means that data will not be deleted, and that the shards will not re-appear in ZK after a restart of the Solrs, it is actually a step back wrt my main concern. But back to my concern with 'on startup': we actually very rarely restart Solrs (because they run fairly stable - that is a good thing), so I am concerned about a solution that only cleans up or recovers on restart. I am keen on improving collection-delete to do whatever it can to be 'all or nothing'. Will you consider adding, on the Solr server side, the 'all nodes are live' and 'all shards/replicas are active' checks from CollDelete.java before deleting?
This will be a step in the 'all or nothing' direction, which will be even more important for non-SolrJ clients that really cannot do the trick themselves on the client side (unless they do the zk-data juggling on the client side in another way). More robust collection-delete - Key: SOLR-6133 URL: https://issues.apache.org/jira/browse/SOLR-6133 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7.2, 4.8.1 Reporter: Per Steffensen Attachments: CollDelete.java, coll_delete_problem.zip If Solrs are not stable (completely up and running etc.), a collection-delete request might result in partly deleted collections. You might say that it is fair that you are not able to have a collection deleted if all of its shards are not actively running - even though I would like a mechanism that just deleted them when/if they ever come up again. But even though all shards claim to be actively running you can still end up with partly deleted collections - that is not acceptable IMHO. At least clusterstate should always reflect the state, so that you are able to detect that your collection-delete request was only partly carried out - which parts were successfully deleted and which were not (including information about data-folder deletion). The text above sounds like an epic-sized task, with potentially numerous problems to fix, so in order not to keep this ticket open forever I will point out a particular scenario where I see problems. When this problem is corrected, we can close this ticket. Other tickets will have to deal with other collection-delete issues.
Here is what I did and saw
* Logged into one of my Linux machines with IP 192.168.78.239
* Prepared for Solr install
{code}
mkdir -p /xXX/solr
cd /xXX/solr
{code}
* downloaded solr-4.7.2.tgz
* Installed Solr 4.7.2 and prepared for three nodes
{code}
tar zxvf solr-4.7.2.tgz
cd solr-4.7.2/
cp -r example node1
cp -r example node2
cp -r example node3
{code}
* Initialized Solr config into Solr
{code}
cd node1
java -DzkRun -Dhost=192.168.78.239 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
CTRL-C to stop solr (node1) again after it started completely
{code}
* Started all three Solr nodes
{code}
nohup java -Djetty.port=8983 -Dhost=192.168.78.239 -DzkRun -jar start.jar > node1_stdouterr.log &
cd ../node2
nohup java -Djetty.port=8984 -Dhost=192.168.78.239 -DzkHost=localhost:9983 -jar start.jar > node2_stdouterr.log &
cd ../node3
nohup java -Djetty.port=8985 -Dhost=192.168.78.239 -DzkHost=localhost:9983 -jar start.jar > node3_stdouterr.log &
{code}
* Created a collection mycoll
{code}
curl 'http://192.168.78.239:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=6&replicationFactor=1&maxShardsPerNode=2&collection.configName=myconf'
{code}
* Collected Cloud Graph image, clusterstate.json and info about data folders (see attached coll_delete_problem.zip | after_create_all_solrs_still_running).
[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017446#comment-14017446 ] Adrien Grand commented on LUCENE-5731: -- +1 I like the new directory API and how direct packed ints use it. One minor note: the javadoc of Lucene49Codec refers to the lucene46 package instead of lucene49.
[jira] [Updated] (SOLR-6127) Improve Solr's exampledocs data
[ https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-6127: Attachment: freebase_film_dump.py The XML output adds the Creative Commons Attribution 2.5 header instead of the ASF license.
Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net
Hi Uwe, Dawid, Early Access builds for JDK 9 b15 https://jdk9.java.net/download/ and JDK 8u20 b16 https://jdk8.java.net/download.html are available on java.net. As we enter the later phases of development for JDK 8u20, please log any show stoppers as soon as possible. JDK 7u60 is available for download [0]. Rgds, Rory [0] http://www.oracle.com/technetwork/java/javase/downloads/index.html -- Rgds, Rory O'Donnell Quality Engineering Manager Oracle EMEA, Dublin, Ireland
RE: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net
Hi Rory, thank you for the info! I installed 7u60 already (yesterday evening). I am happy that the MacOSX problem with Socket#accept was solved. I hope this fix also gets into the JDK 8 builds: https://bugs.openjdk.java.net/browse/JDK-8024045 This one prevents applications like Lucene, which use many file descriptors, from working correctly in web containers like Jetty or Tomcat on MacOSX server – causing SIGSEGV. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de/ eMail: u...@thetaphi.de From: Rory O'Donnell Oracle, Dublin Ireland [mailto:rory.odonn...@oracle.com] Sent: Wednesday, June 04, 2014 9:36 AM To: Uwe Schindler; Dawid Weiss Cc: rory.odonn...@oracle.com; dev@lucene.apache.org; Dalibor Topic; Balchandra Vaidya Subject: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net
Re: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net
Hi Uwe, Let me look into this. -- Rgds, Rory O'Donnell Quality Engineering Manager Oracle EMEA, Dublin, Ireland
[jira] [Created] (LUCENE-5733) Minor PackedInts API cleanups
Adrien Grand created LUCENE-5733: Summary: Minor PackedInts API cleanups Key: LUCENE-5733 URL: https://issues.apache.org/jira/browse/LUCENE-5733 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Trivial Fix For: 4.9, 5.0 The PackedInts API has quite some history now and some of its methods are not used anymore, eg. PackedInts.Reader.hasArray. I'd like to remove them.
[jira] [Updated] (LUCENE-5733) Minor PackedInts API cleanups
[ https://issues.apache.org/jira/browse/LUCENE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-5733: - Attachment: LUCENE-5733.patch Here is a patch: - removes Reader.hasArray and Reader.getArray - moves getBitsPerValue from Reader (unused there) to Mutable
[jira] [Updated] (SOLR-6131) Remove deprecated Token class from solr.spelling package
[ https://issues.apache.org/jira/browse/SOLR-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-6131: --- Attachment: NukeToken.patch I mean deleting Token.java from the source tree, as in this patch, and fixing the remaining compile errors. Remove deprecated Token class from solr.spelling package Key: SOLR-6131 URL: https://issues.apache.org/jira/browse/SOLR-6131 Project: Solr Issue Type: Improvement Components: spellchecker Affects Versions: 4.8.1 Reporter: Spyros Kapnissis Priority: Minor Labels: spellchecker Attachments: NukeToken.patch, SOLR-6131.patch The deprecated Token class is used everywhere in the spelling package. I am attaching a patch that refactors/replaces all occurrences with the AttributeSource class. The tests are passing. Note: the AttributeSource class also replaces Token as a hash key in many places. Having stricter equals/hashCode requirements than Token, I am a bit concerned that it could produce some duplicate suggestions, especially in the case of ConjunctionSolrSpellChecker where merging of the different spell checking suggestions takes place. If this initial approach is fine, I can create some extra checks/unit tests for this.
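The duplicate-suggestion concern can be illustrated with a small sketch: a hash key with stricter equality (comparing more state, as an AttributeSource-style object would) keeps entries that a looser, term-only key (the old Token usage) would merge, which surfaces as duplicate suggestions after merging. Stand-in classes below, not the actual Lucene/Solr types:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Toy illustration: term-only equality merges two suggestions for "solr";
// stricter equality (term + offset) treats them as distinct entries.
public class DedupSketch {
    static final class TermKey {
        final String term;
        TermKey(String term) { this.term = term; }
        @Override public boolean equals(Object o) {
            return o instanceof TermKey && ((TermKey) o).term.equals(term);
        }
        @Override public int hashCode() { return term.hashCode(); }
    }

    static final class StrictKey {
        final String term; final int startOffset;
        StrictKey(String term, int startOffset) { this.term = term; this.startOffset = startOffset; }
        @Override public boolean equals(Object o) {
            return o instanceof StrictKey && ((StrictKey) o).term.equals(term)
                    && ((StrictKey) o).startOffset == startOffset;
        }
        @Override public int hashCode() { return Objects.hash(term, startOffset); }
    }

    public static void main(String[] args) {
        Set<TermKey> loose = new HashSet<>();
        loose.add(new TermKey("solr"));
        loose.add(new TermKey("solr"));        // merged: same term
        Set<StrictKey> strict = new HashSet<>();
        strict.add(new StrictKey("solr", 0));
        strict.add(new StrictKey("solr", 10)); // kept: offsets differ
        System.out.println(loose.size() + " " + strict.size()); // 1 2
    }
}
```

This is the kind of case the proposed extra checks/unit tests around ConjunctionSolrSpellChecker would want to cover.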
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0_20-ea-b15) - Build # 10349 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10349/
Java: 64bit/jdk1.8.0_20-ea-b15 -XX:-UseCompressedOops -XX:+UseSerialGC

1 tests failed.
REGRESSION: org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings

Error Message:
startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=44,endOffset=32

Stack Trace:
java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=44,endOffset=32
at __randomizedtesting.SeedInfo.seed([CF12B5B0721D62C6:A5490AA12B534235]:0)
at org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45)
at org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345)
at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:68)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:703)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:614)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:513)
at org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:946)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
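The invariant that failed above can be sketched with a minimal stand-in for the check that OffsetAttributeImpl.setOffset performs: offsets must satisfy 0 <= startOffset <= endOffset, and the ShingleFilter in this random chain produced startOffset=44 with endOffset=32. Toy method, not the actual Lucene class:

```java
// Minimal reproduction of the offset invariant: start must be non-negative
// and end must not precede start. Toy stand-in for OffsetAttributeImpl.setOffset.
public class OffsetCheckSketch {
    static void setOffset(int startOffset, int endOffset) {
        if (startOffset < 0 || endOffset < startOffset) {
            throw new IllegalArgumentException(
                    "startOffset must be non-negative, and endOffset must be >= startOffset, "
                    + "startOffset=" + startOffset + ",endOffset=" + endOffset);
        }
    }

    public static void main(String[] args) {
        setOffset(0, 10); // fine
        try {
            setOffset(44, 32); // the combination from the failure above
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```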
Re: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net
Hi Uwe, I understand the fix is already in: 8u20/b05: https://bugs.openjdk.java.net/browse/JDK-8036554 9/b04: https://bugs.openjdk.java.net/browse/JDK-8035897 Can you confirm all is ok? Rgds, Rory -- Rgds, Rory O'Donnell Quality Engineering Manager Oracle EMEA, Dublin, Ireland
RE: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net
Hi, I checked the backlog for „JNU_NewStringPlatform” (which is part of the crash error message). We don't test 8u20 on MacOSX at the moment (only on Windows and Linux), and I have seen no failures in recent 7u60 builds on OSX, but many of them with u55, u51 and u45. I would say: this is fixed unless we hit it again. From the documentation on the issue it seems that this might be easier to reproduce if we try to run tests on OSX and raise something like the number of concurrent HTTP transfers between Apache Solr nodes. I might give it a try (spawn something like 300 Jetty webservers with Solr and let them execute searches against each other). If I find something, I will report back. In any case, thanks for the information; I trust you that it is fixed :-) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de From: Rory O'Donnell Oracle, Dublin Ireland [mailto:rory.odonn...@oracle.com] Sent: Wednesday, June 04, 2014 11:59 AM To: Uwe Schindler; 'Dawid Weiss' Cc: dev@lucene.apache.org; 'Dalibor Topic'; 'Balchandra Vaidya' Subject: Re: Early Access builds for JDK 9 b15, JDK 8u20 b16 are available on java.net Hi Uwe, I understand the fix is already in: 8u20/b05: https://bugs.openjdk.java.net/browse/JDK-8036554 9/b04: https://bugs.openjdk.java.net/browse/JDK-8035897 Can you confirm all is ok? Rgds, Rory
Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017605#comment-14017605 ] Michael McCandless commented on LUCENE-4396: Thanks Da! When you say BNS (without bitset) vs. BS2 that means baseline=BS2 and my_version=BNS (without bitset)? I just want to make sure I have the direction right! With the added bitset, couldn't you not use a linked list anymore? Ie, just use prev/nextSetBit. I wonder if the bitset (instead of the linked list) could also help BooleanScorer? Maybe test this change separately (e.g. just modify BS we have today on trunk) to see if it helps or hurts... if it does help, it seems like BNS could be used (or BS could be a Scorer not a BulkScorer) even when there are no MUST clauses? Ie, the bitset lets us easily keep the order. Then we can merge BS/BNS into one? Could you attach all new tasks as a single file in general? Note that when you set up a luceneutil test, you can add a task filter using addTaskPattern, so you run just a subset of the tasks for that one test. Strange that the scores are still different between BS/BS2 and BNS/BS2 when using double. If there's only 1 required clause sent to BS/BNS can't we use its scorer instead? Have you explored having BS interact directly with all the MUST clauses, rather than using ConjunctionScorer? Because we have wildly divergent results (sometimes one is much faster, other times it's much slower) we will somehow need to add logic to pick the right scorer for each query. But we can defer this until we're doneish iterating the changes to each scorer... it can come later on. 
BooleanScorer should sometimes be used for MUST clauses --- Key: LUCENE-4396 URL: https://issues.apache.org/jira/browse/LUCENE-4396 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT. If there is one or more MUST clauses we always use BooleanScorer2. But I suspect that unless the MUST clauses have very low hit count compared to the other clauses, that BooleanScorer would perform better than BooleanScorer2. BooleanScorer still has some vestiges from when it used to handle MUST so it shouldn't be hard to bring back this capability ... I think the challenging part might be the heuristics on when to use which (likely we would have to use firstDocID as proxy for total hit count). Likely we should also have BooleanScorer sometimes use .advance() on the subs in this case, eg if suddenly the MUST clause skips 100 docs then you want to .advance() all the SHOULD clauses. I won't have near term time to work on this so feel free to take it if you are inspired! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
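Mike's bitset idea above can be sketched in plain Java. This is an illustrative toy, not Lucene's actual BooleanScorer: the window size, method names, and data shapes here are assumptions, but it shows how a bitset restores docID order via nextSetBit(), which is what would let the linked-list bucket chain go away.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: collect matches for a window of docIDs in a BitSet,
// then emit them in docID order by walking set bits with nextSetBit().
public class BitSetWindowDemo {
    static final int WINDOW = 2048; // BooleanScorer-style window size (assumption)

    // Collect out-of-order (possibly duplicate) matches, return them sorted.
    static int[] collectInOrder(int[] unorderedDocs, int windowBase) {
        BitSet window = new BitSet(WINDOW);
        for (int doc : unorderedDocs) {
            window.set(doc - windowBase); // SHOULD clauses may match in any order
        }
        int[] ordered = new int[window.cardinality()];
        int i = 0;
        // nextSetBit walks set bits in ascending order, restoring docID order
        for (int bit = window.nextSetBit(0); bit >= 0; bit = window.nextSetBit(bit + 1)) {
            ordered[i++] = windowBase + bit;
        }
        return ordered;
    }

    public static void main(String[] args) {
        int[] docs = {5, 3, 9, 3, 7}; // duplicates collapse, order is restored
        System.out.println(Arrays.toString(collectInOrder(docs, 0))); // [3, 5, 7, 9]
    }
}
```

As Da notes in a later comment, a linked list can still win when matches are extremely sparse, since nextSetBit() scans empty words.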
[jira] [Commented] (SOLR-4763) Performance issue when using group.facet=true
[ https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017623#comment-14017623 ] Hua Jiang commented on SOLR-4763: - Hello, Varun. Thanks for your feedback. I rebuilt lucene_solr on my laptop, and all the tests pass. I made this patch based on revision 1553089. If you are using a different revision, you may have to do some modification yourself. I will explain the patch a little more, and hope it helps. In the unpatched code, groupedFacetHits is a list of GroupedFacetHit objects, which stores unique combinations of values of the group field and the facet field in the previous segments. When a new segment is opened, this list is traversed first to recalculate the segmentGroupedFacetsIndex, because that value may differ from segment to segment. That's what the loop you mentioned in setNextReader() is doing. During the recalculation, the lookupTerm() method is invoked on facetFieldTermsIndex and groupFieldTermsIndex. This method uses binary search to look up values among all the values that appear in the group/facet field in the current segment. Let's assume that we have D documents distributed evenly across S segments, so that we have G and F unique values in each segment for the group and facet field, and that the length of the groupedFacetHits list after the nth segment is processed is n*L. Then the complexity of the recalculation is (logG + logF) * (L + 2L + ... + (S-1)L) ~ O((logG + logF) * L * S^2). It is proportional to S squared, so as S grows, performance drops rapidly. In the patched version, I changed groupedFacetHits from a list to a set. The recalculation can be avoided, because when you get a GroupedFacetHit, you just add it to the set without worrying about whether some other GroupedFacetHit with the same group and facet field values was added before. The add() method on a set returns false when the same value has already been added. Performance issue when using group.facet=true - Key: SOLR-4763 URL: https://issues.apache.org/jira/browse/SOLR-4763 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Alexander Koval Attachments: SOLR-4763.patch, SOLR-4763.patch I do not know whether this is a bug or not. But calculating facets with {{group.facet=true}} is too slow. I have a query that: {code} matches: 730597, ngroups: 24024, {code} 1. All queries with {{group.facet=true}}: {code} QTime: 5171 facet: { time: 4716 {code} 2. Without {{group.facet}}: * First query: {code} QTime: 3284 facet: { time: 3104 {code} * Next queries: {code} QTime: 230, facet: { time: 76 {code} So I think with {{group.facet=true}} Solr doesn't use cache to calculate facets. Is it possible to improve performance of facets when {{group.facet=true}}?
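The list-to-set change described above can be sketched in a few lines. This is a simplified illustration (the Hit record is a stand-in, not the actual GroupedFacetHit or TermGroupFacetCollector code): a HashSet answers "seen before?" in O(1) per hit through add()'s return value, so the accumulated hits never need to be re-scanned per segment.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of deduplicating (groupValue, facetValue) combinations.
public class GroupedFacetHitDemo {
    // Simplified stand-in for GroupedFacetHit: a value-equal pair.
    record Hit(String group, String facet) {}

    // Unpatched style: membership test is a linear scan, O(n) per hit.
    static boolean addToList(List<Hit> hits, Hit h) {
        if (hits.contains(h)) {
            return false; // combination already recorded
        }
        return hits.add(h);
    }

    // Patched style: Set.add() already reports whether the pair was new.
    static boolean addToSet(Set<Hit> hits, Hit h) {
        return hits.add(h); // false when the same combination was seen before
    }

    public static void main(String[] args) {
        Set<Hit> hits = new HashSet<>();
        System.out.println(addToSet(hits, new Hit("g1", "f1"))); // true: new pair
        System.out.println(addToSet(hits, new Hit("g1", "f1"))); // false: duplicate
        System.out.println(hits.size());                         // 1
    }
}
```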
[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017639#comment-14017639 ] Michael McCandless commented on LUCENE-5731: +1, this looks really nice. split direct packed ints from in-ram ones - Key: LUCENE-5731 URL: https://issues.apache.org/jira/browse/LUCENE-5731 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5731.patch, LUCENE-5731.patch Currently there is an oversharing problem in packed ints that imposes too many requirements on improving it: * every packed ints implementation must be able to be loaded directly, or in RAM, or iterated with. * things like file pointers are expected to be adjusted (this is especially stupid) in all cases * lots of unnecessary abstractions * versioning etc. is complex None of this flexibility is needed or buys us anything, and it prevents performance improvements (e.g. I just want to add 3 bytes at the end of on-disk streams to reduce the number of bytebuffer calls, and that's seriously impossible with the current situation).
[jira] [Commented] (SOLR-6131) Remove deprecated Token class from solr.spelling package
[ https://issues.apache.org/jira/browse/SOLR-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017663#comment-14017663 ] Spyros Kapnissis commented on SOLR-6131: Not sure it's that easy. There are a lot of places where it is still used, even though it has been obsolete since version 2.9. Any refactoring has to happen incrementally, imo. This patch is specifically for the solr.spelling package. Remove deprecated Token class from solr.spelling package Key: SOLR-6131 URL: https://issues.apache.org/jira/browse/SOLR-6131 Project: Solr Issue Type: Improvement Components: spellchecker Affects Versions: 4.8.1 Reporter: Spyros Kapnissis Priority: Minor Labels: spellchecker Attachments: NukeToken.patch, SOLR-6131.patch The deprecated Token class is used everywhere in the spelling package. I am attaching a patch that refactors/replaces all occurrences with the AttributeSource class. The tests are passing. Note: the AttributeSource class also replaces Token as a hash key in many places. Since it has stricter equals/hashCode requirements than Token, I am a bit concerned that it could produce some duplicate suggestions, especially in the case of ConjunctionSolrSpellChecker, where merging of the different spell checking suggestions takes place. If this initial approach is fine, I can create some extra checks/unit tests for this.
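The duplicate-suggestion concern above comes down to equality semantics of the hash key. The toy below illustrates it in pure Java (the classes are illustrative stand-ins, not Token or AttributeSource): a value-equal key collapses duplicates in a HashSet, while a key with stricter, identity-like equality lets logically identical suggestions survive as distinct entries.

```java
import java.util.HashSet;
import java.util.Set;

// Contrast of value-based vs identity-based hash keys.
public class HashKeyDemo {
    // Value-based key: equal text => equal key (Token-like behavior).
    record ValueKey(String text) {}

    // Identity-based key: every instance is distinct (stands in for the
    // stricter-equality risk described in the issue).
    static final class IdentityKey {
        final String text;
        IdentityKey(String text) { this.text = text; }
        // inherits Object.equals/hashCode => identity semantics
    }

    static int distinctValueKeys() {
        Set<ValueKey> s = new HashSet<>();
        s.add(new ValueKey("solr"));
        s.add(new ValueKey("solr"));
        return s.size(); // 1: duplicates merge
    }

    static int distinctIdentityKeys() {
        Set<IdentityKey> s = new HashSet<>();
        s.add(new IdentityKey("solr"));
        s.add(new IdentityKey("solr"));
        return s.size(); // 2: "duplicate" suggestions survive
    }

    public static void main(String[] args) {
        System.out.println(distinctValueKeys() + " " + distinctIdentityKeys()); // 1 2
    }
}
```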
[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers
[ https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017668#comment-14017668 ] Robert Muir commented on LUCENE-5703: - +1 to commit, thank you for taking care of this! Don't allocate/copy bytes all the time in binary DV producers - Key: LUCENE-5703 URL: https://issues.apache.org/jira/browse/LUCENE-5703 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.9, 5.0 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch Our binary doc values producers keep on creating new {{byte[]}} arrays and copying bytes when a value is requested, which likely doesn't help performance. This has been done because of the way fieldcache consumers used the API, but we should try to fix it in 5.0.
[jira] [Resolved] (LUCENE-5393) remove codec byte[] cloning in BinaryDocValues api
[ https://issues.apache.org/jira/browse/LUCENE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5393. - Resolution: Duplicate see LUCENE-5703 remove codec byte[] cloning in BinaryDocValues api -- Key: LUCENE-5393 URL: https://issues.apache.org/jira/browse/LUCENE-5393 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir I can attack this (at least in trunk/5.0; we can discuss if/when it should happen for 4.x). See the mailing list for more discussion. This was done intentionally, to prevent lots of reuse bugs. The issue is very simple: lots of old fieldcache-type logic has it because things used to be immutable Strings or because they rely on things being in a large array: {code} byte[] b1 = get(doc1); byte[] b2 = get(doc2); // some code that expects b1 to be unchanged. {code} Currently each get() internally clones the bytes, for safety. But this is really bad for code like faceting (which is going to decompress integers and never needs to save bytes), and it's even stupid for things like fieldcomparator (where in general it's doing comparisons, and only rarely needs to save a copy of the bytes for later). I can address it with lots of tests (I added a lot in general anyway since the time of adding this TODO, but more would make me feel safer).
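The reuse bug the {{code}} snippet hints at can be shown concretely. This is a schematic producer, not Lucene's actual codec code: a producer that reuses one internal buffer silently invalidates previously returned arrays, which is exactly why get() clones today, at the cost of an allocation per call.

```java
// Contrast of a buffer-reusing getter vs a cloning getter.
public class ReuseDemo {
    static final byte[][] stored = { {1, 2}, {3, 4} };
    static final byte[] shared = new byte[2];

    // Reusing producer: fast (no allocation), but clobbers earlier results.
    static byte[] getReused(int doc) {
        System.arraycopy(stored[doc], 0, shared, 0, 2);
        return shared;
    }

    // Cloning producer: safe for callers, allocates on every call.
    static byte[] getCloned(int doc) {
        return stored[doc].clone();
    }

    public static void main(String[] args) {
        byte[] b1 = getReused(0);
        getReused(1);
        System.out.println(b1[0]); // 3: b1 was silently overwritten

        byte[] c1 = getCloned(0);
        getCloned(1);
        System.out.println(c1[0]); // 1: c1 is unaffected
    }
}
```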
[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017708#comment-14017708 ] Da Huang commented on LUCENE-4396: -- Thanks for your suggestions, Mike! {quote} When you say BNS (without bitset) vs. BS2 that means baseline=BS2 and my_version=BNS (without bitset)? {quote} Yes, that is just what I mean. {quote} With the added bitset, couldn't you not use a linked list anymore? Ie, just use prev/nextSetBit. I wonder if the bitset (instead of the linked list) could also help BooleanScorer? Maybe test this change separately (e.g. just modify BS we have today on trunk) to see if it helps or hurts... if it does help, it seems like BNS could be used (or BS could be a Scorer not a BulkScorer) even when there are no MUST clauses? Ie, the bitset lets us easily keep the order. Then we can merge BS/BNS into one? {quote} Oh, that's a good idea! I will try that. However, the linked list can be helpful when the required docs are extremely sparse. {quote} Could you attach all new tasks as a single file in general? Note that when you set up a luceneutil test, you can add a task filter using addTaskPattern, so you run just a subset of the tasks for that one test. {quote} Do you mean merging And.tasks and AndOr.tasks? If so, there's no need to do that, because And.tasks contains all the tasks in AndOr.tasks, although the tasks' names are changed. Anyway, thanks for the advice on using addTaskPattern; I hadn't noticed that. {quote} Strange that the scores are still different between BS/BS2 and BNS/BS2 when using double. {quote} I don't think it's strange, because the difference is due to the order in which the scores are summed. Suppose a doc hits +a b c: SCORE_BS = (float)((float)(double)score_a + (float)score_b) + (float)score_c, while SCORE_BS2 = (float)(double)score_a + ((float)score_b + (float)score_c). Here, (float) means that we can only get the score by .score(), whose return type is float.
The modification in this patch only gives score_a a temporary double value. {quote} If there's only 1 required clause sent to BS/BNS can't we use its scorer instead? Have you explored having BS interact directly with all the MUST clauses, rather than using ConjunctionScorer? {quote} Hmm. I don't think that would be helpful. The reason is just the same as above. {quote} Because we have wildly divergent results (sometimes one is much faster, other times it's much slower) we will somehow need to add logic to pick the right scorer for each query. But we can defer this until we're doneish iterating the changes to each scorer... it can come later on. {quote} Yes, I agree. BooleanScorer should sometimes be used for MUST clauses --- Key: LUCENE-4396 URL: https://issues.apache.org/jira/browse/LUCENE-4396 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless
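Da's point about summation order can be demonstrated directly: float addition is not associative, so grouping the clause scores differently changes the result. The values below are arbitrary, chosen only to expose the effect; the two methods mimic the BS-style and BS2-style groupings, not the actual scorer code.

```java
// Demonstrates that (a + b) + c != a + (b + c) in float arithmetic.
public class ScoreOrderDemo {
    // BS-style grouping: ((a + b) + c), each step rounded to float.
    static float sumBS(float a, float b, float c) {
        return (a + b) + c;
    }

    // BS2-style grouping: (a + (b + c)).
    static float sumBS2(float a, float b, float c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        float a = 1e8f, b = -1e8f, c = 1e-3f;
        System.out.println(sumBS(a, b, c));  // 0.001: a and b cancel first
        System.out.println(sumBS2(a, b, c)); // 0.0: c is absorbed into b, since
                                             // the float ulp near 1e8 is 8
    }
}
```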
[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers
[ https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017737#comment-14017737 ] Robert Muir commented on LUCENE-5703: - I kicked this around all I could with nightly tests, running tests over and over, etc. I'm seeing this reproducible failure: {noformat} ant test -Dtestcase=TestDistributedMissingSort -Dtests.method=testDistribSearch -Dtests.seed=6B475C36C0EF9CD5 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=ar -Dtests.timezone=Africa/Windhoek -Dtests.file.encoding=UTF-8 {noformat} Don't allocate/copy bytes all the time in binary DV producers - Key: LUCENE-5703 URL: https://issues.apache.org/jira/browse/LUCENE-5703 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.9, 5.0
[jira] [Commented] (LUCENE-5650) Enforce read-only access to any path outside the temporary folder via security manager
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017787#comment-14017787 ] Steve Rowe commented on LUCENE-5650: bq. But feel free to just use your patch if you want and I'll clean it up when I resolve that issue. Thanks, I'll do that. Enforce read-only access to any path outside the temporary folder via security manager -- Key: LUCENE-5650 URL: https://issues.apache.org/jira/browse/LUCENE-5650 Project: Lucene - Core Issue Type: Improvement Components: general/test Reporter: Ryan Ernst Assignee: Dawid Weiss Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, dih.patch The recent refactoring to all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per-JVM working dir. However, {{getBaseTempDirForClass()}} now does asserts that check the dir exists, is a dir, and is writeable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per-JVM cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager).
[jira] [Commented] (LUCENE-5650) Enforce read-only access to any path outside the temporary folder via security manager
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017788#comment-14017788 ] ASF subversion and git services commented on LUCENE-5650: - Commit 1600310 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1600310 ] LUCENE-5650: Reset solr.hdfs.home correctly to allow TestRecoveryHdfs tests to pass Enforce read-only access to any path outside the temporary folder via security manager -- Key: LUCENE-5650 URL: https://issues.apache.org/jira/browse/LUCENE-5650 Project: Lucene - Core Issue Type: Improvement Components: general/test Reporter: Ryan Ernst Assignee: Dawid Weiss Priority: Minor Fix For: 4.9, 5.0
[jira] [Created] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags
David Smiley created LUCENE-5734: Summary: HTMLStripCharFilter end offset should be left of closing tags Key: LUCENE-5734 URL: https://issues.apache.org/jira/browse/LUCENE-5734 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Reporter: David Smiley Priority: Minor Consider this simple input: {noformat} <em>hello</em> {noformat} to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer. You get back one token for hello. Good. The start offset of this token is at the position of 'h' -- good. But the end offset is surprisingly plus one to the adjacent </em>. I argue that it should be plus one to the last character of the token (following 'o'). FYI it behaves as I expect if after hello is an &nbsp; -- the end offset immediately follows the 'o'.
[jira] [Updated] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags
[ https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5734: - Description: Consider this simple input: {noformat} <em>hello</em> {noformat} to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer. You get back one token for hello. Good. The start offset of this token is at the position of 'h' -- good. But the end offset is surprisingly plus one to the adjacent </em>. I argue that it should be plus one to the last character of the token (following 'o'). FYI it behaves as I expect if after hello is an XML entity such as in this example: {noformat}hello&nbsp;{noformat} The end offset immediately follows the 'o'. was: Consider this simple input: {noformat} <em>hello</em> {noformat} to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer. You get back one token for hello. Good. The start offset of this token is at the position of 'h' -- good. But the end offset is surprisingly plus one to the adjacent </em>. I argue that it should be plus one to the last character of the token (following 'o'). FYI it behaves as I expect if after hello is an &nbsp; -- the end offset immediately follows the 'o'. HTMLStripCharFilter end offset should be left of closing tags - Key: LUCENE-5734 URL: https://issues.apache.org/jira/browse/LUCENE-5734 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Reporter: David Smiley Priority: Minor Consider this simple input: {noformat} <em>hello</em> {noformat} to be analyzed by HTMLStripCharFilter and WhitespaceTokenizer. You get back one token for hello. Good. The start offset of this token is at the position of 'h' -- good. But the end offset is surprisingly plus one to the adjacent </em>. I argue that it should be plus one to the last character of the token (following 'o'). FYI it behaves as I expect if after hello is an XML entity such as in this example: {noformat}hello&nbsp;{noformat} The end offset immediately follows the 'o'.
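The offset question can be made concrete with a toy tag stripper (this is not HTMLStripCharFilter; it is a minimal sketch that records, for each output character, its index in the original markup). For {{<em>hello</em>}} it yields the convention David argues for: end offset 9, just past 'o', rather than 14, past {{</em>}}.

```java
import java.util.Arrays;

// Strips <...> tags while mapping stripped-text offsets back to the original.
public class StripOffsetsDemo {
    // After strip(), offsets[i] = original index of output character i.
    static int[] offsets;

    static String strip(String html) {
        StringBuilder out = new StringBuilder();
        int[] map = new int[html.length()];
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char ch = html.charAt(i);
            if (ch == '<') inTag = true;
            else if (ch == '>') inTag = false;
            else if (!inTag) {
                map[out.length()] = i; // remember where this char came from
                out.append(ch);
            }
        }
        offsets = Arrays.copyOf(map, out.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<em>hello</em>";
        String text = strip(html);                 // "hello"
        int start = offsets[0];                    // 4: the 'h'
        int end = offsets[text.length() - 1] + 1;  // 9: just after the 'o'
        System.out.println(text + " " + start + " " + end);
    }
}
```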
[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags
[ https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017822#comment-14017822 ] Alan Woodward commented on LUCENE-5734: --- Steve Rowe and I discussed this a while back - there are good use cases for offsets to be both before and after the trailing tag. I have a separate CharFilter somewhere that reports offsets the way you want here, will try and dig it out and attach it as a patch. HTMLStripCharFilter end offset should be left of closing tags - Key: LUCENE-5734 URL: https://issues.apache.org/jira/browse/LUCENE-5734 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Reporter: David Smiley Priority: Minor
[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags
[ https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017826#comment-14017826 ] David Smiley commented on LUCENE-5734: -- FYI this triggered my interest because I'm trying to highlight XML. Technically I'm not using Lucene/Solr's highlighter as I'm doing something custom. I'm going to insert special demarcation markup into the source text at the offsets that I find. My current work-around is to detect that the source text has a closing element at the end offset, and then adjust for it if found. Not too hard for me. HTMLStripCharFilter end offset should be left of closing tags - Key: LUCENE-5734 URL: https://issues.apache.org/jira/browse/LUCENE-5734 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Reporter: David Smiley Priority: Minor
[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags
[ https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017831#comment-14017831 ] David Smiley commented on LUCENE-5734: -- [~romseygeek] ok then it should be configurable, and _consistent_ too. *if* the user wants a closing element offset to be included with the token (as it currently is) then an adjacent opening element should mark the start of the token too. IMO it shouldn't work this way by default, though. HTMLStripCharFilter end offset should be left of closing tags - Key: LUCENE-5734 URL: https://issues.apache.org/jira/browse/LUCENE-5734 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Reporter: David Smiley Priority: Minor
[jira] [Created] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock
Brandon Chapman created SOLR-6136: - Summary: ConcurrentUpdateSolrServer includes a Spin Lock Key: SOLR-6136 URL: https://issues.apache.org/jira/browse/SOLR-6136 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8.1, 4.8, 4.7.2, 4.7.1, 4.7, 4.6.1, 4.6 Reporter: Brandon Chapman Priority: Critical ConcurrentUpdateSolrServer.blockUntilFinished() includes a spin lock. This causes an extremely high amount of CPU to be used on the cloud leader during indexing. Here is a summary of our system testing. Importing data on Solr 4.5.0: throughput gets as high as 240 documents per second. [tomcat@solr-stg01 logs]$ uptime 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java Importing data on Solr 4.7.0 with no replicas: throughput peaks at 350 documents per second. [tomcat@solr-stg01 logs]$ uptime 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java Importing data on Solr 4.7.0 with replicas: throughput peaks at 30 documents per second because the Solr machine is out of CPU. [tomcat@solr-stg01 logs]$ uptime 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java
[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock
[ https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017838#comment-14017838 ] Brandon Chapman commented on SOLR-6136: --- Applying the patch from the linked ticket to Solr 4.5 will cause the same issue to be present in Solr 4.5.
[jira] [Updated] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock
[ https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Chapman updated SOLR-6136: -- Attachment: wait___notify_all.patch The attached patch for Solr 4.7.1 drastically improves performance. The patch works around the spin lock by using a simple wait/notify mechanism. It is not a suggestion on how to fix ConcurrentUpdateSolrServer for an official release.
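The idea behind the attached patch can be illustrated with a minimal, self-contained sketch. This is NOT the actual ConcurrentUpdateSolrServer code; the class and method names below are invented for illustration. Instead of spinning in a loop re-checking a condition (which burns a CPU core), the blocking thread sleeps on a monitor and is woken only when a runner finishes:

```java
// Minimal wait/notify sketch of the patch's approach; illustrative names only,
// not the real ConcurrentUpdateSolrServer internals.
public class PendingTracker {
    private int pending = 0;

    // called when a runner/task starts
    public synchronized void taskStarted() {
        pending++;
    }

    // called when a runner/task completes; wakes any blocked callers
    public synchronized void taskFinished() {
        pending--;
        notifyAll();
    }

    // blocks without busy-waiting: the thread releases the monitor and sleeps,
    // rather than looping and re-checking (the spin lock this issue describes)
    public synchronized void blockUntilFinished() throws InterruptedException {
        while (pending > 0) {
            wait();   // woken by notifyAll() in taskFinished()
        }
    }

    public static void main(String[] args) throws Exception {
        PendingTracker tracker = new PendingTracker();
        tracker.taskStarted();
        Thread worker = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            tracker.taskFinished();
        });
        worker.start();
        tracker.blockUntilFinished();   // returns once pending == 0, with no CPU spin
        worker.join();
        System.out.println("done");
    }
}
```

The `while (pending > 0)` guard around `wait()` matters: `wait()` can return spuriously, so the condition must be re-checked under the monitor before proceeding.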
[jira] [Commented] (LUCENE-5648) Index/search multi-valued time durations
[ https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017875#comment-14017875 ] Ryan McKinley commented on LUCENE-5648: --- This stuff is looking great. Java Calendar/Date is a mess... it would be nice to use joda-time, but adding that as a dependency is not a great idea. The names 'NRShape' and 'NRCell' are a little funny -- maybe NumericRangeShape/NumericRangeCell would be better? I vote +1 to add it as experimental and get more eyes on it. Index/search multi-valued time durations Key: LUCENE-5648 URL: https://issues.apache.org/jira/browse/LUCENE-5648 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Attachments: LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch If you need to index a date/time duration, then the way to do that is to have a pair of date fields: one for the start and one for the end -- pretty straightforward. But if you need to index a variable number of durations per document, then the options aren't pretty, ranging from denormalization, to joins, to using Lucene spatial with 2D as described [here|http://wiki.apache.org/solr/SpatialForTimeDurations]. Ideally it would be easier to index durations, and have them work in a more optimal way. This issue implements the aforementioned feature using Lucene spatial with a new single-dimensional SpatialPrefixTree implementation. Unlike the other two SPT implementations, it's not based on floating-point numbers. It will have a Date-based customization that indexes levels at meaningful quantities like seconds, minutes, hours, etc. The point of that alignment is to make it faster to query across meaningful ranges (i.e. [2000 TO 2014]) and to enable a follow-on issue to facet on the data in a really fast way. I expect to have a working patch up this week.
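Conceptually, matching a multi-valued duration field against a query range reduces to interval overlap: a stored duration [start, end] matches a query range [qStart, qEnd] iff the two closed intervals intersect. This brute-force predicate is what the single-dimensional SpatialPrefixTree answers efficiently over an index; the sketch below is illustrative only, not the LUCENE-5648 implementation:

```java
// Illustrative only: the overlap predicate underlying duration search.
public class Intervals {
    // closed intervals [aStart, aEnd] and [bStart, bEnd] overlap iff
    // each one starts no later than the other one ends
    public static boolean overlaps(long aStart, long aEnd, long bStart, long bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    public static void main(String[] args) {
        // a document valid 2001..2005 matches a query for 2004..2010
        System.out.println(overlaps(2001, 2005, 2004, 2010)); // true
        // ...but not a query for 2006..2010
        System.out.println(overlaps(2001, 2005, 2006, 2010)); // false
    }
}
```

With a pair of date fields this is a simple conjunction of range clauses; it is the variable number of durations per document that forces the denormalization/join/spatial workarounds the issue describes.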
[jira] [Commented] (SOLR-6103) Add DateRangeField
[ https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017886#comment-14017886 ] Ryan McKinley commented on SOLR-6103: - +1 Add DateRangeField -- Key: SOLR-6103 URL: https://issues.apache.org/jira/browse/SOLR-6103 Project: Solr Issue Type: New Feature Components: spatial Reporter: David Smiley Assignee: David Smiley Attachments: SOLR-6103.patch LUCENE-5648 introduced a date range index and search capability in the spatial module. This issue is for a corresponding Solr FieldType, to be named DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that parses a superset of Solr's strict date format. It also parses partial dates (e.g. 2014-10 has month specificity), the trailing 'Z' is optional, a leading +/- may be present (minus indicates BC era), and * means all-time. The proposed field type would use it to parse a string and also both ends of a range query, and furthermore it will allow an arbitrary range query of the form {{calspec TO calspec}} such as: {noformat}2000 TO 2014-05-21T10{noformat} which parses as the year 2000 through 2014 May 21st 10am (GMT). I suggest this syntax because it is aligned with Lucene's range query syntax.
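The "month specificity" idea -- a partial date like 2014-10 standing for the whole month -- can be sketched with java.time. This illustrates the semantics only; it is not the parseCalendar(String) implementation from LUCENE-5648, and the class/method names are invented:

```java
import java.time.LocalDateTime;
import java.time.YearMonth;

// Illustration of partial-date semantics: "2014-10" denotes the whole month,
// i.e. the half-open range [2014-10-01T00:00, 2014-11-01T00:00).
// Hypothetical helper, not Lucene's parser.
public class PartialDate {
    public static LocalDateTime monthStart(String yearMonth) {
        return YearMonth.parse(yearMonth).atDay(1).atStartOfDay();
    }

    public static LocalDateTime monthEnd(String yearMonth) {
        // exclusive end: the first instant of the following month
        return YearMonth.parse(yearMonth).plusMonths(1).atDay(1).atStartOfDay();
    }

    public static void main(String[] args) {
        System.out.println(monthStart("2014-10")); // 2014-10-01T00:00
        System.out.println(monthEnd("2014-10"));   // 2014-11-01T00:00
    }
}
```

A range query like {{2000 TO 2014-05-21T10}} composes the same way: the coarse end of each calspec expands to the start or end of the interval it denotes.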
[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock
[ https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017896#comment-14017896 ] Timothy Potter commented on SOLR-6136: -- Thanks for the patch, Brandon! I'll start working on this issue tomorrow unless someone can dig into it sooner.
[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags
[ https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017912#comment-14017912 ] Steve Rowe commented on LUCENE-5734: bq. Steve Rowe and I discussed this a while back On Twitter: https://twitter.com/romseygeek/status/433553268577681408
[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017968#comment-14017968 ] ASF subversion and git services commented on LUCENE-5731: - Commit 1600412 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1600412 ] LUCENE-5731: split out direct packed ints from in-ram ones split direct packed ints from in-ram ones - Key: LUCENE-5731 URL: https://issues.apache.org/jira/browse/LUCENE-5731 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5731.patch, LUCENE-5731.patch Currently there is an oversharing problem in packed ints that imposes too many requirements on improving it: * every packed ints implementation must be able to be loaded directly, or in RAM, or iterated with * things like file pointers are expected to be adjusted (this is especially stupid) in all cases * lots of unnecessary abstractions * versioning etc. is complex None of this flexibility is needed or buys us anything, and it prevents performance improvements (e.g. I just want to add 3 bytes at the end of on-disk streams to reduce the number of bytebuffer calls, and that's seriously impossible with the current situation).
[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies
[ https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017998#comment-14017998 ] Steve Rowe commented on LUCENE-5715: Committing shortly. Upgrade direct dependencies known to be older than transitive dependencies -- Key: LUCENE-5715 URL: https://issues.apache.org/jira/browse/LUCENE-5715 Project: Lucene - Core Issue Type: Task Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Attachments: LUCENE-5715.patch LUCENE-5442 added functionality to the {{check-lib-versions}} ant task to fail the build if a direct dependency's version conflicts with that of a transitive dependency. {{ivy-ignore-conflicts.properties}} contains a list of 19 transitive dependencies with versions that are newer than direct dependencies' versions: https://issues.apache.org/jira/browse/LUCENE-5442?focusedCommentId=14012220&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14012220 We should try to keep that list small. It's likely that upgrading most of those dependencies will require little effort.
[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers
[ https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018004#comment-14018004 ] Robert Muir commented on LUCENE-5703: - I have a fix, I will update the patch in a bit (also with a test). Don't allocate/copy bytes all the time in binary DV producers - Key: LUCENE-5703 URL: https://issues.apache.org/jira/browse/LUCENE-5703 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.9, 5.0 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch Our binary doc values producers keep on creating new {{byte[]}} arrays and copying bytes when a value is requested, which likely doesn't help performance. This has been done because of the way fieldcache consumers used the API, but we should try to fix it in 5.0.
[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags
[ https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018003#comment-14018003 ] David Smiley commented on LUCENE-5734: -- The essential part of that conversation you had on Twitter, [~steve_rowe], is this: {quote} ic - i guess the only awkwardness would be embedded inline tags that produce single tokens: some<b>thing</b> -> something {quote} In that case, where the token includes an opening tag, I would expect the end offset to be where it is placed now, after the close tag. But otherwise (the case I presented) I wouldn't expect this.
[jira] [Resolved] (LUCENE-5728) use slice() api in packedints decode
[ https://issues.apache.org/jira/browse/LUCENE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5728. - Resolution: Duplicate See LUCENE-5731 use slice() api in packedints decode Key: LUCENE-5728 URL: https://issues.apache.org/jira/browse/LUCENE-5728 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5728.patch Today, for example, the 8-bpv decoder looks like this: {code}
in.seek(startPointer + index);
return in.readByte() & 0xFF;
{code} If instead we take a slice of 'in', we can remove an addition. It's not much, but it helps a little. Additionally we already (in PackedInts.java) compute the number of bytes, so we could make this an actual slice of the range, which would return an error on abuse instead of garbage data.
[jira] [Resolved] (LUCENE-5729) explore random-access methods to IndexInput
[ https://issues.apache.org/jira/browse/LUCENE-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5729. - Resolution: Duplicate See LUCENE-5731 explore random-access methods to IndexInput --- Key: LUCENE-5729 URL: https://issues.apache.org/jira/browse/LUCENE-5729 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Traditionally Lucene access is mostly reading lists of postings and is geared at that, but for random-access stuff like docvalues, it just creates overhead. So today we are hacking around it by doing this random access with seek+readXXX, but this is inefficient (additional checks by the JDK that we don't need). As a hack, I added the following to IndexInput, changed direct packed ints decode to use them, and implemented them in MMapDir: {code}
byte readByte(long pos)   -> ByteBuffer.get(pos)
short readShort(long pos) -> ByteBuffer.getShort(pos)
int readInt(long pos)     -> ByteBuffer.getInt(pos)
long readLong(long pos)   -> ByteBuffer.getLong(pos)
{code} This gives a ~30% performance improvement for docvalues (numerics, sorting strings, etc.). We should do a few things first before working on this (LUCENE-5728: use the slice api in decode, pad packed ints so we only have one I/O call ever, etc.) but I think we need to figure out such an API. It could either be on IndexInput like my hack (this is similar to the ByteBuffer API with both relative and absolute methods), or we could have a separate API. But I guess arguably IOContext exists to supply hints too, so I don't know which is the way to go.
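The difference between the existing seek+read pattern and the proposed positional reads can be seen with plain java.nio. This is a sketch of the concept only, not the MMapDirectory implementation:

```java
import java.nio.ByteBuffer;

// Sketch: a relative read mutates buffer state (like seek+readXXX on an
// IndexInput), while an absolute read like ByteBuffer.get(pos) touches no
// state at all -- which is what makes it attractive for random-access
// docvalues lookups.
public class PositionalReads {
    public static int relativeRead(ByteBuffer buf, int pos) {
        buf.position(pos);          // "seek": mutates the buffer's position
        return buf.get() & 0xFF;    // then a stateful read advances it again
    }

    public static int absoluteRead(ByteBuffer buf, int pos) {
        return buf.get(pos) & 0xFF; // one call, no position bookkeeping
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {10, 20, 30, 40});
        System.out.println(relativeRead(buf, 2)); // 30
        System.out.println(absoluteRead(buf, 2)); // 30, position untouched
    }
}
```

The absolute form is also trivially safe for concurrent readers of the same buffer view, since no shared position is updated between calls.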
[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies
[ https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018005#comment-14018005 ] Uwe Schindler commented on LUCENE-5715: --- +1, I see no problem. Which module tried to import ASM 5.0_BETA?
[jira] [Resolved] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5731. - Resolution: Fixed Fix Version/s: 5.0, 4.9
[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018007#comment-14018007 ] ASF subversion and git services commented on LUCENE-5731: - Commit 1600423 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1600423 ] LUCENE-5731: split out direct packed ints from in-ram ones
[jira] [Commented] (LUCENE-5734) HTMLStripCharFilter end offset should be left of closing tags
[ https://issues.apache.org/jira/browse/LUCENE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018010#comment-14018010 ] Steve Rowe commented on LUCENE-5734: Right, but you can't have it both ways - you have to make a choice.
[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies
[ https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018013#comment-14018013 ] Steve Rowe commented on LUCENE-5715: bq. Which module tried to import ASM 5.0_BETA? {noformat}
[libversions] VERSION CONFLICT: transitive dependency in module(s) solr-test-framework, core-test-framework:
[libversions]     /com.carrotsearch.randomizedtesting/junit4-ant=2.1.3
[libversions]     +-- /org.ow2.asm/asm=5.0_BETA    Conflict (direct=4.1)
{noformat}
[jira] [Commented] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018025#comment-14018025 ] Uwe Schindler commented on LUCENE-5731: --- Thanks Robert. I was very busy today, so I had no time to look into it. But from my first check it looks like our idea from the talk yesterday :-) {code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code} We cannot avoid the clone in all cases, because we must duplicate the ByteBuffer if the offset is different. But for the simple case, where you request the full IndexInput as a slice (meaning offset==0L, length==this.length), we could return this.
[jira] [Comment Edited] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018025#comment-14018025 ] Uwe Schindler edited comment on LUCENE-5731 at 6/4/14 7:07 PM:
---
Thanks Robert. I was very busy today, so I had no time to look into it. But from my first check it looks like our idea from the talk yesterday :-)
{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}
We cannot avoid the clone in all cases, because we must duplicate the ByteBuffer if the offset is different. But for the simple case, where you request the full IndexInput as a slice (i.e. offset==0L, length==this.length), we could return this.

was (Author: thetaphi):
Thanks Robert. I was very busy today, so I had no time to look into it. But from my first check it looks like our idea from the talk yesterday :-)
{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}
We cannot avoid the clone in all cases, because we must duplicate the ByteBuffer if the offset is different. But for the simple case, where you request the full IndexInput as a slice (i.e. offset==null, length==this.length), we could return this.

split direct packed ints from in-ram ones
-
Key: LUCENE-5731
URL: https://issues.apache.org/jira/browse/LUCENE-5731
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 4.9, 5.0
Attachments: LUCENE-5731.patch, LUCENE-5731.patch

Currently there is an oversharing problem in packed ints that imposes too many requirements on improving it:
* every packed ints impl must be able to be loaded directly, or in RAM, or iterated with
* things like file pointers are expected to be adjusted (this is especially stupid) in all cases
* lots of unnecessary abstractions
* versioning etc. is complex
None of this flexibility is needed or buys us anything, and it prevents performance improvements (e.g. I just want to add 3 bytes at the end of on-disk streams to reduce the number of ByteBuffer calls, and that's seriously impossible with the current situation).
--
This message was sent by Atlassian JIRA (v6.2#6252)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
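The full-slice fast path Uwe describes can be sketched without Lucene at all; the class below is a hypothetical stand-in for IndexInput/ByteBufferIndexInput, not the actual API, and exists only to show the offset==0 && length==length() shortcut:

```java
// Illustrative stand-in for a sliceable input: only when the requested slice
// covers the whole input (offset == 0, length == this.length) could the input
// return itself instead of cloning.
class SliceableInput {
    private final byte[] data; // stands in for the mapped ByteBuffer

    SliceableInput(byte[] data) { this.data = data; }

    long length() { return data.length; }

    SliceableInput randomAccessSlice(long offset, long length) {
        if (offset == 0L && length == length()) {
            return this; // full slice: no clone needed (ignoring position side effects)
        }
        // Any other slice must not share mutable state with its parent,
        // so a copy (in real code, a ByteBuffer duplicate) is required.
        byte[] copy = new byte[(int) length];
        System.arraycopy(data, (int) offset, copy, 0, (int) length);
        return new SliceableInput(copy);
    }

    public static void main(String[] args) {
        SliceableInput in = new SliceableInput(new byte[]{1, 2, 3, 4});
        System.out.println(in.randomAccessSlice(0, 4) == in); // true: same object
        System.out.println(in.randomAccessSlice(1, 2) == in); // false: fresh copy
    }
}
```

As the later edit in this thread notes, the real code cannot take this shortcut while slices mutate a shared buffer position.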
[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies
[ https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018036#comment-14018036 ] ASF subversion and git services commented on LUCENE-5715:
-
Commit 1600444 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1600444 ]
LUCENE-5715: Upgrade direct dependencies known to be older than transitive dependencies

Upgrade direct dependencies known to be older than transitive dependencies
--
Key: LUCENE-5715
URL: https://issues.apache.org/jira/browse/LUCENE-5715
Project: Lucene - Core
Issue Type: Task
Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
Attachments: LUCENE-5715.patch

LUCENE-5442 added functionality to the {{check-lib-versions}} ant task to fail the build if a direct dependency's version conflicts with that of a transitive dependency. {{ivy-ignore-conflicts.properties}} contains a list of 19 transitive dependencies with versions that are newer than direct dependencies' versions: https://issues.apache.org/jira/browse/LUCENE-5442?focusedCommentId=14012220&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14012220 We should try to keep that list small. It's likely that upgrading most of those dependencies will require little effort.
[jira] [Comment Edited] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018025#comment-14018025 ] Uwe Schindler edited comment on LUCENE-5731 at 6/4/14 7:16 PM:
---
Thanks Robert. I was very busy today, so I had no time to look into it. But from my first check it looks like our idea from the talk yesterday :-)
I was afraid to propose implementing this using an interface, thanks for doing it that way. Otherwise we would have craziness in ByteBufferIndexInput. The interface hidden behind the randomAccessSlice() method just returning slice() is wonderful.
{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}
We cannot avoid the clone in all cases, because we must duplicate the ByteBuffer if the offset is different. But for the simple case, where you request the full IndexInput as a slice (i.e. offset==0L, length==this.length), we could return this.

was (Author: thetaphi):
Thanks Robert. I was very busy today, so I had no time to look into it. But from my first check it looks like our idea from the talk yesterday :-)
{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}
We cannot avoid the clone in all cases, because we must duplicate the ByteBuffer if the offset is different. But for the simple case, where you request the full IndexInput as a slice (i.e. offset==0L, length==this.length), we could return this.

split direct packed ints from in-ram ones
-
Key: LUCENE-5731
URL: https://issues.apache.org/jira/browse/LUCENE-5731
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 4.9, 5.0
Attachments: LUCENE-5731.patch, LUCENE-5731.patch
[jira] [Commented] (SOLR-6134) MapReduce GoLive code improvements
[ https://issues.apache.org/jira/browse/SOLR-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018057#comment-14018057 ] David Smiley commented on SOLR-6134:
-
One small change needed to satisfy ant precommit is to use a non-default ThreadFactory, such as by doing this:
{code:java}
final ExecutorService executor = Executors.newFixedThreadPool(options.goLiveThreads,
    new DefaultSolrThreadFactory("goLive"));
{code}

MapReduce GoLive code improvements
--
Key: SOLR-6134
URL: https://issues.apache.org/jira/browse/SOLR-6134
Project: Solr
Issue Type: Improvement
Components: contrib - MapReduce
Reporter: David Smiley
Priority: Minor
Attachments: SOLR-6134_GoLive.patch

I looked at the GoLive.java source quite a bit and found myself editing the source to make it clearer. It wasn't hard to understand before, but I felt it could be better. Furthermore, when not in SolrCloud mode, the commit messages are now submitted asynchronously using the same thread pool used for merging. This refactoring does away with the inner class Result, the CompletionService, and any keeping track of Futures/Results in collections and looping over them. Fundamentally the code never cared about the result; it just wanted to know if it all worked or not. This refactoring uses Java's Phaser concurrency utility, which may seem advanced (especially with the cool name :-) but I find it quite understandable to use, and it is very flexible. I added an inner class implementing Runnable to avoid some duplication across the merge and commit phases. The tests pass, but I confess to not having used it for real. I certainly don't feel comfortable committing this until someone does try it; especially try and break it ;-).
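The Phaser pattern described above, where the code only cares whether everything worked rather than collecting results, can be sketched with plain java.util.concurrent. This is a minimal illustration, not the actual GoLive code; the failure condition is a placeholder comment:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PhaserGoLiveSketch {
    // Submit `tasks` runnables and wait for all of them with a Phaser,
    // tracking only overall success instead of per-task Futures/Results.
    static boolean runAll(int tasks) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        Phaser phaser = new Phaser(1); // party 1 is the coordinating thread
        AtomicBoolean success = new AtomicBoolean(true);
        for (int i = 0; i < tasks; i++) {
            phaser.register(); // one extra party per submitted task
            executor.execute(() -> {
                try {
                    // real work (a merge or commit request) would go here;
                    // on failure it would call success.set(false)
                } finally {
                    phaser.arriveAndDeregister(); // task finished, success or not
                }
            });
        }
        phaser.arriveAndAwaitAdvance(); // block until every task has arrived
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        return success.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(8) ? "all tasks ok" : "a task failed");
    }
}
```

Registering a party per task and deregistering in `finally` means the coordinator cannot wait forever on a task that threw, which is the property the refactoring relies on.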
[jira] [Comment Edited] (LUCENE-5731) split direct packed ints from in-ram ones
[ https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018025#comment-14018025 ] Uwe Schindler edited comment on LUCENE-5731 at 6/4/14 7:31 PM:
---
Thanks Robert. I was very busy today, so I had no time to look into it. But from my first check it looks like our idea from the talk yesterday :-)
I was afraid to propose implementing this using an interface, thanks for doing it that way. Otherwise we would have craziness in ByteBufferIndexInput. The interface hidden behind the randomAccessSlice() method just returning slice() is wonderful.
{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}
We cannot avoid the clone in all cases, because we must duplicate the ByteBuffer if the offset is different. But for the simple case, where you request the full IndexInput as a slice (i.e. offset==0L, length==this.length), we could return this.
EDIT: we cannot do this at the moment, because in the multi-mmap case we change the ByteBuffer's position. So we always have to clone (otherwise the random-access slice would have side effects on the file position of the master slice).

was (Author: thetaphi):
Thanks Robert. I was very busy today, so I had no time to look into it. But from my first check it looks like our idea from the talk yesterday :-)
I was afraid to propose implementing this using an interface, thanks for doing it that way. Otherwise we would have craziness in ByteBufferIndexInput. The interface hidden behind the randomAccessSlice() method just returning slice() is wonderful.
{code:java}
@Override
public RandomAccessInput randomAccessSlice(long offset, long length) throws IOException {
  // note: technically we could even avoid the clone...
  return slice(null, offset, length);
}
{code}
We cannot avoid the clone in all cases, because we must duplicate the ByteBuffer if the offset is different. But for the simple case, where you request the full IndexInput as a slice (i.e. offset==0L, length==this.length), we could return this.

split direct packed ints from in-ram ones
-
Key: LUCENE-5731
URL: https://issues.apache.org/jira/browse/LUCENE-5731
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 4.9, 5.0
Attachments: LUCENE-5731.patch, LUCENE-5731.patch
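The position side effect behind the EDIT above is easy to demonstrate with plain java.nio, without any Lucene code: reading through a shared ByteBuffer advances the master's position, while a duplicate() carries its own position state. A minimal sketch:

```java
import java.nio.ByteBuffer;

public class DuplicateDemo {
    public static void main(String[] args) {
        ByteBuffer master = ByteBuffer.wrap(new byte[]{10, 20, 30, 40});

        // Sharing the buffer directly: a read through the "slice" moves the
        // master's position too, the side effect that forces the clone in the
        // multi-mmap case described above.
        ByteBuffer shared = master;
        shared.get(); // reads one byte, advances position to 1
        System.out.println(master.position()); // 1: master was disturbed

        // duplicate() shares the underlying bytes but has independent
        // position/limit/mark, so the slice leaves the master untouched.
        master.position(0);
        ByteBuffer dup = master.duplicate();
        dup.get();
        System.out.println(master.position()); // 0: master unaffected
    }
}
```

This is why the random-access slice must clone: otherwise its reads would move the master slice's file position.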
[jira] [Commented] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies
[ https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018067#comment-14018067 ] ASF subversion and git services commented on LUCENE-5715:
-
Commit 1600473 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1600473 ]
LUCENE-5715: Upgrade direct dependencies known to be older than transitive dependencies (merged trunk r1600444)

Upgrade direct dependencies known to be older than transitive dependencies
--
Key: LUCENE-5715
URL: https://issues.apache.org/jira/browse/LUCENE-5715
Project: Lucene - Core
Issue Type: Task
Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
Attachments: LUCENE-5715.patch
[jira] [Resolved] (LUCENE-5715) Upgrade direct dependencies known to be older than transitive dependencies
[ https://issues.apache.org/jira/browse/LUCENE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved LUCENE-5715.
-
Resolution: Fixed
Fix Version/s: 4.9
Committed to trunk and branch_4x.

Upgrade direct dependencies known to be older than transitive dependencies
--
Key: LUCENE-5715
URL: https://issues.apache.org/jira/browse/LUCENE-5715
Project: Lucene - Core
Issue Type: Task
Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
Fix For: 4.9
Attachments: LUCENE-5715.patch
Re: Lucene/Solr 5?
Hi all,

Just coming back to this question of mine from the fall ... Given that the pace of things has accelerated quite a bit (and quite nicely) lately, does anyone have concrete plans for a 5.0 release yet? Would we be talking summer's end or (hopefully) earlier?

Cheers, L

On 02/10/2013 16:56, Shawn Heisey wrote:
> On 10/2/2013 4:34 AM, la...@protulae.com wrote:
>> Your pending 4.5 release reminds me I wanted to ask - what is the expected timeframe for 5.0? Are we talking end of year? Q1 2014? Later? I'm not asking for any commitment or firm date - I would just appreciate an indication of what y'all are thinking of right now.
> Here's the tail end of a message that I just sent to solr-user: A 4.6.0 release will probably happen before the end of the year, but I can't guarantee that. The release schedule for 5.0 is *completely* undecided. It might be a few months from now, it might be a year from now. Some of the things that have been tentatively planned for that release are nowhere near finished.
> Thanks, Shawn

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene/Solr 5?
The 4.x branch seems to be doing well enough, both from a stability perspective and momentum with new features. Yeah, a year ago I would have expected a 5.0 around now, but... sometimes reality happens.

I'll offer a prediction: 5.0 will happen when the Lucene guys at Elasticsearch come up with some great new ideas for how to leapfrog Solr! (And then we watch how the Heliosearch guys respond to that!)

-- Jack Krupansky

-----Original Message----- From: Lajos Sent: Wednesday, June 4, 2014 3:44 PM To: dev@lucene.apache.org Subject: Re: Lucene/Solr 5?

[quoted message from Lajos, as above]

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5627) Positional joins
[ https://issues.apache.org/jira/browse/LUCENE-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018114#comment-14018114 ] Paul Elschot commented on LUCENE-5627:
--
The javadocs here contain some references on what was used to make this. Meanwhile I had another look around and found two somewhat similar implementations:
* Luxdb: https://github.com/msokolov/lux . This uses a TaggedTokenStream for the XML tags, see http://www.slideshare.net/lucenerevolution/querying-rich-text-with-xquery
* Fangorn: https://code.google.com/p/fangorn/ . This indexes each tag by adding a payload with four position numbers (left, right, depth, parent). Its target is large treebanks of linguistically parsed text.
A first impression: both are based on Lucene and add a tree of XML tags like the label tree here. They have a query language implementation, which is not available here. They do not have labeled fragments in the sense of having 0..n tokens in more than one field that can form a single leaf in the tag tree.

Positional joins
Key: LUCENE-5627
URL: https://issues.apache.org/jira/browse/LUCENE-5627
Project: Lucene - Core
Issue Type: New Feature
Reporter: Paul Elschot
Priority: Minor

Prototype of analysis and search for labeled fragments
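Fangorn's four-number payload can be sketched with plain java.nio, independent of Lucene's payload machinery: pack the (left, right, depth, parent) position numbers into the byte[] that would be attached to each tag token. The class and method names below are illustrative, not from either project:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class TreePayload {
    // Pack (left, right, depth, parent) into a 16-byte payload for a tag token.
    static byte[] encode(int left, int right, int depth, int parent) {
        return ByteBuffer.allocate(16)
                .putInt(left).putInt(right).putInt(depth).putInt(parent)
                .array();
    }

    // Recover the four position numbers from a payload read back at query time.
    static int[] decode(byte[] payload) {
        ByteBuffer b = ByteBuffer.wrap(payload);
        return new int[]{b.getInt(), b.getInt(), b.getInt(), b.getInt()};
    }

    public static void main(String[] args) {
        byte[] payload = encode(0, 7, 2, 3);
        System.out.println(Arrays.toString(decode(payload))); // [0, 7, 2, 3]
    }
}
```

With such payloads, ancestor/descendant tests reduce to integer comparisons on the decoded numbers, which is what makes the approach attractive for large treebanks.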
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1153: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1153/ 3 tests failed. FAILED: org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch Error Message: Task 3002 did not complete, final state: running Stack Trace: java.lang.AssertionError: Task 3002 did not complete, final state: running at __randomizedtesting.SeedInfo.seed([542F77FBEDC2170E:D5C9F9E39A9D7732]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:158) at org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:67) FAILED: org.apache.solr.cloud.MultiThreadedOCPTest.org.apache.solr.cloud.MultiThreadedOCPTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.MultiThreadedOCPTest: 1) Thread[id=7297, name=TEST-MultiThreadedOCPTest.testDistribSearch-seed#[542F77FBEDC2170E]-EventThread, state=RUNNABLE, group=TGRP-MultiThreadedOCPTest] at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:318) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.PrintStream.write(PrintStream.java:482) at org.apache.maven.surefire.booter.ForkingRunListener.writeTestOutput(ForkingRunListener.java:178) at org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleOutputCapture.java:64) at org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleOutputCapture.java:73) at java.io.FilterOutputStream.write(FilterOutputStream.java:77) at org.apache.lucene.util.TestRuleLimitSysouts$DelegateStream.write(TestRuleLimitSysouts.java:134) at java.io.FilterOutputStream.write(FilterOutputStream.java:125) at org.apache.lucene.util.TestRuleLimitSysouts$DelegateStream.write(TestRuleLimitSysouts.java:128) at 
java.io.PrintStream.write(PrintStream.java:480) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229) at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59) at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324) at org.apache.log4j.WriterAppender.append(WriterAppender.java:162) at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251) at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66) at org.apache.log4j.Category.callAppenders(Category.java:206) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:304) at org.apache.solr.cloud.DistributedQueue$LatchChildWatcher.process(DistributedQueue.java:263) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.MultiThreadedOCPTest: 1) Thread[id=7297, name=TEST-MultiThreadedOCPTest.testDistribSearch-seed#[542F77FBEDC2170E]-EventThread, state=RUNNABLE, group=TGRP-MultiThreadedOCPTest] at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:318) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.PrintStream.write(PrintStream.java:482) at org.apache.maven.surefire.booter.ForkingRunListener.writeTestOutput(ForkingRunListener.java:178) at 
org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleOutputCapture.java:64) at org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.write(ConsoleOutputCapture.java:73) at java.io.FilterOutputStream.write(FilterOutputStream.java:77) at org.apache.lucene.util.TestRuleLimitSysouts$DelegateStream.write(TestRuleLimitSysouts.java:134) at java.io.FilterOutputStream.write(FilterOutputStream.java:125) at org.apache.lucene.util.TestRuleLimitSysouts$DelegateStream.write(TestRuleLimitSysouts.java:128) at java.io.PrintStream.write(PrintStream.java:480) at
[jira] [Commented] (SOLR-4408) Server hanging on startup
[ https://issues.apache.org/jira/browse/SOLR-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018168#comment-14018168 ] simpleliving commented on SOLR-4408: I am facing the same exact issue as reported by the reporter of this ticket and I am using Solr 4.7 Server hanging on startup - Key: SOLR-4408 URL: https://issues.apache.org/jira/browse/SOLR-4408 Project: Solr Issue Type: Bug Affects Versions: 4.1 Environment: OpenJDK 64-Bit Server VM (23.2-b09 mixed mode) Tomcat 7.0 Eclipse Juno + WTP Reporter: Francois-Xavier Bonnet Assignee: Erick Erickson Attachments: patch-4408.txt While starting, the server hangs indefinitely. Everything works fine when I first start the server with no index created yet but if I fill the index then stop and start the server, it hangs. Could it be a lock that is never released? Here is what I get in a full thread dump: 2013-02-06 16:28:52 Full thread dump OpenJDK 64-Bit Server VM (23.2-b09 mixed mode): searcherExecutor-4-thread-1 prio=10 tid=0x7fbdfc16a800 nid=0x42c6 in Object.wait() [0x7fbe0ab1] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc34c1c48 (a java.lang.Object) at java.lang.Object.wait(Object.java:503) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492) - locked 0xc34c1c48 (a java.lang.Object) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247) at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:94) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:213) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:112) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180) at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64) at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1594) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) coreLoadExecutor-3-thread-1 prio=10 tid=0x7fbe04194000 nid=0x42c5 in Object.wait() [0x7fbe0ac11000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc34c1c48 (a java.lang.Object) at java.lang.Object.wait(Object.java:503) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492) - locked 0xc34c1c48 (a java.lang.Object) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247) at org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:495) at org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:518) at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:232) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:512) at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140) at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51) at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:636) at org.apache.solr.core.SolrCore.init(SolrCore.java:809) at org.apache.solr.core.SolrCore.init(SolrCore.java:607) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1003) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at
[jira] [Comment Edited] (SOLR-4408) Server hanging on startup
[ https://issues.apache.org/jira/browse/SOLR-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018168#comment-14018168 ] simpleliving edited comment on SOLR-4408 at 6/4/14 9:00 PM:
-
I am facing the exact same issue as reported by the reporter of this ticket, and I am using Solr 4.7. If there is no index, the server starts; if an index is present, it hangs and does not start. I am using the spellcheckers, and I can confirm that using spellcheckers causes this issue.

was (Author: simpleliving):
I am facing the exact same issue as reported by the reporter of this ticket and I am using Solr 4.7

Server hanging on startup
-
Key: SOLR-4408
URL: https://issues.apache.org/jira/browse/SOLR-4408
Project: Solr
Issue Type: Bug
Affects Versions: 4.1
Environment: OpenJDK 64-Bit Server VM (23.2-b09 mixed mode) Tomcat 7.0 Eclipse Juno + WTP
Reporter: Francois-Xavier Bonnet
Assignee: Erick Erickson
Attachments: patch-4408.txt

While starting, the server hangs indefinitely. Everything works fine when I first start the server with no index created yet, but if I fill the index then stop and start the server, it hangs. Could it be a lock that is never released?
Here is what I get in a full thread dump: 2013-02-06 16:28:52 Full thread dump OpenJDK 64-Bit Server VM (23.2-b09 mixed mode): searcherExecutor-4-thread-1 prio=10 tid=0x7fbdfc16a800 nid=0x42c6 in Object.wait() [0x7fbe0ab1] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc34c1c48 (a java.lang.Object) at java.lang.Object.wait(Object.java:503) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492) - locked 0xc34c1c48 (a java.lang.Object) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247) at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:94) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:213) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:112) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64) at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1594) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) coreLoadExecutor-3-thread-1 prio=10 tid=0x7fbe04194000 nid=0x42c5 in Object.wait() [0x7fbe0ac11000] java.lang.Thread.State: WAITING (on object monitor) at 
java.lang.Object.wait(Native Method) - waiting on 0xc34c1c48 (a java.lang.Object) at java.lang.Object.wait(Object.java:503) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492) - locked 0xc34c1c48 (a java.lang.Object) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247) at org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:495) at org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:518) at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:232) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:512) at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140) at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:636) at
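The hang in the dumps above is a thread parked in Object.wait() with no notify ever arriving. A minimal JDK-only reproduction (unrelated to Solr's actual getSearcher locking; the thread name is just a nod to the dump) shows the same "WAITING (on object monitor)" state a watchdog or jstack would report:

```java
public class WaitHangDemo {
    // Start a thread that parks in Object.wait() on the given monitor; it
    // never wakes up unless someone calls notify(), like the searcher threads
    // in the SOLR-4408 thread dump.
    static Thread startWaiter(final Object monitor) {
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                try {
                    monitor.wait(); // parks here; thread state becomes WAITING
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "searcherExecutor-like-waiter");
        waiter.start();
        return waiter;
    }

    public static void main(String[] args) throws InterruptedException {
        Object monitor = new Object();
        Thread waiter = startWaiter(monitor);
        // Poll until the thread parks: this is the WAITING (on object monitor)
        // state shown for searcherExecutor-4-thread-1 in the dump.
        while (waiter.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        System.out.println(waiter.getState()); // WAITING
        // In the bug nothing ever notifies; here we release it to exit cleanly.
        synchronized (monitor) {
            monitor.notify();
        }
        waiter.join();
    }
}
```

In the reported deadlock, two threads each wait on the same searcher monitor during core load, so neither ever issues the notify the other needs.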
[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 23286 - Failure!
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/23286/

2 tests failed.

FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestControlledRealTimeReopenThread

Error Message:
1 thread leaked from SUITE scope at org.apache.lucene.search.TestControlledRealTimeReopenThread:
   1) Thread[id=119, name=Thread-53, state=TIMED_WAITING, group=TGRP-TestControlledRealTimeReopenThread]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at org.apache.lucene.search.ControlledRealTimeReopenThread.run(ControlledRealTimeReopenThread.java:223)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.lucene.search.TestControlledRealTimeReopenThread:
   1) Thread[id=119, name=Thread-53, state=TIMED_WAITING, group=TGRP-TestControlledRealTimeReopenThread]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at org.apache.lucene.search.ControlledRealTimeReopenThread.run(ControlledRealTimeReopenThread.java:223)
        at __randomizedtesting.SeedInfo.seed([2651BE7982C65DFA]:0)

REGRESSION: org.apache.lucene.search.TestControlledRealTimeReopenThread.testCRTReopen

Error Message:
waited too long for generation 25376

Stack Trace:
java.lang.AssertionError: waited too long for generation 25376
        at __randomizedtesting.SeedInfo.seed([2651BE7982C65DFA:7471234D5601E583]:0)
        at org.junit.Assert.fail(Assert.java:93)
        at org.junit.Assert.assertTrue(Assert.java:43)
        at org.apache.lucene.search.TestControlledRealTimeReopenThread.testCRTReopen(TestControlledRealTimeReopenThread.java:519)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at
[jira] [Updated] (SOLR-6123) The 'clusterstatus' API filtered by collection times out if a long running operation is in progress
[ https://issues.apache.org/jira/browse/SOLR-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-6123:
    Attachment: SOLR-6123.patch

Updated patch.

The 'clusterstatus' API filtered by collection times out if a long running operation is in progress

                Key: SOLR-6123
                URL: https://issues.apache.org/jira/browse/SOLR-6123
            Project: Solr
         Issue Type: Bug
         Components: SolrCloud
   Affects Versions: 4.9
           Reporter: Shalin Shekhar Mangar
           Assignee: Anshum Gupta
            Fix For: 4.9
        Attachments: SOLR-6123.patch, SOLR-6123.patch

If a long running shard split is in progress, say for collection=X, then the clusterstatus API with collection=X will time out. The OverseerCollectionProcessor should never block an operation such as clusterstatus even if there are tasks for the same collection in progress. This bug was introduced by SOLR-5681.

--
This message was sent by Atlassian JIRA (v6.2#6252)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6123) The 'clusterstatus' API filtered by collection times out if a long running operation is in progress
[ https://issues.apache.org/jira/browse/SOLR-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018309#comment-14018309 ]

ASF subversion and git services commented on SOLR-6123:

Commit 1600535 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1600535 ]
SOLR-6123: Make CLUSTERSTATE Api unblocked and non-blocking always
Re: Managed Schema and SolrCloud
Hi Greg,

Your understanding is correct, and I agree that this limits managed schema functionality.

Under SolrCloud, all Solr nodes participating in a collection bound to a configset with a managed schema keep a watch on the corresponding schema ZK node. In my testing (on my laptop), when the managed schema is written to ZK, the other nodes are notified very quickly (single-digit milliseconds) and immediately download and start parsing the schema. Incoming requests are bound to a snapshot of the live schema at the time they arrive, so there is a window of time between the initial posting to ZK and swapping out the schema after parsing. Different loads on, and/or different network latency between, ZK and each participating node can result in varying latencies before all nodes are in sync.

For Schema API users, delaying a couple of seconds after adding fields before using them should work around this problem. While not ideal, I think schema field additions are rare enough in the Solr collection lifecycle that this is not a huge problem.

For schemaless users, the picture is worse, as you noted. Immediate distribution of documents triggering schema field addition could easily prove problematic. Maybe we need a schema update blocking mode, where after the ZK schema node watch is triggered, all new request processing is halted until the schema is finished downloading/parsing/swapping out? Can you make an issue, Greg? (Such a mode should help Schema API users too.)

Thanks,
Steve

On Jun 3, 2014, at 8:06 PM, Gregory Chanan gcha...@cloudera.com wrote:

I'm trying to determine if the Managed Schema functionality works with SolrCloud, and AFAICT the integration seems pretty limited. The issue I'm running into is variants of the issue that schema changes are not pushed to all shards/replicas synchronously.
So, for example, I can make the following two requests:
1) add a field to the collection on server1 using the Schema API
2) add a document with the new field; the document is routed to a core on server2

Then, there appears to be a race between when the document is processed by the core on server2 and when the core on server2, via the ZkIndexSchemaReader, gets the new schema. If the document is processed first, I get a 400 error because the field doesn't exist. This is easily reproducible by adding a sleep to the ZkIndexSchemaReader's processing.

I hit a similar issue with Schemaless: the distributed request handler sends out the document updates, but there is no guarantee that the other shards/replicas see the schema changes made by the update.chain.

Is my understanding correct? Is this expected?

Thanks,
Greg
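A client-side stopgap for the propagation window discussed above is to poll until the new field is visible before indexing documents that use it. The sketch below is illustrative only: `field_exists` is a hypothetical callable that you would implement against each replica's Schema API endpoint (not shown); the demo uses a stand-in check.

```python
import time

def wait_for_schema_field(field_exists, timeout=5.0, interval=0.25):
    """Poll until field_exists() returns True, or give up at timeout.

    field_exists is any zero-argument callable; in practice it would
    query the Schema API on every replica and return True only once
    all nodes report the new field (hypothetical client, not shown).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if field_exists():
            return True
        time.sleep(interval)
    return False

# Demo with a stand-in check that succeeds on its third call:
calls = {"n": 0}
def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for_schema_field(fake_check, timeout=2.0, interval=0.01))  # True
```

This only narrows the race from the client's point of view; it does not help schemaless updates, where the schema change and the document travel together.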
[jira] [Commented] (SOLR-6123) The 'clusterstatus' API filtered by collection times out if a long running operation is in progress
[ https://issues.apache.org/jira/browse/SOLR-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018324#comment-14018324 ]

ASF subversion and git services commented on SOLR-6123:

Commit 1600538 from [~anshumg] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1600538 ]
SOLR-6123: Make CLUSTERSTATE Api unblocked and non-blocking always (Merge from trunk r1600535)
[jira] [Resolved] (SOLR-6123) The 'clusterstatus' API filtered by collection times out if a long running operation is in progress
[ https://issues.apache.org/jira/browse/SOLR-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta resolved SOLR-6123.
    Resolution: Fixed
[jira] [Commented] (SOLR-6130) solr-cell dependencies weren't fully upgraded with the Tika 1.4-1.5 upgrade
[ https://issues.apache.org/jira/browse/SOLR-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018358#comment-14018358 ]

ASF subversion and git services commented on SOLR-6130:

Commit 1600544 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1600544 ]
SOLR-6130: Added com.uwyn:jhighlight dependency to, and removed asm:asm dependency from the extraction contrib - dependencies weren't fully upgraded with the Tika 1.4-1.5 upgrade (SOLR-5763) (merged trunk r1599663)

solr-cell dependencies weren't fully upgraded with the Tika 1.4-1.5 upgrade

                Key: SOLR-6130
                URL: https://issues.apache.org/jira/browse/SOLR-6130
            Project: Solr
         Issue Type: Bug
   Affects Versions: 4.8
           Reporter: Steve Rowe
           Assignee: Steve Rowe
            Fix For: 4.9, 5.0, 4.8.2
        Attachments: SOLR-6130.patch

There are problems with the solr-cell dependency configuration:
# Despite the fact that the asm:asm dependency was removed in LUCENE-4263, and its re-addition effectively vetoed by Uwe/Robert in SOLR-4209, asm:asm:3.1 was re-added with no apparent discussion by SOLR-1301 in Solr 4.7.
# The Tika 1.5 upgrade (SOLR-5763) failed to properly upgrade the asm:asm:3.1 dependency to org.ow2.asm:asm-debug-all:4.1 (see TIKA-1053).
# New Tika dependency com.uwyn:jhighlight:1.0 was not added.

[~thetaphi], do you have any opinions on the asm issues? In particular, would it make sense to have an additional asm dependency (asm-debug-all in addition to asm)?
[jira] [Commented] (SOLR-5763) Upgrade to Tika 1.5
[ https://issues.apache.org/jira/browse/SOLR-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018359#comment-14018359 ]

ASF subversion and git services commented on SOLR-5763:

Commit 1600544 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1600544 ]
SOLR-6130: Added com.uwyn:jhighlight dependency to, and removed asm:asm dependency from the extraction contrib - dependencies weren't fully upgraded with the Tika 1.4-1.5 upgrade (SOLR-5763) (merged trunk r1599663)

Upgrade to Tika 1.5

                Key: SOLR-5763
                URL: https://issues.apache.org/jira/browse/SOLR-5763
            Project: Solr
         Issue Type: Task
         Components: contrib - Solr Cell (Tika extraction)
           Reporter: Steve Rowe
           Assignee: Steve Rowe
           Priority: Minor
            Fix For: 4.8
        Attachments: SOLR-5763.patch, SOLR-5763.patch, SOLR-5763.patch

Just released: http://www.apache.org/dist/tika/CHANGES-1.5.txt
[jira] [Resolved] (SOLR-6130) solr-cell dependencies weren't fully upgraded with the Tika 1.4-1.5 upgrade
[ https://issues.apache.org/jira/browse/SOLR-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe resolved SOLR-6130.
    Resolution: Fixed

Committed to trunk, branch_4x, and the lucene_solr_4_8 branch (in case there is a 4.8.2 release)
[jira] [Updated] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers
[ https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-5703:
    Attachment: LUCENE-5703.patch

Updated to trunk. Added TestFieldCacheSortRandom. Fixed a bug in the original patch with FC (it cannot share here), so instead getTermsIndex returns a light iterator over the real thing, just like docTermsOrds. Because of this, I had to also fix any bad test assumptions around 'same'. Also cleaned up the termsenum in the default codec a bit: the methods like doSeek/doNext are stupid and I removed them. I'll beast and review some more, but this is all looking good.

Don't allocate/copy bytes all the time in binary DV producers

                Key: LUCENE-5703
                URL: https://issues.apache.org/jira/browse/LUCENE-5703
            Project: Lucene - Core
         Issue Type: Improvement
           Reporter: Adrien Grand
           Assignee: Adrien Grand
            Fix For: 4.9, 5.0
        Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch

Our binary doc values producers keep on creating new {{byte[]}} arrays and copying bytes when a value is requested, which likely doesn't help performance. This has been done because of the way fieldcache consumers used the API, but we should try to fix it in 5.0.
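The no-copy pattern this issue moves toward can be illustrated outside Lucene: pack all values into one shared buffer once, then hand out zero-copy views per lookup, the way a reusable BytesRef can point into a shared byte[] instead of a freshly allocated array. This is a hedged sketch of the idea, not Lucene's actual code.

```python
# Illustrative only -- not Lucene code. Values are packed once into a
# single shared buffer; get(i) returns a view into it rather than
# allocating and copying a new array on every call.
class PackedValues:
    def __init__(self, values):
        self.offsets = [0]
        for v in values:
            self.offsets.append(self.offsets[-1] + len(v))
        self.data = memoryview(b"".join(values))

    def get(self, i):
        # memoryview slicing allocates no new byte storage: the slice
        # references the shared buffer, like a BytesRef (bytes, offset,
        # length) into a shared byte[].
        return self.data[self.offsets[i]:self.offsets[i + 1]]

p = PackedValues([b"foo", b"quux"])
print(bytes(p.get(1)))  # b'quux'
```

The trade-off is the same one the issue mentions: callers that assume they own the returned bytes (as old fieldcache consumers did) must copy explicitly.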
[jira] [Commented] (LUCENE-5648) Index/search multi-valued time durations
[ https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018405#comment-14018405 ]

ASF subversion and git services commented on LUCENE-5648:

Commit 1600555 from [~dsmiley] in branch 'dev/trunk' [ https://svn.apache.org/r1600555 ]
LUCENE-5648: DateRangePrefixTree and NumberRangePrefixTreeStrategy

Index/search multi-valued time durations

                Key: LUCENE-5648
                URL: https://issues.apache.org/jira/browse/LUCENE-5648
            Project: Lucene - Core
         Issue Type: New Feature
         Components: modules/spatial
           Reporter: David Smiley
           Assignee: David Smiley
        Attachments: LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch

If you need to index a date/time duration, then the way to do that is to have a pair of date fields, one for the start and one for the end -- pretty straightforward. But if you need to index a variable number of durations per document, then the options aren't pretty, ranging from denormalization, to joins, to using Lucene spatial with 2D as described [here|http://wiki.apache.org/solr/SpatialForTimeDurations]. Ideally it would be easier to index durations, and work in a more optimal way.

This issue implements the aforementioned feature using Lucene-spatial with a new single-dimensional SpatialPrefixTree implementation. Unlike the other two SPT implementations, it's not based on floating point numbers. It will have a Date based customization that indexes levels at meaningful quantities like seconds, minutes, hours, etc. The point of that alignment is to make it faster to query across meaningful ranges (i.e. [2000 TO 2014]) and to enable a follow-on issue to facet on the data in a really fast way. I expect to have a working patch up this week.
[jira] [Updated] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers
[ https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-5703:
    Attachment: LUCENE-5703.patch

Updated patch: folds in an unrelated test bug fix from beasting slow+nightly...
[jira] [Resolved] (LUCENE-5648) Index/search multi-valued time durations
[ https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley resolved LUCENE-5648.
       Resolution: Fixed
    Fix Version/s: 5.0

The NR abbreviation is purely used on internal classes and it's referenced a lot, so I don't worry about its succinct name. I committed against 5x. LUCENE-5608 (spatial api refactoring) is a dependency which is still 5x; maybe I should back-port that to 4x now or soon. Or wait a bit to see if further changes may arrive when I try to facet.
[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018417#comment-14018417 ]

Da Huang commented on LUCENE-4396:

About the score differences between BS and BS2 (the same as BNS/BS2): there is a score difference between BS and BS2 when executing a query like +a b c d. I had been told that the reason is indicated by the TODO in ReqOptSumScorer.score(), which says:

{code}
// TODO: sum into a double and cast to float if we ever send required clauses to BS1
{code}

However, I don't think so, as the score bias is due to different score calculation orders. Suppose a doc hits the query +a b c d. The score calculated by BS is

{code}
BS.score(doc) = ((a.score() + b.score()) + c.score()) + d.score()
{code}

while the score calculated by BS2 is

{code}
BS2.score(doc) = a.score() + (float)(b.score() + c.score() + d.score())
{code}

Notice that, in BS2, we can only get the float value of (b.score() + c.score() + d.score()) via reqScorer.score(). Furthermore, I have noticed that we can actually control BS's score calculation order, so that

{code}
BS.score(doc) = a.score() + ((b.score() + c.score()) + d.score())
{code}

However, for BS2 we do not know the calculation order of (b.score() + c.score() + d.score()), as the order is determined by each scorer's position in a heap. I still think this matters little. I will rearrange the calculation order of BS.score() in the next patch to see whether it works.

BooleanScorer should sometimes be used for MUST clauses

                Key: LUCENE-4396
                URL: https://issues.apache.org/jira/browse/LUCENE-4396
            Project: Lucene - Core
         Issue Type: Improvement
           Reporter: Michael McCandless
        Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch

Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT. If there are one or more MUST clauses we always use BooleanScorer2. But I suspect that unless the MUST clauses have a very low hit count compared to the other clauses, BooleanScorer would perform better than BooleanScorer2. BooleanScorer still has some vestiges from when it used to handle MUST, so it shouldn't be hard to bring back this capability ... I think the challenging part might be the heuristics on when to use which (likely we would have to use firstDocID as a proxy for total hit count). Likely we should also have BooleanScorer sometimes use .advance() on the subs in this case, e.g. if suddenly the MUST clause skips 100 docs then you want to .advance() all the SHOULD clauses. I won't have near-term time to work on this, so feel free to take it if you are inspired!
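The grouping effect described in the comment above is easy to demonstrate outside Lucene. The sketch below emulates IEEE-754 float32 (the semantics of Java's float) in Python via struct; the per-clause scores are contrived so that the BS-style and BS2-style groupings round differently:

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE-754 float32."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Contrived per-clause scores chosen so grouping visibly changes the sum.
a, b, c, d = 1e8, -1e8, 1.0, 0.0

# BS-style left-to-right grouping: ((a + b) + c) + d
bs = f32(f32(f32(a + b) + c) + d)

# BS2-style grouping: a + (float)(b + c + d) -- the optional clauses'
# partial sum is rounded to float32 before being added to a, and
# -1e8 + 1.0 rounds back to -1e8 at float32 precision.
bs2 = f32(a + f32(f32(b + c) + d))

print(bs, bs2)  # 1.0 0.0 -- same clauses, different scores
```

Realistic scores differ only in the last few bits rather than by a whole unit, but the mechanism is the same: float addition is not associative, so BS and BS2 can legitimately disagree without either being wrong.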
[jira] [Commented] (SOLR-6103) Add DateRangeField
[ https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018416#comment-14018416 ]

ASF subversion and git services commented on SOLR-6103:

Commit 1600556 from [~dsmiley] in branch 'dev/trunk' [ https://svn.apache.org/r1600556 ]
SOLR-6103: Add QParser arg to AbstractSpatialFieldType.parseSpatialArgs(). Make getQueryFromSpatialArgs protected, not private.

Add DateRangeField

                Key: SOLR-6103
                URL: https://issues.apache.org/jira/browse/SOLR-6103
            Project: Solr
         Issue Type: New Feature
         Components: spatial
           Reporter: David Smiley
           Assignee: David Smiley
        Attachments: SOLR-6103.patch

LUCENE-5648 introduced a date range index search capability in the spatial module. This issue is for a corresponding Solr FieldType to be named DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that parses a superset of Solr's strict date format. It also parses partial dates (e.g. 2014-10 has month specificity), the trailing 'Z' is optional, a leading +/- may be present (minus indicates BC era), and * means all-time. The proposed field type would use it to parse a string and also both ends of a range query, but furthermore it will also allow an arbitrary range query of the form {{calspec TO calspec}} such as:

{noformat}2000 TO 2014-05-21T10{noformat}

which parses as the year 2000 through 2014 May 21st 10am (GMT). I suggest this syntax because it is aligned with Lucene's range query syntax.
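The partial-date semantics described here (e.g. 2014-10 covering all of October) amount to mapping each spec to the instant interval its specificity implies. The sketch below is a hypothetical stand-in, not Solr's parseCalendar: it handles only a few specificities and ignores the optional 'Z', the +/- era sign, and *.

```python
from datetime import datetime, timedelta

# Formats tried in order of increasing specificity; strptime rejects a
# spec with unconverted trailing data, so each spec matches exactly one.
FORMATS = [
    ("%Y", "year"),
    ("%Y-%m", "month"),
    ("%Y-%m-%d", "day"),
    ("%Y-%m-%dT%H", "hour"),
]

def partial_range(spec):
    """Map a partial date spec to the half-open [start, end) it covers."""
    for fmt, unit in FORMATS:
        try:
            start = datetime.strptime(spec, fmt)
        except ValueError:
            continue
        if unit == "year":
            end = start.replace(year=start.year + 1)
        elif unit == "month":
            # jump safely past month-end, then snap to the 1st
            end = (start.replace(day=28) + timedelta(days=4)).replace(day=1)
        elif unit == "day":
            end = start + timedelta(days=1)
        else:  # hour
            end = start + timedelta(hours=1)
        return start, end
    raise ValueError("unsupported spec: " + spec)

# Month specificity: "2014-10" covers all of October 2014.
print(partial_range("2014-10"))
```

Under this reading, a query {{2000 TO 2014-05-21T10}} would span from the start of `partial_range("2000")` to the end of `partial_range("2014-05-21T10")`.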
[jira] [Commented] (SOLR-6103) Add DateRangeField
[ https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018418#comment-14018418 ]

ASF subversion and git services commented on SOLR-6103:

Commit 1600557 from [~dsmiley] in branch 'dev/trunk' [ https://svn.apache.org/r1600557 ]
SOLR-6103: DateRangeField
[jira] [Resolved] (SOLR-6103) Add DateRangeField
[ https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley resolved SOLR-6103.
       Resolution: Fixed
    Fix Version/s: 5.0

Committed to 5x for now; intend to move to 4x soon-ish. Please try it out, folks! Faceting to come...
[jira] [Created] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues
Gregory Chanan created SOLR-6137:

            Summary: Managed Schema / Schemaless and SolrCloud concurrency issues
                Key: SOLR-6137
                URL: https://issues.apache.org/jira/browse/SOLR-6137
            Project: Solr
         Issue Type: Bug
         Components: Schema and Analysis, SolrCloud
           Reporter: Gregory Chanan

This is a follow up to a message on the mailing list, linked here:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E

The Managed Schema integration with SolrCloud seems pretty limited. The issue I'm running into is variants of the issue that schema changes are not pushed to all shards/replicas synchronously. So, for example, I can make the following two requests:
1) add a field to the collection on server1 using the Schema API
2) add a document with the new field; the document is routed to a core on server2

Then, there appears to be a race between when the document is processed by the core on server2 and when the core on server2, via the ZkIndexSchemaReader, gets the new schema. If the document is processed first, I get a 400 error because the field doesn't exist. This is easily reproducible by adding a sleep to the ZkIndexSchemaReader's processing.

I hit a similar issue with Schemaless: the distributed request handler sends out the document updates, but there is no guarantee that the other shards/replicas see the schema changes made by the update.chain.

Another issue I noticed today: making multiple schema API calls concurrently can block; that is, one may get through and the other may loop forever.

So, for reference, the issues include:
1) Schema API changes return success before all cores are updated; subsequent calls attempting to use the new schema may fail
2) Schemaless changes may fail on replicas/other shards for the same reason
3) Concurrent Schema API changes may block

From Steve Rowe on the mailing list:
{quote}
For Schema API users, delaying a couple of seconds after adding fields before using them should work around this problem. While not ideal, I think schema field additions are rare enough in the Solr collection lifecycle that this is not a huge problem.

For schemaless users, the picture is worse, as you noted. Immediate distribution of documents triggering schema field addition could easily prove problematic. Maybe we need a schema update blocking mode, where after the ZK schema node watch is triggered, all new request processing is halted until the schema is finished downloading/parsing/swapping out? (Such a mode should help Schema API users too.)
{quote}
[jira] [Commented] (LUCENE-5648) Index/search multi-valued time durations
[ https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018422#comment-14018422 ] ASF subversion and git services commented on LUCENE-5648: - Commit 1600560 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1600560 ] LUCENE-5648: unbreak ant test Index/search multi-valued time durations Key: LUCENE-5648 URL: https://issues.apache.org/jira/browse/LUCENE-5648 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 Attachments: LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch If you need to index a date/time duration, then the way to do that is to have a pair of date fields; one for the start and one for the end -- pretty straight-forward. But if you need to index a variable number of durations per document, then the options aren't pretty, ranging from denormalization, to joins, to using Lucene spatial with 2D as described [here|http://wiki.apache.org/solr/SpatialForTimeDurations]. Ideally it would be easier to index durations, and work in a more optimal way. This issue implements the aforementioned feature using Lucene-spatial with a new single-dimensional SpatialPrefixTree implementation. Unlike the other two SPT implementations, it's not based on floating point numbers. It will have a Date based customization that indexes levels at meaningful quantities like seconds, minutes, hours, etc. The point of that alignment is to make it faster to query across meaningful ranges (i.e. [2000 TO 2014]) and to enable a follow-on issue to facet on the data in a really fast way. I'll expect to have a working patch up this week. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
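The duration-matching predicate this issue builds on can be sketched independently of the SpatialPrefixTree machinery. The following is a hypothetical standalone illustration (not Lucene's API): a stored duration [start, end] matches an "Intersects" query [qStart, qEnd] exactly when the two ranges overlap.

```java
// Illustrative only: the overlap test behind indexing durations as 1D ranges.
public class DurationIntersects {
    /** True when [start, end] overlaps [qStart, qEnd] (all bounds inclusive). */
    public static boolean intersects(long start, long end, long qStart, long qEnd) {
        return start <= qEnd && end >= qStart;
    }

    public static void main(String[] args) {
        // A document valid 2001-2005 overlaps the query range [2000 TO 2014].
        System.out.println(intersects(2001, 2005, 2000, 2014));
        // A document valid 1990-1999 does not.
        System.out.println(intersects(1990, 1999, 2000, 2014));
    }
}
```

Per the issue, the calendar-aligned tree levels (seconds, minutes, hours, ...) exist so that a query over a meaningful range like [2000 TO 2014] can accept whole subtrees at coarse levels rather than testing each value with a predicate like the one above.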
[jira] [Commented] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues
[ https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018425#comment-14018425 ] Gregory Chanan commented on SOLR-6137: -- The Schema API blocking mode is an interesting idea, I'd want to think more about that. In some sense, the schemaless issue seems easier to solve than the Schema API issue. This is because if we run all (or more) of the update chain, instead of just skipping to the distributed update handler on the forwarded nodes, we could have all the cores apply the schema changes, so we are guaranteed of having the correct schema on each core. We'd need to be smarter about trying to update the schema in ZK (as I noted above, concurrent schema changes may fail currently). But that doesn't seem impossible. The Schema API issue does seem more difficult. A blocking mode could work in theory, though I guess one complication is you need to wait for all the cores that use the config, not just all the cores of the collection. Although, perhaps we should just throw in some checks that only one collection is using a certain managed schema config at a time; it may make the logic easier and it seems very unlikely the user actually wants to use the same schema for multiple collections (I did that myself the first time before realizing why it didn't make any sense). As Steve noted above, a blocking mode could be used by the schemaless functionality as well, instead of what I wrote above. Managed Schema / Schemaless and SolrCloud concurrency issues Key: SOLR-6137 URL: https://issues.apache.org/jira/browse/SOLR-6137 Project: Solr Issue Type: Bug Components: Schema and Analysis, SolrCloud Reporter: Gregory Chanan This is a follow up to a message on the mailing list, linked here: http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E The Managed Schema integration with SolrCloud seems pretty limited. 
The issue I'm running into is variants of the issue that schema changes are not pushed to all shards/replicas synchronously. So, for example, I can make the following two requests: 1) add a field to the collection on server1 using the Schema API 2) add a document with the new field, the document is routed to a core on server2 Then, there appears to be a race between when the document is processed by the core on server2 and when the core on server2, via the ZkIndexSchemaReader, gets the new schema. If the document is processed first, I get a 400 error because the field doesn't exist. This is easily reproducible by adding a sleep to the ZkIndexSchemaReader's processing. I hit a similar issue with Schemaless: the distributed request handler sends out the document updates, but there is no guarantee that the other shards/replicas see the schema changes made by the update.chain. Another issue I noticed today: making multiple schema API calls concurrently can block; that is, one may get through and the other may infinite loop. So, for reference, the issues include: 1) Schema API changes return success before all cores are updated; subsequent calls attempting to use new schema may fail 2) Schemaless changes may fail on replicas/other shards for the same reason 3) Concurrent Schema API changes may block From Steve Rowe on the mailing list: {quote} For Schema API users, delaying a couple of seconds after adding fields before using them should workaround this problem. While not ideal, I think schema field additions are rare enough in the Solr collection lifecycle that this is not a huge problem. For schemaless users, the picture is worse, as you noted. Immediate distribution of documents triggering schema field addition could easily prove problematic. Maybe we need a schema update blocking mode, where after the ZK schema node watch is triggered, all new request processing is halted until the schema is finished downloading/parsing/swapping out? 
(Such a mode should help Schema API users too.) {quote}
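The blocking mode quoted above can be sketched with a read/write lock: request threads bind a schema snapshot under the read lock, while the ZK-watch thread holds the write lock across the whole download/parse/swap, so no new request binds a stale schema once the watch has fired. This is a hypothetical sketch; the class and method names are illustrative, not Solr's actual code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Illustrative sketch of a "schema update blocking mode" (not Solr's API).
public class BlockingSchemaHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Object schema;

    public BlockingSchemaHolder(Object initial) { this.schema = initial; }

    /** Request threads bind a snapshot here; blocks while a refresh is in progress. */
    public Object bindSnapshot() {
        lock.readLock().lock();
        try { return schema; } finally { lock.readLock().unlock(); }
    }

    /** ZK-watch thread: hold the write lock for the full download/parse/swap
     *  so requests cannot bind the old schema after the watch fires. */
    public void refresh(Supplier<Object> downloadAndParse) {
        lock.writeLock().lock();
        try { this.schema = downloadAndParse.get(); } finally { lock.writeLock().unlock(); }
    }
}
```

The tradeoff, as the thread notes, is that every in-flight request stalls for the duration of the download and parse.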
Re: Managed Schema and SolrCloud
Thanks for the reply, Steve. I filed SOLR-6137. Greg On Wed, Jun 4, 2014 at 4:08 PM, Steve Rowe sar...@gmail.com wrote: Hi Greg, Your understanding is correct, and I agree that this limits managed schema functionality. Under SolrCloud, all Solr nodes participating in a collection bound to a configset with a managed schema keep a watch on the corresponding schema ZK node. In my testing (on my laptop), when the managed schema is written to ZK, the other nodes are notified very quickly (single-digit milliseconds) and immediately download and start parsing the schema. Incoming requests are bound to a snapshot of the live schema at the time they arrive, so there is a window of time between initial posting to ZK and swapping out the schema after parsing. Different loads on, and/or different network latency between ZK and each participating node can result in varying latencies before all nodes are in sync. For Schema API users, delaying a couple of seconds after adding fields before using them should work around this problem. While not ideal, I think schema field additions are rare enough in the Solr collection lifecycle that this is not a huge problem. For schemaless users, the picture is worse, as you noted. Immediate distribution of documents triggering schema field addition could easily prove problematic. Maybe we need a schema update blocking mode, where after the ZK schema node watch is triggered, all new request processing is halted until the schema is finished downloading/parsing/swapping out? Can you make an issue, Greg? (Such a mode should help Schema API users too.) Thanks, Steve On Jun 3, 2014, at 8:06 PM, Gregory Chanan gcha...@cloudera.com wrote: I'm trying to determine if the Managed Schema functionality works with SolrCloud, and AFAICT the integration seems pretty limited. The issue I'm running into is variants of the issue that schema changes are not pushed to all shards/replicas synchronously.
So, for example, I can make the following two requests: 1) add a field to the collection on server1 using the Schema API 2) add a document with the new field, the document is routed to a core on server2 Then, there appears to be a race between when the document is processed by the core on server2 and when the core on server2, via the ZkIndexSchemaReader, gets the new schema. If the document is processed first, I get a 400 error because the field doesn't exist. This is easily reproducible by adding a sleep to the ZkIndexSchemaReader's processing. I hit a similar issue with Schemaless: the distributed request handler sends out the document updates, but there is no guarantee that the other shards/replicas see the schema changes made by the update.chain. Is my understanding correct? Is this expected? Thanks, Greg - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_20-ea-b15) - Build # 10472 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10472/ Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC 1 tests failed. FAILED: org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.initializationError Error Message: Suite class org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should be a concrete class (not abstract). Stack Trace: java.lang.RuntimeException: Suite class org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should be a concrete class (not abstract). at com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90) at com.carrotsearch.randomizedtesting.RandomizedRunner.validateTarget(RandomizedRunner.java:1681) at com.carrotsearch.randomizedtesting.RandomizedRunner.init(RandomizedRunner.java:379) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.junit.internal.builders.AnnotatedBuilder.buildRunner(AnnotatedBuilder.java:31) at org.junit.internal.builders.AnnotatedBuilder.runnerForClass(AnnotatedBuilder.java:24) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24) at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:176) at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:276) at com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:12) Build Log: [...truncated 9585 lines...] 
[junit4] Suite: org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest [junit4] ERROR 0.04s J1 | BaseNonFuzzySpatialOpStrategyTest.initializationError [junit4] Throwable #1: java.lang.RuntimeException: Suite class org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should be a concrete class (not abstract). [junit4]at com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90) [junit4]at java.lang.reflect.Constructor.newInstance(Constructor.java:408) [junit4] Completed on J1 in 0.04s, 1 test, 1 error FAILURES! [...truncated 16 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:467: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:447: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:45: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:37: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:543: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:2017: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/module-build.xml:60: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1296: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:920: There were test failures: 17 suites, 126 tests, 1 error, 12 ignored (2 assumptions) Total time: 31 minutes 49 seconds Build step 'Invoke Ant' marked build as failure Description set: Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC Archiving artifacts Recording 
test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any
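For reference, the failure above is the randomizedtesting runner rejecting an abstract class that its suite-scanning rules picked up. The usual pattern is to keep shared test logic in an abstract base class whose name falls outside the runner's test-name pattern, with concrete subclasses providing the specifics. A hypothetical illustration (names are invented, not the actual Lucene classes):

```java
// Illustrative only: shared test logic lives in an abstract base class
// (named so a suite scanner will not treat it as a runnable suite).
public abstract class BaseSpatialOpStrategyTestCase {
    /** Subclasses say which strategy they exercise. */
    public abstract String strategyName();

    public boolean hasStrategy() {
        return strategyName() != null && !strategyName().isEmpty();
    }
}

// Concrete subclass: this is what a runner should actually instantiate.
class ConcreteStrategyTest extends BaseSpatialOpStrategyTestCase {
    @Override public String strategyName() { return "recursive_prefix_tree"; }
}
```

The runner's check mirrors the error message: it refuses to instantiate a suite class for which `Modifier.isAbstract(...)` is true.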
[jira] [Created] (SOLR-6138) Solr core load limit
HuangTongwen created SOLR-6138: -- Summary: Solr core load limit Key: SOLR-6138 URL: https://issues.apache.org/jira/browse/SOLR-6138 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.7.1, 4.6 Environment: ubuntu 12.04, memory 20G Reporter: HuangTongwen We want to enrich our search capability with Solr. As an exercise, we tested how many cores one machine can support. We found we can create more than 2000 cores without data on one machine, but when we create cores with data, we can only create about 1000 cores; beyond 1000 cores we hit many errors, which I append below. If you have met the same or a similar problem, please tell me. I would be grateful if you could help me. Here are some errors: 09:43:29 WARN SolrResourceLoader Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
09:46:15 ERROR ShardLeaderElectionContext There was a problem trying to register as the leader: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/ctest.test.3521/leaders/shard1 09:46:15 WARN ElectionContext cancelElection did not find election node to remove 09:46:16 WARN RecoveryStrategy Stopping recovery for zkNodeName=core_node1 core=ctest.test.3521 09:46:17 ERROR RecoveryStrategy Error while trying to recover. core=ctest.test.3521: org.apache.solr.common.SolrException: No registered leader was found, collection:ctest.test.3521 slice:shard1 09:46:17 ERROR RecoveryStrategy Recovery failed - trying again... (0) core=ctest.test.3521 09:46:17 ERROR RecoveryStrategy Recovery failed - interrupted. core=ctest.test.3521 09:46:17 ERROR RecoveryStrategy Recovery failed - I give up.
core=ctest.test.3521 09:46:18 WARN RecoveryStrategy Stopping recovery for zkNodeName=core_node1 core=ctest.test.3521 10:01:58 ERROR SolrCore org.apache.solr.common.SolrException: Error handling 'status' action 10:01:58 ERROR SolrDispatchFilter null:org.apache.solr.common.SolrException: Error handling 'status' action 10:15:59 ERROR ZkController Error getting leader from zk 10:15:59 ERROR ZkController Error registering SolrCore: org.apache.solr.common.SolrException: Error getting leader from zk for shard shard1 10:16:18 ERROR SolrCore org.apache.solr.common.SolrException: Error handling 'status' action 10:16:18
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_20-ea-b15) - Build # 10472 - Failure!
Thanks for fixing, Rob. ~ David On Wed, Jun 4, 2014 at 10:49 PM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10472/ Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC 1 tests failed. FAILED: org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest.initializationError Error Message: Suite class org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should be a concrete class (not abstract). Stack Trace: java.lang.RuntimeException: Suite class org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should be a concrete class (not abstract). at com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90) at com.carrotsearch.randomizedtesting.RandomizedRunner.validateTarget(RandomizedRunner.java:1681) at com.carrotsearch.randomizedtesting.RandomizedRunner.init(RandomizedRunner.java:379) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.junit.internal.builders.AnnotatedBuilder.buildRunner(AnnotatedBuilder.java:31) at org.junit.internal.builders.AnnotatedBuilder.runnerForClass(AnnotatedBuilder.java:24) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24) at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:176) at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:276) at 
com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:12) Build Log: [...truncated 9585 lines...] [junit4] Suite: org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest [junit4] ERROR 0.04s J1 | BaseNonFuzzySpatialOpStrategyTest.initializationError [junit4] Throwable #1: java.lang.RuntimeException: Suite class org.apache.lucene.spatial.prefix.BaseNonFuzzySpatialOpStrategyTest should be a concrete class (not abstract). [junit4]at com.carrotsearch.randomizedtesting.Validation$ClassValidation.isConcreteClass(Validation.java:90) [junit4]at java.lang.reflect.Constructor.newInstance(Constructor.java:408) [junit4] Completed on J1 in 0.04s, 1 test, 1 error FAILURES! [...truncated 16 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:467: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:447: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:45: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:37: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:543: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:2017: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/module-build.xml:60: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1296: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:920: There were test failures: 17 suites, 126 tests, 1 error, 12 ignored (2 assumptions) Total time: 31 minutes 49 seconds Build step 'Invoke Ant' 
marked build as failure Description set: Java: 32bit/jdk1.8.0_20-ea-b15 -server -XX:+UseG1GC Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any
[jira] [Created] (LUCENE-5735) Faceting for DateRangePrefixTree
David Smiley created LUCENE-5735: Summary: Faceting for DateRangePrefixTree Key: LUCENE-5735 URL: https://issues.apache.org/jira/browse/LUCENE-5735 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley The newly added DateRangePrefixTree (DRPT) encodes terms in a fashion amenable to faceting by meaningful time buckets. The motivation for this feature is to efficiently populate a calendar bar chart or [heat-map|http://bl.ocks.org/mbostock/4063318]. It's not hard if you have date instances like many do but it's challenging for date ranges. Internally this is going to iterate over the terms using seek/next with TermsEnum as appropriate. It should be quite efficient; it won't need any special caches. I should be able to re-use SPT traversal code in AbstractVisitingPrefixTreeFilter. If this goes especially well; the underlying implementation will be re-usable for geospatial heat-map faceting. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
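The bucketing idea behind this feature can be sketched without TermsEnum: if terms carry calendar-aligned prefixes (e.g. "2014", "2014-06", "2014-06-05"), then faceting at a given level is truncate-and-sum over the sorted terms. The class below is a hypothetical illustration of that concept only, not the proposed seek/next implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: facet term counts into calendar buckets by prefix.
public class PrefixBucketFacet {
    /** Sum per-term counts into buckets defined by the first prefixLen chars. */
    public static Map<String, Integer> facet(Map<String, Integer> termCounts, int prefixLen) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            String term = e.getKey();
            String bucket = term.length() <= prefixLen ? term : term.substring(0, prefixLen);
            buckets.merge(bucket, e.getValue(), Integer::sum);
        }
        return buckets;
    }
}
```

In the real index the terms arrive sorted, so a TermsEnum can seek past whole subtrees instead of visiting every term, which is where the claimed efficiency comes from.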
[jira] [Commented] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues
[ https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018455#comment-14018455 ] Yonik Seeley commented on SOLR-6137: {quote}In some sense, the schemaless issue seems easier to solve than the Schema API issue. This is because if we run all (or more) of the update chain, instead of just skipping to the distributed update handler on the forwarded nodes, we could have all the cores apply the schema changes, so we are guaranteed of having the correct schema on each core. {quote} Right. Schemaless *should* be a non-issue. The type-guessing logic should be run on replicas as well. If the replica hasn't seen the change yet, then it will guess the same type, and try to add it to the schema. It should fail due to optimistic locking and the fact the leader already added it, re-read the schema, and then successfully find the field. It's the same case as a single node with multiple threads both encountering the new field at around the same time. Although the schema API needs a blocking mode, no blocking mode should be added to schemaless... that's what the optimistic concurrency is for. Managed Schema / Schemaless and SolrCloud concurrency issues Key: SOLR-6137 URL: https://issues.apache.org/jira/browse/SOLR-6137 Project: Solr Issue Type: Bug Components: Schema and Analysis, SolrCloud Reporter: Gregory Chanan This is a follow up to a message on the mailing list, linked here: http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E The Managed Schema integration with SolrCloud seems pretty limited. The issue I'm running into is variants of the issue that schema changes are not pushed to all shards/replicas synchronously. 
So, for example, I can make the following two requests: 1) add a field to the collection on server1 using the Schema API 2) add a document with the new field, the document is routed to a core on server2 Then, there appears to be a race between when the document is processed by the core on server2 and when the core on server2, via the ZkIndexSchemaReader, gets the new schema. If the document is processed first, I get a 400 error because the field doesn't exist. This is easily reproducible by adding a sleep to the ZkIndexSchemaReader's processing. I hit a similar issue with Schemaless: the distributed request handler sends out the document updates, but there is no guarantee that the other shards/replicas see the schema changes made by the update.chain. Another issue I noticed today: making multiple schema API calls concurrently can block; that is, one may get through and the other may infinite loop. So, for reference, the issues include: 1) Schema API changes return success before all cores are updated; subsequent calls attempting to use new schema may fail 2) Schemaless changes may fail on replicas/other shards for the same reason 3) Concurrent Schema API changes may block From Steve Rowe on the mailing list: {quote} For Schema API users, delaying a couple of seconds after adding fields before using them should workaround this problem. While not ideal, I think schema field additions are rare enough in the Solr collection lifecycle that this is not a huge problem. For schemaless users, the picture is worse, as you noted. Immediate distribution of documents triggering schema field addition could easily prove problematic. Maybe we need a schema update blocking mode, where after the ZK schema node watch is triggered, all new request processing is halted until the schema is finished downloading/parsing/swapping out? (Such a mode should help Schema API users too.) 
{quote}
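The optimistic-concurrency flow Yonik describes can be sketched as a compare-and-set loop: every replica guesses the type and tries to add the field; a replica that loses the race re-reads the fresher schema and finds the field the winner already added. This is a hypothetical sketch, not Solr's managed-schema code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: optimistic field addition via compare-and-set.
public class OptimisticSchema {
    private final AtomicReference<Set<String>> fields = new AtomicReference<>(Set.of());

    /** Returns true if this caller added the field, false if it already existed. */
    public boolean ensureField(String field) {
        while (true) {
            Set<String> cur = fields.get();
            if (cur.contains(field)) return false;       // the winner already added it
            Set<String> next = new HashSet<>(cur);
            next.add(field);
            if (fields.compareAndSet(cur, Set.copyOf(next))) return true; // we won the race
            // lost the race: loop, re-reading the newer schema (like re-reading from ZK)
        }
    }

    public boolean hasField(String f) { return fields.get().contains(f); }
}
```

As Yonik notes, this is the same shape as two threads on a single node encountering a new field at about the same time, which is why no extra blocking is needed for schemaless.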
[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 23310 - Failure!
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/23310/ 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestControlledRealTimeReopenThread Error Message: 1 thread leaked from SUITE scope at org.apache.lucene.search.TestControlledRealTimeReopenThread: 1) Thread[id=143, name=Thread-71, state=TIMED_WAITING, group=TGRP-TestControlledRealTimeReopenThread] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) at org.apache.lucene.search.ControlledRealTimeReopenThread.run(ControlledRealTimeReopenThread.java:223) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.lucene.search.TestControlledRealTimeReopenThread: 1) Thread[id=143, name=Thread-71, state=TIMED_WAITING, group=TGRP-TestControlledRealTimeReopenThread] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) at org.apache.lucene.search.ControlledRealTimeReopenThread.run(ControlledRealTimeReopenThread.java:223) at __randomizedtesting.SeedInfo.seed([178AC51FA789A281]:0) REGRESSION: org.apache.lucene.search.TestControlledRealTimeReopenThread.testCRTReopen Error Message: waited too long for generation 20665 Stack Trace: java.lang.AssertionError: waited too long for generation 20665 at __randomizedtesting.SeedInfo.seed([178AC51FA789A281:45AA582B734E1AF8]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.lucene.search.TestControlledRealTimeReopenThread.testCRTReopen(TestControlledRealTimeReopenThread.java:519) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at
[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers
[ https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018461#comment-14018461 ] Robert Muir commented on LUCENE-5703: - Upon final review: I am unhappy about a few things with the latest patch, mostly doing with safety: * DocValues.EMPTY_XXX is now unsafe, it uses a static mutable thing (BytesRef). We should make these methods instead of constants. This won't ever be performance critical so its ok to me. * Memory and so on should do an array copy instead of returning singleton stuff. If there is a bug in someone's code, it could corrupt the data and get merged into index corruption. I'm ok with someone's bug in their code corrupting their threadlocal code-private byte[], but not the index. We have to draw the line there. Don't allocate/copy bytes all the time in binary DV producers - Key: LUCENE-5703 URL: https://issues.apache.org/jira/browse/LUCENE-5703 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.9, 5.0 Attachments: LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch, LUCENE-5703.patch Our binary doc values producers keep on creating new {{byte[]}} arrays and copying bytes when a value is requested, which likely doesn't help performance. This has been done because of the way fieldcache consumers used the API, but we should try to fix it in 5.0. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers
[ https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018463#comment-14018463 ]

Robert Muir commented on LUCENE-5703:
-------------------------------------

I'll start tackling the EMPTY issue. It shouldn't be controversial at all, but the safety here is mandatory because this constant is used by SegmentMerger.

As for the all-in-RAM producers exposing the ability to corrupt the same data that gets merged, we can think of a number of compromises/solutions, but something must be done:

* System.arraycopy
* big fat warnings on these that they are unsafe (they are not part of the official index format, so maybe that's OK)
* they could keep hold of their file descriptors and override merge() to stream the data from the file
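The System.arraycopy option above amounts to a defensive copy in the accessor. A minimal sketch (class and method names here are hypothetical, not Lucene's actual API):

```java
// Sketch of the "System.arraycopy" compromise for an all-in-RAM producer:
// return a defensive copy instead of the live backing array, so a consumer
// bug cannot corrupt the bytes that later get merged into the index.
public class DefensiveCopyDemo {
    private final byte[] backing = {10, 20, 30};

    // Unsafe: hands out the live backing array.
    byte[] getRaw() { return backing; }

    // Safe: copies the requested slice into a fresh array.
    byte[] getCopy(int offset, int length) {
        byte[] out = new byte[length];
        System.arraycopy(backing, offset, out, 0, length);
        return out;
    }

    public static void main(String[] args) {
        DefensiveCopyDemo dv = new DefensiveCopyDemo();
        byte[] leaked = dv.getRaw();
        leaked[0] = 99;                          // consumer bug corrupts producer data
        System.out.println(dv.getRaw()[0]);      // 99: corruption visible to everyone

        byte[] copy = dv.getCopy(0, 3);
        copy[1] = 99;                            // same bug, but only the copy changes
        System.out.println(dv.getCopy(0, 3)[1]); // still 20
    }
}
```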
[jira] [Closed] (SOLR-6138) Solr core load limit
[ https://issues.apache.org/jira/browse/SOLR-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Heisey closed SOLR-6138.
------------------------------

    Resolution: Invalid


Solr core load limit
--------------------

                 Key: SOLR-6138
                 URL: https://issues.apache.org/jira/browse/SOLR-6138
             Project: Solr
          Issue Type: Bug
          Components: clients - java
    Affects Versions: 4.6, 4.7.1
         Environment: ubuntu 12.04, memory 20G
            Reporter: HuangTongwen
              Labels: test
   Original Estimate: 840h
  Remaining Estimate: 840h

We want to enrich our search capability with Solr, so we ran a test to see how many cores one Solr machine can support. We found we can create more than 2000 cores without data on one machine, but when we create cores with data, we can only create about 1000 cores; beyond 1000 cores we hit many errors like the ones appended below. If you have met the same or a similar problem, please tell me. I would be grateful if you could help.

Here are some errors:

09:43:29 WARN  SolrResourceLoader  Can't find (or read) directory to add to classloader: /non/existent/dir/yields/warning (resolved as: /non/existent/dir/yields/warning).
    (the line above repeats many times)
09:46:15 ERROR ShardLeaderElectionContext  There was a problem trying to register as the leader: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/ctest.test.3521/leaders/shard1
09:46:15 WARN  ElectionContext  cancelElection did not find election node to remove
09:46:16 WARN  RecoveryStrategy  Stopping recovery for zkNodeName=core_node1 core=ctest.test.3521
09:46:17 ERROR RecoveryStrategy  Error while trying to recover. core=ctest.test.3521: org.apache.solr.common.SolrException: No registered leader was found, collection: ctest.test.3521 slice: shard1
09:46:17 ERROR RecoveryStrategy  Recovery failed - trying again... (0) core=ctest.test.3521
09:46:17 ERROR RecoveryStrategy  Recovery failed - interrupted. core=ctest.test.3521
09:46:17 ERROR RecoveryStrategy  Recovery failed - I give up. core=ctest.test.3521
09:46:18 WARN  RecoveryStrategy  Stopping recovery for zkNodeName=core_node1 core=ctest.test.3521
10:01:58 ERROR SolrCore  org.apache.solr.common.SolrException: Error handling 'status' action
10:01:58 ERROR SolrDispatchFilter  null:org.apache.solr.common.SolrException: Error handling 'status' action
10:15:59 ERROR ZkController  Error getting leader from zk
10:15:59 ERROR
[jira] [Commented] (SOLR-6138) Solr core load limit
[ https://issues.apache.org/jira/browse/SOLR-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018470#comment-14018470 ]

Shawn Heisey commented on SOLR-6138:
------------------------------------

Solr itself does not have any hard limit on the number of cores you can create, but you are relying on software other than Solr itself. In this case, I believe you are running into a limitation in ZooKeeper, a dependency of SolrCloud. ZooKeeper has a default maximum database size of 1MB. Each new collection puts data into ZooKeeper, and eventually you're going to run into this database size limit. Search the following page for jute.maxbuffer to find out how to increase the maximum database size in ZooKeeper: http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html

I'm going to close this issue, because it is most likely not a problem in Solr. If it does turn out that a bug in Solr is causing this, we can re-open the issue. Please direct any followup to the Solr mailing list: http://lucene.apache.org/solr/discussion.html
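For reference, jute.maxbuffer is a JVM system property that must be set to the same value on the ZooKeeper servers and on every client JVM. A sketch of how that might look (the 4 MB value, hostnames, and paths below are illustrative, not recommendations):

```shell
# Raise ZooKeeper's znode size limit (bytes) on the server side.
# JVMFLAGS is picked up by zkServer.sh via zkEnv.sh.
export JVMFLAGS="-Djute.maxbuffer=4194304"
bin/zkServer.sh restart

# Pass the same property to Solr (a ZooKeeper client) on startup.
java -Djute.maxbuffer=4194304 -DzkHost=zk1:2181 -jar start.jar
```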
[jira] [Updated] (LUCENE-5703) Don't allocate/copy bytes all the time in binary DV producers
[ https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-5703:
--------------------------------

    Attachment: LUCENE-5703.patch

Updated patch fixing most of the EMPTY stuff. TermOrdValComparator.MISSING_BYTESREF and other unsafe things like that still need to be fixed. And I did nothing with the in-RAM DV providers.