Adding index epoch tracking to Lucene
Hi guys,

Lucene today tracks the index generation, which is incremented whenever changes are committed to the index. In LUCENE-4532 I needed to add epoch tracking, which is incremented whenever the index is re-created. An index is considered to be re-created (for the use case in LUCENE-4532) either when you open IndexWriter with OpenMode.CREATE or when you call IndexWriter.deleteAll(). In LUCENE-4532 I did that through index commit data.

I was wondering if others find this information useful, and whether we should add it to Lucene alongside the generation and version tracking. It's just another int/long, and is not supposed to complicate the code or add any runtime complexity.

Thoughts?

Shai
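To make the proposal concrete, here is a minimal sketch of epoch tracking through commit user data, the approach described above for LUCENE-4532. The key name "index.epoch" and the EpochTracker helper are illustrative inventions, not from the patch; setCommitData/getCommitData and the DirectoryReader commit APIs are assumed to match the Lucene 4.x line.

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

// Illustrative helper (not from the LUCENE-4532 patch): keep an epoch counter
// in the commit user data and bump it whenever the index is re-created.
public final class EpochTracker {

  private static final String INDEX_EPOCH = "index.epoch"; // hypothetical key

  /** Reads the epoch recorded in the latest commit; 0 if the index is new. */
  public static long readEpoch(Directory dir) throws Exception {
    if (!DirectoryReader.indexExists(dir)) {
      return 0;
    }
    List<IndexCommit> commits = DirectoryReader.listCommits(dir);
    Map<String, String> userData = commits.get(commits.size() - 1).getUserData();
    String epoch = userData.get(INDEX_EPOCH);
    return epoch == null ? 0 : Long.parseLong(epoch);
  }

  /** Call after opening with OpenMode.CREATE or after deleteAll(). */
  public static void bumpEpoch(IndexWriter writer, long previousEpoch) {
    Map<String, String> userData = new HashMap<String, String>(writer.getCommitData());
    userData.put(INDEX_EPOCH, Long.toString(previousEpoch + 1));
    writer.setCommitData(userData); // persisted by the next commit()
  }
}
{code}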
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.6.0_37) - Build # 1459 - Still Failing!
I committed a fix.

On 6 Nov 2012, at 07:28, Policeman Jenkins Server wrote:

Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/1459/
Java: 32bit/jdk1.6.0_37 -client -XX:+UseConcMarkSweepGC

2 tests failed.

FAILED: junit.framework.TestSuite.org.apache.solr.schema.TestBinaryField

Error Message: 2 threads leaked from SUITE scope at org.apache.solr.schema.TestBinaryField:
1) Thread[id=24, name=metrics-meter-tick-thread-2, state=TIMED_WAITING, group=TGRP-TestBinaryField]
2) Thread[id=23, name=metrics-meter-tick-thread-1, state=TIMED_WAITING, group=TGRP-TestBinaryField]

Both threads show the same stack:
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:662)

Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE scope at org.apache.solr.schema.TestBinaryField (same two threads and stacks as above)
at __randomizedtesting.SeedInfo.seed([28100922E87E2C5D]:0)

FAILED: junit.framework.TestSuite.org.apache.solr.schema.TestBinaryField

Error Message: There are still zombie threads that couldn't be terminated:
1) Thread[id=24, name=metrics-meter-tick-thread-2, state=TIMED_WAITING, group=TGRP-TestBinaryField]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:164) at
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491341#comment-13491341 ]

Simon Willnauer commented on LUCENE-4540:

+1 - should we also document that we don't have similarities that can make use of it at this point?

Allow packed ints norms
Key: LUCENE-4540
URL: https://issues.apache.org/jira/browse/LUCENE-4540
Project: Lucene - Core
Issue Type: Task
Components: core/index
Reporter: Robert Muir
Attachments: LUCENE-4540.patch

I was curious what the performance would be, because it might be a useful option to use packed ints for norms if you have lots of fields and still want good scoring: Today the smallest norm per-field-per-doc you can use is a single byte, and if you have _f_ fields with norms enabled and _n_ docs, it uses _f_ * _n_ bytes of space in RAM. Especially if you aren't using index-time boosting (or even if you are, but not with ridiculous values), this could be wasting a ton of RAM. But then I noticed there was no clean way to allow you to do this in your Similarity: it's a trivial patch.
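To put rough numbers on the space argument above (illustrative figures, not from the issue): with 50 norms-enabled fields and 100 million documents, single-byte norms occupy 50 × 100 million = 5 billion bytes, roughly 5 GB of heap. If a field's norms only ever take a handful of distinct values, a packed-ints representation could store each in 2-3 bits instead of 8, roughly a 3-4x saving.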
[jira] [Commented] (LUCENE-4537) Move RateLimiter up to Directory and make it IOContext aware
[ https://issues.apache.org/jira/browse/LUCENE-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491345#comment-13491345 ]

Simon Willnauer commented on LUCENE-4537:

bq. I think this is some premature optimization.

I am not sure if that is premature. But I do agree it would be great if we could just wrap the IndexOutput to do this kind of stuff entirely outside of Directory. Maybe we can have a flush callback on BufferedIndexOutput we can hook into the flush call. This would also enable us to do some flush statistics, which is independent of this issue. This could be an impl detail of BufferedIndexOutput, but it would enable us to:
1. do the optimization we do today
2. divorce the rate limiting entirely from Directory

Move RateLimiter up to Directory and make it IOContext aware
Key: LUCENE-4537
URL: https://issues.apache.org/jira/browse/LUCENE-4537
Project: Lucene - Core
Issue Type: Improvement
Components: core/store
Affects Versions: 4.0
Reporter: Simon Willnauer
Fix For: 4.1, 5.0
Attachments: LUCENE-4537.patch, LUCENE-4537.patch, LUCENE-4537.patch

Currently the RateLimiter only applies to FSDirectory, which is fine in general but always requires casts, and other dir impls (custom ones) could benefit from this too. We are also only able to rate limit merge operations, which limits the functionality here a lot. Since we have the context information about what the IndexOutput is used for, we can use that for rate limiting.
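A rough sketch of the flush-callback idea, assuming it would hang off BufferedIndexOutput's flushBuffer template method. The listener interface and class name here are hypothetical; only flushBuffer(byte[], int, int) is the real Lucene 4.x extension point.

{code}
import java.io.IOException;

import org.apache.lucene.store.BufferedIndexOutput;

// Hypothetical: intercept every buffer flush for rate limiting or statistics,
// keeping Directory itself out of the picture.
public abstract class CallbackBufferedIndexOutput extends BufferedIndexOutput {

  /** Hypothetical hook, invoked with the byte count of each flush. */
  public interface FlushListener {
    void onFlush(int bytes);
  }

  private final FlushListener listener;

  protected CallbackBufferedIndexOutput(FlushListener listener) {
    this.listener = listener;
  }

  @Override
  protected final void flushBuffer(byte[] b, int offset, int len) throws IOException {
    listener.onFlush(len); // rate-limit or record stats here
    doFlushBuffer(b, offset, len);
  }

  /** Subclasses perform the actual I/O here instead of in flushBuffer. */
  protected abstract void doFlushBuffer(byte[] b, int offset, int len) throws IOException;
}
{code}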
[jira] [Created] (SOLR-4037) Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException
Markus Jelsma created SOLR-4037:

Summary: Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException
Key: SOLR-4037
URL: https://issues.apache.org/jira/browse/SOLR-4037
Project: Solr
Issue Type: Bug
Affects Versions: 4.0
Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 12:37:38; Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Fix For: 4.1, 5.0

See: http://lucene.472066.n3.nabble.com/Continuous-Ping-query-caused-exception-java-util-concurrent-RejectedExecutionException-td4017470.html

Using this week's trunk we sometimes see nodes entering some funky state where they continuously report exceptions. Replication and query handling are still possible, but there is an increase in CPU time:

{code}
2012-11-01 09:24:28,337 INFO [solr.core.SolrCore] - [http-8080-exec-4] - : [openindex_f] webapp=/solr path=/admin/ping params={} status=500 QTime=21
2012-11-01 09:24:28,337 ERROR [solr.core.SolrCore] - [http-8080-exec-4] - : org.apache.solr.common.SolrException: Ping query caused exception: java.util.concurrent.RejectedExecutionException
  at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:259)
  at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:207)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
  at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
  at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: java.util.concurrent.RejectedExecutionException
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1674)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1330)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1265)
  at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:88)
  at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:214)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
  at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:250)
  ... 19 more
Caused by: java.util.concurrent.RejectedExecutionException
  at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
  at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
  at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
  at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1605)
  ... 27 more
{code}

This won't stop until I restart the servlet container, but it began in the first place after restarting the servlet container.
[jira] [Created] (SOLR-4038) SolrCloud indexing blocks if node is recovering
Markus Jelsma created SOLR-4038:

Summary: SolrCloud indexing blocks if node is recovering
Key: SOLR-4038
URL: https://issues.apache.org/jira/browse/SOLR-4038
Project: Solr
Issue Type: Bug
Affects Versions: 4.0
Reporter: Markus Jelsma
Fix For: 4.1, 5.0

See: http://lucene.472066.n3.nabble.com/SolrCloud-indexing-blocks-if-node-is-recovering-td4017827.html
[jira] [Updated] (SOLR-4038) SolrCloud indexing blocks if node is recovering
[ https://issues.apache.org/jira/browse/SOLR-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated SOLR-4038:

Description: See: http://lucene.472066.n3.nabble.com/SolrCloud-indexing-blocks-if-node-is-recovering-td4017827.html

While indexing (without CloudSolrServer at that time) one node dies with an OOME, perhaps because of the linked issue SOLR-4032. The OOME stack traces are varied, but here are some ZK-related logs between the OOME stack traces:

{code}
2012-11-02 14:14:37,126 INFO [solr.update.UpdateLog] - [RecoveryThread] - : Dropping buffered updates FSUpdateLog{state=BUFFERING, tlog=null}
2012-11-02 14:14:37,127 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... (2) core=shard_e
2012-11-02 14:14:37,127 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Wait 8.0 seconds before trying to recover again (3)
2012-11-02 14:14:45,328 INFO [solr.cloud.ZkController] - [RecoveryThread] - : numShards not found on descriptor - reading it from system property
2012-11-02 14:14:45,363 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Starting Replication Recovery. core=shard_e
2012-11-02 14:14:45,363 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - : Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2012-11-02 14:14:45,775 INFO [common.cloud.ZkStateReader] - [main-EventThread] - : A cluster state change has occurred - updating... (10)
2012-11-02 14:14:50,987 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Begin buffering updates. core=shard_e
2012-11-02 14:14:50,987 INFO [solr.update.UpdateLog] - [RecoveryThread] - : Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
2012-11-02 14:14:50,987 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Attempting to replicate from http://rot05.solrserver:8080/solr/shard_e/. core=shard_e
2012-11-02 14:14:50,987 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - : Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2012-11-02 14:15:03,303 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_f/data/index
2012-11-02 14:15:03,303 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : removing temporary index download directory files NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_f/data/index.20121102141424591 lockFactory=org.apache.lucene.store.SimpleFSLockFactory@1520a48c; maxCacheMB=48.0 maxMergeSizeMB=4.0)
2012-11-02 14:15:09,421 INFO [apache.zookeeper.ClientCnxn] - [main-SendThread(rot1.zkserver:2181)] - : Client session timed out, have not heard from server in 11873ms for sessionid 0x13abc504486000f, closing socket connection and attempting reconnect
2012-11-02 14:15:09,422 ERROR [solr.core.SolrCore] - [http-8080-exec-1] - : org.apache.solr.common.SolrException: Ping query caused exception: Java heap space
. .
2012-11-02 14:15:09,867 INFO [common.cloud.ConnectionManager] - [main-EventThread] - : Watcher org.apache.solr.common.cloud.ConnectionManager@305e7020 name:ZooKeeperConnection Watcher:rot1.zkserver:2181,rot2.zkserver:2181 got event WatchedEvent state:Disconnected type:None path:null path:null type:None
2012-11-02 14:15:09,867 INFO [common.cloud.ConnectionManager] - [main-EventThread] - : zkClient has disconnected
2012-11-02 14:15:09,869 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Error while trying to recover:java.lang.OutOfMemoryError: Java heap space
. .
2012-11-02 14:15:10,159 INFO [solr.update.UpdateLog] - [RecoveryThread] - : Dropping buffered updates FSUpdateLog{state=BUFFERING, tlog=null}
2012-11-02 14:15:10,159 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... (3) core=shard_e
2012-11-02 14:15:10,159 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Wait 16.0 seconds before trying to recover again (4)
2012-11-02 14:15:09,878 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_f/data/index.20121102141424591
2012-11-02 14:15:10,192 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_f_f/data/index
2012-11-02 14:15:10,192 ERROR [solr.handler.ReplicationHandler] - [RecoveryThread] - : SnapPull failed :org.apache.solr.common.SolrException: Unable to download _773.tvf completely. Downloaded 246415360!=562327645
. .
{code}

At this point indexing has already been blocked. Some nodes do not write anything to the logs, and the two surrounding nodes are still busy doing some replication. Most nodes show an increased number of state changes:

{code}
2012-11-02 14:16:47,768 INFO [common.cloud.ZkStateReader] - [main-EventThread] - : A cluster state change has occurred -
[jira] [Created] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
Piotr created LUCENE-4542:

Summary: Make RECURSION_CAP in HunspellStemmer configurable
Key: LUCENE-4542
URL: https://issues.apache.org/jira/browse/LUCENE-4542
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Priority: Minor

Currently there is "private static final int RECURSION_CAP = 2;" in the code. It makes using hunspell with several dictionaries almost unusable, due to bad performance. It would be nice to be able to tune this number as needed. (It's the first issue I've filed, so please forgive any mistakes.)
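A minimal sketch of what the requested change could look like, assuming a constructor parameter is the chosen mechanism (the overload and default-constant names below are illustrative, not from a committed patch):

{code}
// Sketch: replace the hard-coded cap with a configurable field.
public class HunspellStemmer {

  /** Matches the previously hard-coded RECURSION_CAP. */
  public static final int DEFAULT_RECURSION_CAP = 2;

  private final HunspellDictionary dictionary;
  private final int recursionCap;

  public HunspellStemmer(HunspellDictionary dictionary) {
    this(dictionary, DEFAULT_RECURSION_CAP);
  }

  public HunspellStemmer(HunspellDictionary dictionary, int recursionCap) {
    this.dictionary = dictionary;
    this.recursionCap = recursionCap; // e.g. 1 for faster, shallower affix stripping
  }

  // ... the recursive stemming methods would then test
  // `recursionLevel < recursionCap` instead of `recursionLevel < RECURSION_CAP` ...
}
{code}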
[jira] [Updated] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr updated LUCENE-4542:

Description:
Currently there is "private static final int RECURSION_CAP = 2;" in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance. It would be nice to be able to tune this number as needed. (It's the first issue I've filed, so please forgive any mistakes.)

(was: the same text, without "of the class HunspellStemmer")
[jira] [Commented] (LUCENE-4537) Move RateLimiter up to Directory and make it IOContext aware
[ https://issues.apache.org/jira/browse/LUCENE-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491390#comment-13491390 ]

Michael McCandless commented on LUCENE-4537 (Move RateLimiter up to Directory and make it IOContext aware):

I think rate limiting merge IO is important functionality: merges easily kill search performance if you index/search on one box (NRT app). But I agree: Directory is abstract and minimal and we should keep it that way.

A generic wrapper around any IO would be great ... but I'm not sure how we'd do it? E.g., would we have to tally up our own bytes in every write method (writeInt/Long/VInt/VLong/etc.)? Maybe that's acceptable? It's only for writing ...

Or maybe we only make a RateLimitingBufferedIO subclass? Though I had wanted to try this with RAMDirectory too (playing with Zing) ... I guess we could make a RateLimitingRAMOutputStream ...
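For illustration, the "tally bytes in every write method" concern is smaller than it sounds, because DataOutput funnels writeInt/writeVLong/etc. through writeByte/writeBytes, so a decorator over those two methods covers everything. This is a hedged sketch of the idea under discussion, not a committed design; it assumes the Lucene 4.x IndexOutput/RateLimiter APIs, and a real implementation would batch the pause() calls rather than pay one per byte.

{code}
import java.io.IOException;

import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.RateLimiter;

// Sketch: wrap any IndexOutput and throttle at the byte level. The inherited
// DataOutput methods (writeInt, writeVLong, ...) all bottom out in writeByte /
// writeBytes, so overriding these two is enough to count every written byte.
public class RateLimitedIndexOutput extends IndexOutput {

  private final IndexOutput delegate;
  private final RateLimiter limiter;

  public RateLimitedIndexOutput(IndexOutput delegate, RateLimiter limiter) {
    this.delegate = delegate;
    this.limiter = limiter;
  }

  @Override
  public void writeByte(byte b) throws IOException {
    limiter.pause(1); // naive; a real impl would accumulate and pause in chunks
    delegate.writeByte(b);
  }

  @Override
  public void writeBytes(byte[] b, int offset, int length) throws IOException {
    limiter.pause(length);
    delegate.writeBytes(b, offset, length);
  }

  @Override public void flush() throws IOException { delegate.flush(); }
  @Override public void close() throws IOException { delegate.close(); }
  @Override public long getFilePointer() { return delegate.getFilePointer(); }
  @Override public void seek(long pos) throws IOException { delegate.seek(pos); }
  @Override public long length() throws IOException { return delegate.length(); }
}
{code}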
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491391#comment-13491391 ]

Michael McCandless commented on LUCENE-4540 (Allow packed ints norms):

+1, very cool!
Re: Adding index epoch tracking to Lucene
I'm not sure, but I think something in Solr's replication needed this information? And maybe that's why it uses timestamps today instead...?

Mike McCandless
http://blog.mikemccandless.com

On Tue, Nov 6, 2012 at 3:09 AM, Shai Erera <ser...@gmail.com> wrote:
> Hi guys, Lucene today tracks the index generation, which is incremented
> whenever changes are committed to the index. [...]
[jira] [Updated] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr updated LUCENE-4542:

Priority: Major (was: Minor)
Description:
Currently there is "private static final int RECURSION_CAP = 2;" in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36 ms to stem a long sentence in Latvian with recursion_cap=2 vs. 5 ms with recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (It's the first issue I've filed, so please forgive any mistakes.)

(was: the previous description, without the timing figures)
Re: concurrentmergescheduller
Hi Radim,

While confusing, I think the code is actually nearly correct ... but I would love to find some simplifications of CMS's logic (it's really hairy).

It turns out mergeThreadCount() is allowed to go higher than maxThreadCount; when this happens, Lucene pauses mergeThreadCount() - maxThreadCount of those merge threads, and resumes them once threads finish (see updateMergeThreads). I.e., CMS will accept up to maxMergeCount merges (and launch threads for them), but will only allow maxThreadCount of those threads to be running at once. So what that while loop is doing is preventing more than maxMergeCount+1 threads from starting, and then pausing the incoming thread to slow down the rate of segment creation (since merging cannot keep up).

But ... I think the 1+ is wrong ... it seems like it should just be mergeThreadCount() >= maxMergeCount. Could you please open an issue for this? I'll fix ... and I'll add a comment explaining that confusing loop. Would really be nice to simplify the code too.

Thanks!

Mike McCandless
http://blog.mikemccandless.com

On Mon, Nov 5, 2012 at 1:42 PM, Radim Kolar <h...@filez.com> wrote:

i suspect that this code is broken: lines 331-343 in org.apache.lucene.index.ConcurrentMergeScheduler.merge(IndexWriter). mergeThreadCount() are the currently active merges; they can be at most maxThreadCount. maxMergeCount is the number of queued merges, defaulted to maxThreadCount+2, and it can never be lower than maxThreadCount, which means that the condition in the while loop can never become true.

synchronized (this) {
  long startStallTime = 0;
  while (mergeThreadCount() >= 1 + maxMergeCount) {
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait();
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
}
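For concreteness, here is the fix Mike describes, sketched against the loop quoted above (the committed change may differ; the method and field names are the existing ConcurrentMergeScheduler members):

{code}
// Inside ConcurrentMergeScheduler.merge(IndexWriter), sketched from the
// discussion above: drop the "1 +" so the incoming thread stalls as soon as
// maxMergeCount merges are in flight, instead of one merge too late.
synchronized (this) {
  long startStallTime = 0;
  while (mergeThreadCount() >= maxMergeCount) {  // was: >= 1 + maxMergeCount
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait(); // a finishing merge thread wakes us via notifyAll()
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
}
{code}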
Re: Source Control
On Sat, 2012-10-27 at 01:02 +0200, Mark Miller wrote:
> What are people's thoughts about moving to git?

Speaking as a contributor without commit rights and with very little interest in build systems and version control, my experience with Lucene hacking so far has had a few technical annoyances:

- It is hard to collaborate on a patch, internally in my organization as well as externally.
- Local backups, independent of SVN, are needed to be sure not to lose code in case of crashes.
- Revising a dormant patch is confusing, as it very quickly gets hard to keep track of which version is current: there is no version control for the patches, except for trivial timestamps.
- When a patch is created against trunk and the JIRA issue is revisited some time later, chances are that the interfaced Lucene/Solr code has changed. To apply the patch, one needs to hunt down the SVN tag that the patch was generated against.

What I would like to see is something like GitHub, where everyone can easily fork the code, share it and just point to it in the JIRA issue. A read-only git (or another distributed versioning system) repository alone would only solve the first two problems, and then only if I had a place to make a public repository (which admittedly is easy enough with GitHub et al).

- Toke Eskildsen, State and University Library, Denmark
[jira] [Commented] (LUCENE-4537) Move RateLimiter up to Directory and make it IOContext aware
[ https://issues.apache.org/jira/browse/LUCENE-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491413#comment-13491413 ]

Robert Muir commented on LUCENE-4537 (Move RateLimiter up to Directory and make it IOContext aware):

I didn't say it wasn't important. I guess, if it's really important, then we'll invest the time to figure out clean APIs to support it. Otherwise we can remove it :)
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491415#comment-13491415 ]

Robert Muir commented on LUCENE-4540 (Allow packed ints norms):

I don't understand the question, Simon: all the ones we provide happen to use Norm.setByte. I don't think we need to add documentation to Norm.setFloat/Norm.setDouble saying that we don't provide any similarities that call these methods: that's not important to anybody.
Re: Source Control
> - It is hard to collaborate on a patch, internally in my organization as well as externally

The Solr/Lucene git repo is on GitHub. You can use git within your organization without problems.
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491417#comment-13491417 ]

Simon Willnauer commented on LUCENE-4540 (Allow packed ints norms):

bq. I don't understand the question Simon: all the ones we provide happen to use Norm.setByte

Just to clarify: currently, if we write packed ints and a similarity calls Source#getArray, you get a UOE. I think we should document that our current impls won't handle this.
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491421#comment-13491421 ]

Robert Muir commented on LUCENE-4540 (Allow packed ints norms):

I don't see how it's relevant. Issues will happen if you use Norm.setFloat (as they expect a byte). I'm not going to confuse the documentation. The built-in Similarities at query time depend upon their index-time norm implementation: this is documented extensively everywhere!
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491422#comment-13491422 ]

Simon Willnauer commented on LUCENE-4540 (Allow packed ints norms):

Fair enough. I just wanted to mention it.
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491423#comment-13491423 ]

Robert Muir commented on LUCENE-4540 (Allow packed ints norms):

If someone changes their similarity to use a different norm type at index-time than at query-time, then he or she is an idiot!
[jira] [Created] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
Robert Muir created LUCENE-4543:

Summary: Bring back TFIDFSim.lengthNorm
Key: LUCENE-4543
URL: https://issues.apache.org/jira/browse/LUCENE-4543
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir

We removed this before because of LUCENE-2828, but the problem there was the delegator (not the lengthNorm method). TFIDFSim requires byte[] norms today. So its computeNorm should be final, calling a lengthNorm() that returns a byte. This way there is no possibility for you to do something stupid.
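The shape of the API being proposed, sketched from the issue description (a sketch under stated assumptions, not the committed patch; FieldInvertState and Norm are the Lucene 4.x signatures, and the scoring methods are elided):

{code}
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.index.Norm;
import org.apache.lucene.search.similarities.Similarity;

// Sketch: pin the norm encoding to a single byte by making computeNorm final
// and delegating to an abstract lengthNorm() that returns the byte to store.
// Subclasses can then only customize the value, never the type.
public abstract class TFIDFSimilaritySketch extends Similarity {

  /** Computes the single-byte norm value for this field/document. */
  public abstract byte lengthNorm(FieldInvertState state);

  @Override
  public final void computeNorm(FieldInvertState state, Norm norm) {
    norm.setByte(lengthNorm(state)); // byte[] norms are required by TF/IDF
  }

  // ... tf(), idf(), and the query-time scoring methods elided ...
}
{code}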
[jira] [Commented] (LUCENE-4539) DocValues impls should read all headers up-front instead of per-directsource
[ https://issues.apache.org/jira/browse/LUCENE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491432#comment-13491432 ]

Adrien Grand commented on LUCENE-4539:

Only in tests. This is why I think that writing a full header (including the PackedInts codec name) is useless most of the time, if not always.

DocValues impls should read all headers up-front instead of per-directsource
Key: LUCENE-4539
URL: https://issues.apache.org/jira/browse/LUCENE-4539
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Reporter: Robert Muir
Attachments: LUCENE-4539.patch

Currently, when DocValues opens, it just opens files; it doesn't read codec headers etc. Instead we read these every single time a direct source opens. I think it should work like PostingsReaders: e.g. the PackedInts impl would read its versioning info and codec headers, and creating a new Direct impl should be an IndexInput.clone() + getDirectReaderNoHeader(). Today it's much more costly.
[jira] [Updated] (LUCENE-4538) Cache DocValues DirecSource
[ https://issues.apache.org/jira/browse/LUCENE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4538:

Attachment: LUCENE-4538.patch

Here is a new patch making loadSource / loadDirectSource protected. It is really confusing if you have two ways to get a Source instance and you need to keep track of whether it is cached or not. This really should not have been public at all. I will commit this soon.

Cache DocValues DirecSource
Key: LUCENE-4538
URL: https://issues.apache.org/jira/browse/LUCENE-4538
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs
Affects Versions: 4.0
Reporter: Simon Willnauer
Fix For: 4.1, 5.0
Attachments: LUCENE-4538.patch, LUCENE-4538.patch

Currently the user needs to make sure that a direct source is not shared between threads, and each time someone calls getDirectSource we create a new source, which has a reasonable overhead. We can certainly reduce the overhead (maybe a different issue) but it should be easier for the user to get a direct source and handle it. More than that, it should be consistent with getSource / loadSource.
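As background for why the caching is fiddly: a direct source is not thread-safe, so any cache has to be per-thread. A minimal sketch of that pattern, assuming the Lucene 4.0 DocValues#getDirectSource() API (illustrative, not Simon's patch):

{code}
import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.DocValues.Source;

// Sketch: cache one direct source per thread, since instances must not be
// shared across threads and creating one per call has noticeable overhead.
public final class PerThreadDirectSource {

  private final DocValues docValues;

  private final ThreadLocal<Source> perThread = new ThreadLocal<Source>() {
    @Override
    protected Source initialValue() {
      try {
        return docValues.getDirectSource();
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  };

  public PerThreadDirectSource(DocValues docValues) {
    this.docValues = docValues;
  }

  /** Returns this thread's cached direct source, creating it on first use. */
  public Source get() {
    return perThread.get();
  }
}
{code}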
[jira] [Commented] (LUCENE-4536) Make PackedInts byte-aligned?
[ https://issues.apache.org/jira/browse/LUCENE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491449#comment-13491449 ]

Michael McCandless commented on LUCENE-4536:

This patch only changes the on-disk format, right? The specialized in-memory readers are still backed by native arrays (short[]/int[]/long[], etc.)?

Instead of PackedInts.VERSION_CURRENT = 1, can we add VERSION_BYTE_ALIGNED = 1 and then set VERSION_CURRENT = VERSION_BYTE_ALIGNED? Also, can we leave VERSION_START = 0 (i.e., don't rename that to VERSION_LONG_ALIGNED)? But we should put a comment saying that one was long-aligned ... I.e., in general, I think the version constants should be created once and then not changed (write once), and VERSION_CURRENT changes to point to whichever is most recent.

That careful anonymous subclass in PackedInts to handle seeking to the end when the last value is read is sort of sneaky ... this should only kick in when reading the old (long-aligned) format, right? Can you add an assert checking that the version is VERSION_START? Or ... maybe ... we should not promise this (no trailing wasted bytes) in the API? Or maybe we expose a new explicit method to seek to the end of this packed ints or something (e.g. maybe skipTrailingBytes).

Make PackedInts byte-aligned?
Key: LUCENE-4536
URL: https://issues.apache.org/jira/browse/LUCENE-4536
Project: Lucene - Core
Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Fix For: 4.1
Attachments: LUCENE-4536.patch

PackedInts are more and more used to save/restore small arrays, but given that they are long-aligned, up to 63 bits are wasted per array. We should try to make PackedInts storage byte-aligned so that only 7 bits are wasted in the worst case.
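Mike's write-once convention for version constants, rendered as code (the constant names come from his comment; the byte-count comments are my arithmetic, assuming n values of b bits each):

{code}
public class PackedInts {

  /** Initial format: storage was 64-bit aligned, i.e. a stream of n b-bit
   *  values occupied ceil(n*b/64)*8 bytes, wasting up to 63 trailing bits. */
  public static final int VERSION_START = 0;

  /** Byte-aligned format: the same stream occupies ceil(n*b/8) bytes,
   *  wasting at most 7 trailing bits. */
  public static final int VERSION_BYTE_ALIGNED = 1;

  /** The write-once constants above never move; only this pointer advances. */
  public static final int VERSION_CURRENT = VERSION_BYTE_ALIGNED;
}
{code}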
[jira] [Commented] (LUCENE-4539) DocValues impls should read all headers up-front instead of per-directsource
[ https://issues.apache.org/jira/browse/LUCENE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491454#comment-13491454 ]

Robert Muir commented on LUCENE-4539 (DocValues impls should read all headers up-front instead of per-directsource):

I agree with you that it's bogus how it writes its header. But I see a downside (I hope we can come up with an idea to deal with it rather than keeping the header!).

One advantage of PackedInts writing its versioning (like FSTs) is that lots of things nest them in their own file. The problem with these two things is that they are themselves changing and versioned: they aren't like readVInt(), which is pretty much fixed in what it does. So having them write their own versions etc. today to some extent makes back-compat management of file formats easier: today it's just DocValues and term dictionaries using these things; tomorrow (4.1) it's also the postings lists: documents, frequencies, and positions, and maybe in the future even stored fields (LUCENE-4527). Who is keeping up with all the places that must be managed when a packed ints version change needs to happen? Today the header encapsulates in one place: if I backwards-break FSTs and it breaks a few suggester impls, I know anyone using those suggesters will get IndexFormatTooOldException without me doing anything. So that's very convenient.
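The convenience Robert describes comes from the standard codec-header pattern; here is a sketch of how a nested structure self-describes its version. CodecUtil.writeHeader/checkHeader are the real Lucene utilities; the codec name string and wrapper class are illustrative.

{code}
import java.io.IOException;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.DataOutput;

// Sketch: a nested structure that stamps its own name + version, so every
// consumer that embeds it gets back-compat checks for free.
public final class SelfVersionedExample {

  static final String CODEC_NAME = "PackedIntsExample"; // illustrative
  static final int VERSION_START = 0;
  static final int VERSION_CURRENT = 1;

  /** Writing: the structure records its own codec name and version. */
  public static void writeHeader(DataOutput out) throws IOException {
    CodecUtil.writeHeader(out, CODEC_NAME, VERSION_CURRENT);
  }

  /** Reading: checkHeader throws IndexFormatTooOld/TooNewException itself,
   *  so callers nesting this structure need no extra version bookkeeping. */
  public static int readHeader(DataInput in) throws IOException {
    return CodecUtil.checkHeader(in, CODEC_NAME, VERSION_START, VERSION_CURRENT);
  }
}
{code}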
[jira] [Commented] (LUCENE-4536) Make PackedInts byte-aligned?
[ https://issues.apache.org/jira/browse/LUCENE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491456#comment-13491456 ]

Adrien Grand commented on LUCENE-4536 (Make PackedInts byte-aligned?):

bq. This patch only changes the on-disk format right? The specialized in-memory readers are still backed by native arrays (short[]/int[]/long[], etc.)?

Exactly.

bq. Ie, in general, I think the version constants should be created once and then not changed (write once), and VERSION_CURRENT changes to point to whichever is most recent.

Ok, I'll change it.

bq. That careful anonymous subclass in PackedInts to handle seeking to the end when the last value is read is sort of sneaky ... this should only kick in when reading the old (long-aligned) format right?

This only happens when reading the old format AND the number of bytes used to serialize the array is not a multiple of 8. I'll add an assert to make sure that this condition can only be true with the old format.

bq. Or ... maybe ... we should not promise this (no trailing wasted bytes) in the API?
bq. Or maybe we expose a new explicit method to seek to the end of this packed ints or something (eg maybe skipTrailingBytes).

These were my first ideas, but the truth is that I was very scared to break something (for example, doc values rely on the assumption that after reading the last value of a direct array, the whole stream is consumed). Fixing PackedInts to make sure those assumptions are still true looked easier to me, as I was able to create fake long-aligned packed ints and make sure that the whole stream was consumed after reading the last value. But your option makes perfect sense to me and I will do it if you think it is cleaner. Thanks for the review!
[jira] [Commented] (LUCENE-4532) TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
[ https://issues.apache.org/jira/browse/LUCENE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491466#comment-13491466 ]

Gilad Barkai commented on LUCENE-4532:

Reviewed the patch - and it looks very good. A few comments:
1. In TestDirectoryTaxonomyWriter.java, the error string _index.create.time not found in commitData_ should be updated.
2. If the index creation time is in the commit data, it will not be removed, as the epoch is added to whatever commit data was read from the index. I think perhaps it should be removed?
3. Since the members related to the old 'timestamp' method are removed, no test could check the migration from the old method to the new one. Might be a good idea to add one, with a comment to remove it when backward compatibility is no longer required (Lucene 6?).
4. 'Epoch' is usually used in the context of time, or in relation to a period. Perhaps the name 'version' is more closely related to the implementation?

TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
Key: LUCENE-4532
URL: https://issues.apache.org/jira/browse/LUCENE-4532
Project: Lucene - Core
Issue Type: Bug
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-4532.patch

The following failure on Jenkins:

{noformat}
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1404/
Java: 32bit/jdk1.6.0_37 -client -XX:+UseConcMarkSweepGC

1 tests failed.

REGRESSION: org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy

Error Message:

Stack Trace:
java.lang.ArrayIndexOutOfBoundsException
at __randomizedtesting.SeedInfo.seed([6AB10D3E4E956CFA:BFB2863DB7E077E0]:0)
at java.lang.System.arraycopy(Native Method)
at org.apache.lucene.facet.taxonomy.directory.ParentArray.refresh(ParentArray.java:99)
at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader.refresh(DirectoryTaxonomyReader.java:407)
at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader.doTestReadRecreatedTaxono(TestDirectoryTaxonomyReader.java:167)
at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy(TestDirectoryTaxonomyReader.java:130)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at
Re: Source Control
One issue is how to use git and github. One can certainly use it as if it were svn, but that misses a lot of the power of git, particularly the collaborative tools on github. For example, one approach is to create a branch for every Jira ticket and then, instead of posting raw patches on the Jira ticket, create git pull requests from the branch, which make it easy to comment on individual file changes, right down to comments on individual lines of code. Changes can be committed and pushed to the branch as work continues and new pull requests generated. Eventually, pull requests can be easily merged into the master, as desired. Users can selectively include pull requests as they see fit as well. But... can all of us, even non-committers, do that? Or would the better features of github be available only to committers? I don't know enough about github to know whether you can have one class of user able to create branches or comment on them but not merge into master or tagged branches such as releases.
-- Jack Krupansky
-----Original Message----- From: Mark Miller Sent: Friday, October 26, 2012 7:02 PM To: dev@lucene.apache.org Subject: Source Control
So, it's not everyone's favorite tool, but it sure seems to be the most popular tool. What are people's thoughts about moving to git? Distributed version control is where it's at :) I know some prefer mercurial, but git and github clearly are taking over the world. Also, the cmd line for git is a little eccentric - I use a GUI client called SmartGit. Some very clever Germans make it. A few Apache projects are already using git. I'd like to hear how people feel about this idea.
- Mark
[jira] [Commented] (LUCENE-4536) Make PackedInts byte-aligned?
[ https://issues.apache.org/jira/browse/LUCENE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491479#comment-13491479 ] Michael McCandless commented on LUCENE-4536:
bq. These were my first ideas, but the truth is that I was very scared to break something (for example doc values rely on the assumption that after reading the last value of a direct array, the whole stream is consumed).
It's hard to know what's best :) I like the explicitness / transparency / no sneaky code solution of .skipTrailingBytes(). But then I don't like that skipTrailingBytes would only be for back compat (ie, we will remove it eventually, unless somehow we go back to wasted trailing bytes) ... annoying to add an essentially deprecated API. But then really it's presumptuous of the consumers of PackedInts to expect all bytes are consumed after iterating all values ... like that's making a sometimes invalid assumption about the file format of PackedInts. And this is an internal API, so we are free to change things ... But net/net I think we should stick w/ your current patch?

Make PackedInts byte-aligned?
Key: LUCENE-4536 URL: https://issues.apache.org/jira/browse/LUCENE-4536 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.1 Attachments: LUCENE-4536.patch
PackedInts are more and more used to save/restore small arrays, but given that they are long-aligned, up to 63 bits are wasted per array. We should try to make PackedInts storage byte-aligned so that only 7 bits are wasted in the worst case.
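[Editor's note: the alignment trade-off described in the issue is simple arithmetic; here is a small self-contained sketch (the class and method names are illustrative, not Lucene code) showing the wasted trailing bits under each scheme.]
{code}
// Illustration only: how much trailing space long-aligned vs byte-aligned
// packed storage wastes for a small array. With 64-bit (long) alignment the
// last block can leave up to 63 unused bits; with byte alignment, at most 7.
public class PackedAlignmentWaste {
  static long wastedBits(int valueCount, int bitsPerValue, int blockBits) {
    long dataBits = (long) valueCount * bitsPerValue;
    long blocks = (dataBits + blockBits - 1) / blockBits; // round up to whole blocks
    return blocks * blockBits - dataBits;
  }

  public static void main(String[] args) {
    // e.g. 3 values at 7 bits each = 21 bits of payload
    System.out.println(wastedBits(3, 7, 64)); // long-aligned: 43 wasted bits
    System.out.println(wastedBits(3, 7, 8));  // byte-aligned: 3 wasted bits
  }
}
{code}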
[jira] [Created] (LUCENE-4544) possible bug in ConcurrentMergeScheduler.merge(IndexWriter)
Radim Kolar created LUCENE-4544:
Summary: possible bug in ConcurrentMergeScheduler.merge(IndexWriter) Key: LUCENE-4544 URL: https://issues.apache.org/jira/browse/LUCENE-4544 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0 Reporter: Radim Kolar
from dev list: "I suspect that this code is broken. Lines 331 - 343 in org.apache.lucene.index.ConcurrentMergeScheduler.merge(IndexWriter): mergeThreadCount() are the currently active merges; they can be at most maxThreadCount. maxMergeCount is the number of queued merges, defaulted to maxThreadCount+2, and it can never be lower than maxThreadCount, which means that the condition in the while can never become true.
{code}
synchronized(this) {
  long startStallTime = 0;
  while (mergeThreadCount() >= 1+maxMergeCount) {
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait();
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
{code}
"
While confusing, I think the code is actually nearly correct... but I would love to find some simplifications of CMS's logic (it's really hairy). It turns out mergeThreadCount() is allowed to go higher than maxThreadCount; when this happens, Lucene pauses mergeThreadCount()-maxThreadCount of those merge threads, and resumes them once threads finish (see updateMergeThreads). Ie, CMS will accept up to maxMergeCount merges (and launch threads for them), but will only allow maxThreadCount of those threads to be running at once. So what that while loop is doing is preventing more than maxMergeCount+1 threads from starting, and then pausing the incoming thread to slow down the rate of segment creation (since merging cannot keep up). But ... I think the 1+ is wrong ... it seems like it should just be mergeThreadCount() >= maxMergeCount().
[jira] [Commented] (SOLR-4037) Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException
[ https://issues.apache.org/jira/browse/SOLR-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491485#comment-13491485 ] Markus Jelsma commented on SOLR-4037:
We've also seen these exceptions when firing normal queries _after_ restarting all nodes in a sequence. Clearing ZK data and restarting again is a quick fix.
{code}
java.util.concurrent.RejectedExecutionException
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
at org.apache.solr.handler.component.HttpShardHandler.submit(HttpShardHandler.java:190)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code}

Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException
Key: SOLR-4037 URL: https://issues.apache.org/jira/browse/SOLR-4037 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 12:37:38 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Fix For: 4.1, 5.0
See: http://lucene.472066.n3.nabble.com/Continuous-Ping-query-caused-exception-java-util-concurrent-RejectedExecutionException-td4017470.html
Using this week's trunk we sometimes see nodes entering some funky state where they continuously report exceptions. Replication and query handling is still possible, but there is an increase in CPU time:
{code}
2012-11-01 09:24:28,337 INFO [solr.core.SolrCore] - [http-8080-exec-4] - : [openindex_f] webapp=/solr path=/admin/ping params={} status=500 QTime=21
2012-11-01 09:24:28,337 ERROR [solr.core.SolrCore] - [http-8080-exec-4] - : org.apache.solr.common.SolrException: Ping query caused exception: java.util.concurrent.RejectedExecutionException
at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:259)
at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:207)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
[jira] [Assigned] (LUCENE-4544) possible bug in ConcurrentMergeScheduler.merge(IndexWriter)
[ https://issues.apache.org/jira/browse/LUCENE-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-4544:
Assignee: Michael McCandless
[jira] [Assigned] (LUCENE-4538) Cache DocValues DirecSource
[ https://issues.apache.org/jira/browse/LUCENE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-4538:
Assignee: Simon Willnauer

Cache DocValues DirecSource
Key: LUCENE-4538 URL: https://issues.apache.org/jira/browse/LUCENE-4538 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.1, 5.0 Attachments: LUCENE-4538.patch, LUCENE-4538.patch
Currently the user needs to make sure that a direct source is not shared between threads, and each time someone calls getDirectSource we create a new source, which has a noticeable overhead. We can certainly reduce the overhead (maybe a different issue), but it should be easier for the user to get a direct source and handle it. More than that, it should be consistent with getSource / loadSource.
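[Editor's note: the attached patch is not shown in this thread, so the following is only one plausible shape of the caching the issue title asks for - a per-thread direct source. getDirectSource() and Source are from the DocValues API described above; the wrapper class is a hypothetical illustration, not the actual patch.]
{code}
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.DocValues.Source;

// Hypothetical sketch: hand each thread its own direct source so repeated
// getDirectSource() calls stop paying the per-call setup cost.
class PerThreadDirectSource {
  private final DocValues docValues;
  private final ThreadLocal<Source> cache = new ThreadLocal<Source>();

  PerThreadDirectSource(DocValues docValues) {
    this.docValues = docValues;
  }

  Source get() throws IOException {
    Source s = cache.get();
    if (s == null) {
      s = docValues.getDirectSource(); // expensive: opens/clones inputs, reads headers
      cache.set(s);
    }
    return s; // safe: never shared across threads
  }
}
{code}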
[jira] [Resolved] (LUCENE-4538) Cache DocValues DirecSource
[ https://issues.apache.org/jira/browse/LUCENE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-4538.
Resolution: Fixed
Committed to trunk in rev. 1406153; backported to 4.x in rev. 1406169.
[jira] [Commented] (LUCENE-4532) TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
[ https://issues.apache.org/jira/browse/LUCENE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491511#comment-13491511 ] Shai Erera commented on LUCENE-4532:
Thanks for the review.
bq. in TestDirectoryTaxonomyWriter.java, the error string index.create.time not found in commitData should be updated.
done (will upload an updated patch soon)
bq. if the index creation time is in the commit data, it will not be removed - as the epoch is added to whatever commit data was read from the index. I think perhaps it should be removed?
Not quite. On every commit, DirTaxoWriter writes a new commitData, combining whatever commitData is passed from the caller. But it does not merge it with the existing commitData. That's how IndexWriter works too, and it's the responsibility of the caller to pass the commitData on every commit(), if he'd like to persist it. But DirTaxoReader does let you read the commitData, so it is possible that someone will obtain the commitData from DirTaxoReader (with the old property), add his stuff to it and pass that to DirTaxoWriter. I don't think that it's critical though, and I doubt anyone does that.
bq. ...no test could check the migration from old to new methods...
Right, I'll add a test case.
bq. 'Epoch' is usually in the context of time
I don't think that it's critical. 'Version' is problematic since Lucene already uses 'version' and 'generation'. I think that 'epoch' is fine, but if anyone has a better suggestion, I don't mind changing it.
[jira] [Commented] (LUCENE-4532) TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
[ https://issues.apache.org/jira/browse/LUCENE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491517#comment-13491517 ] Shai Erera commented on LUCENE-4532:
bq. ...no test could check the migration from old to new methods...
Actually, there was such a test: TestDirTaxoWriter.testUndefinedCreateTime. I'll rename it to testBackwardsCompatibility though.
[jira] [Commented] (LUCENE-4539) DocValues impls should read all headers up-front instead of per-directsource
[ https://issues.apache.org/jira/browse/LUCENE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491520#comment-13491520 ] Adrien Grand commented on LUCENE-4539:
bq. Who is keeping up with all the places that must be managed when a packed ints version change needs to happen?
Sorry, I was not clear: I didn't mean to remove the version number, just the codec name. I think the Lucene41 postings format is a good example: it never writes PackedInts headers in the stream, writes the PackedInts version once at the beginning of the stream, and may then serialize thousands of arrays of 128 values with the number of bits per value as a byte in front of each of them.

DocValues impls should read all headers up-front instead of per-directsource
Key: LUCENE-4539 URL: https://issues.apache.org/jira/browse/LUCENE-4539 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Robert Muir Attachments: LUCENE-4539.patch
Currently, when DocValues opens, it just opens files; it doesn't read codec headers etc. Instead we read these every single time a directsource opens. I think it should work like PostingsReaders: e.g. the PackedInts impl would read its versioning info and codec headers once, and creating a new Direct impl would be an IndexInput.clone() + getDirectReaderNoHeader(). Today it's much more costly.
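[Editor's note: to make the Lucene41-style pattern Adrien describes concrete, here is a hedged sketch - not the postings format's actual code. DataOutput.writeByte/writeVInt are real Lucene APIs; the class, constant, and method names are illustrative.]
{code}
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

// Sketch of the headers-up-front pattern: write versioning info once per
// stream, then tag each small block only with its bits-per-value byte.
class BlockWriterSketch {
  static final int PACKED_VERSION = 0; // stand-in for the PackedInts version

  static void writeStreamHeader(DataOutput out) throws IOException {
    out.writeVInt(PACKED_VERSION); // versioning info, written exactly once
  }

  static void writeBlock(DataOutput out, long[] block, int bitsPerValue) throws IOException {
    out.writeByte((byte) bitsPerValue); // one byte of metadata per block...
    // ...followed by the packed values themselves (encoding elided here).
  }
}
{code}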
[jira] [Updated] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
[ https://issues.apache.org/jira/browse/LUCENE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4543:
Attachment: LUCENE-4543.patch
Here's the patch. The API bug was introduced when sim was expanded to use norms other than a single byte: at query-time TFIDFSim is limited to single-byte norms (the decode bits are final), but computeNorm is not final. I'll commit soon.

Bring back TFIDFSim.lengthNorm
Key: LUCENE-4543 URL: https://issues.apache.org/jira/browse/LUCENE-4543 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4543.patch
We removed this before because of LUCENE-2828, but the problem there was the delegator (not the lengthNorm method). TFIDFSim requires byte[] norms today. So its computeNorm should be final, calling a lengthNorm() that returns a byte. This way there is no possibility for you to do something stupid.
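[Editor's note: the shape of the API change described above - a final computeNorm funneling everything through a byte-returning lengthNorm - looks roughly like the following. This is a sketch built from the issue text, not the attached patch; FieldInvertState and Norm.setByte are real Lucene 4.x APIs, the class name is illustrative.]
{code}
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.index.Norm;

// Sketch: computeNorm is final and can only ever produce the single byte
// that TFIDFSim's query-time code can decode; subclasses customize lengthNorm.
public abstract class ByteNormTFIDFSimSketch /* would extend TFIDFSimilarity */ {

  /** The only extension point: the norm must fit into one byte. */
  public abstract byte lengthNorm(FieldInvertState state);

  /** Final, so subclasses cannot write norms the query side can't read. */
  public final void computeNorm(FieldInvertState state, Norm norm) {
    norm.setByte(lengthNorm(state));
  }
}
{code}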
[jira] [Commented] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
[ https://issues.apache.org/jira/browse/LUCENE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491528#comment-13491528 ] Michael McCandless commented on LUCENE-4543:
+1
[jira] [Commented] (LUCENE-4538) Cache DocValues DirecSource
[ https://issues.apache.org/jira/browse/LUCENE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491558#comment-13491558 ] Robert Muir commented on LUCENE-4538:
Thanks Simon!
[jira] [Updated] (LUCENE-4532) TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
[ https://issues.apache.org/jira/browse/LUCENE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4532:
Attachment: LUCENE-4532.patch
Patch addresses the comments. For now, I kept the 'epoch' wording, unless there's another suggestion.
[jira] [Commented] (SOLR-3904) add package level javadocs to every package
[ https://issues.apache.org/jira/browse/SOLR-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491586#comment-13491586 ] Hoss Man commented on SOLR-3904:
Progress...
Committed revision 1406204. - trunk
Committed revision 1406209. - 4x

add package level javadocs to every package
Key: SOLR-3904 URL: https://issues.apache.org/jira/browse/SOLR-3904 Project: Solr Issue Type: Improvement Components: documentation Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.1 Attachments: SOLR-3904_buildxml.patch
quoth rmuir on the mailing list...
{quote}
We've been working on this for the lucene side (3.6 was the first release where every package had docs, 4.0 will be the first where every class had docs, and we are now working towards methods/fields/ctors/enums). I think this would be valuable for solr too (especially solrj as a start). Besides users, it's really useful to developers as well. Of course we all think our code is self-documenting, but that's not always the case. A few extra seconds can save someone a ton of time trying to figure out your code. Additionally, at least in my IDE, when things are done as javadoc comments they are more easily accessible than code comments. I'm sure it's the case for some other development environments too. Filling in these package.html's to at least have a one sentence description would be a really good start. It lets someone know where to go at the high level. If I was brand new to solr and wanted to write a java app that uses solrj, I wouldn't have a clue where to start (https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-solrj/index.html). 1-2 sentences could go a really long way. And for all new code, I hope we can all try harder for more complete javadocs. When you are working on something and it's fresh in your head, it's a lot easier to do this than for someone else to come back around and figure it out.
{quote}
I'm going to try and make it a priority for me to fill in package level docs as we look towards 4.1
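[Editor's note: as a concrete example of the "one sentence" bar being discussed, a minimal package-info.java is all it takes; the package and wording below are just an illustration, not from the committed docs.]
{code}
/**
 * Provides the SolrJ client API for indexing documents and running
 * searches against a remote Solr server from Java applications.
 */
package org.apache.solr.client.solrj;
{code}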
[jira] [Created] (SOLR-4039) MergeIndex on multiple cores impossible with SolrJ
Mathieu Gond created SOLR-4039:
Summary: MergeIndex on multiple cores impossible with SolrJ Key: SOLR-4039 URL: https://issues.apache.org/jira/browse/SOLR-4039 Project: Solr Issue Type: Bug Affects Versions: 3.6.1 Environment: Windows Reporter: Mathieu Gond
It is not possible to do a mergeIndexes action on multiple cores at the same time with SolrJ. Only the last core set in the srcCores parameter is used.
[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
[ https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1306:
Attachment: (was: SOLR-1306.patch)

Support pluggable persistence/loading of solr.xml details
Key: SOLR-1306 URL: https://issues.apache.org/jira/browse/SOLR-1306 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Assignee: Erick Erickson Fix For: 4.1 Attachments: SOLR-1306.patch
Persisting and loading details from one xml is fine if the number of cores is small and fixed. If there are tens of thousands of cores in a single box, adding a new core (with persistent=true) becomes very expensive because every core creation has to write this huge xml. Moreover, there is a good chance that the file gets corrupted and all the cores become unusable. In that case I would prefer it to be stored in a centralized DB which is backed up/replicated, so all the information is available in a centralized location. We may need to refactor CoreContainer to have a pluggable implementation which can load/persist the details. The default implementation should write/read from/to solr.xml. And the class should be pluggable as follows in solr.xml:
{code:xml}
<solr>
  <dataProvider class="com.foo.FooDataProvider" attr1="val1" attr2="val2"/>
</solr>
{code}
There will be a new interface (or abstract class ) called SolrDataProvider which this class must implement
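[Editor's note: the issue deliberately leaves the SolrDataProvider contract unspecified; the sketch below is entirely hypothetical, matching only the "load/persist the details" wording above. CoreContainer and CoreDescriptor are real Solr classes; the interface and its method names are mine.]
{code}
import java.util.List;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;

// Hypothetical sketch only: SOLR-1306 does not define this interface yet.
public interface SolrDataProvider {
  /** Load the descriptors for all cores at container startup. */
  List<CoreDescriptor> loadCoreDescriptors(CoreContainer container);

  /** Persist one core's details when it is created or changed. */
  void persistCoreDescriptor(CoreDescriptor descriptor);
}
{code}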
[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
[ https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1306:
Attachment: (was: SOLR-1306.patch)
[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
[ https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1306:
Attachment: SOLR-1306.patch
Took out some extraneous crud that made it into the last patch. When creating a custom core descriptor, the following changes need to be made to solr.xml:
1. add a sharedLib directive to the solr tag pointing to a directory containing your custom jar
2. add coreDescriptorProviderClass to the cores tag.
Here's an example:
{code:xml}
<solr persistent="false" sharedLib="../../../../../your/path/here/">
  <cores [all the other opts] coreDescriptorProviderClass="your.company.TheCoreDescriptorProvider"/>
</solr>
{code}
[jira] [Updated] (SOLR-3856) DIH: Better tests for SqlEntityProcessor
[ https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-3856:
Fix Version/s: 5.0

DIH: Better tests for SqlEntityProcessor
Key: SOLR-3856 URL: https://issues.apache.org/jira/browse/SOLR-3856 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 3.6, 4.0 Reporter: James Dyer Assignee: James Dyer Fix For: 4.1, 5.0 Attachments: SOLR-3856-3.5.patch, SOLR-3856.patch, SOLR-3856.patch, SOLR-3856.patch
The current tests for SqlEntityProcessor (& CachedSqlEntityProcessor), while many, do not reliably fail when bugs are introduced! They are also difficult to look at and understand. As we move Jenkins onto new environments, we have found several of them fail regularly, leading to @Ignore. My aim here is to write all new tests for (Cached)SqlEntityProcessor, and to document (hopefully fix) any bugs this reveals.
[jira] [Created] (SOLR-4040) SolrCloud deleteByQuery requires multiple commits
Darin Plutchok created SOLR-4040:
Summary: SolrCloud deleteByQuery requires multiple commits Key: SOLR-4040 URL: https://issues.apache.org/jira/browse/SOLR-4040 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: OSX Reporter: Darin Plutchok Fix For: 4.0
I am using embedded zookeeper and my cloud layout is shown below (all actions are done on the 'patents' collection only). The first commit/delete works for a single shard only, dropping query results by about a third. The second commit/delete drops query results to zero.
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop by a third)
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop to zero)
Note that a delete without a commit followed by a commit drops query results to zero, as it should:
http://127.0.0.1:8893/solr/patents/update/?stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (full count as no commit yet)
http://127.0.0.1:8893/solr/patents/update/?commit=true
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop to zero)
One workaround (produces zero hits in one shot):
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<outer><delete><query>sun</query></delete><commit/></outer>
The workaround I am using for now (produces zero hits in one shot):
http://127.0.0.1:8893/solr/patents/update?stream.body=<outer><delete><query>knee</query></delete><commit/><commit/></outer>
{code}
{"otherdocs":{"slice0":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_otherdocs_shard0":{
    "shard":"slice0", "roles":null, "state":"active",
    "core":"otherdocs_shard0", "collection":"otherdocs",
    "node_name":"Darins-MacBook-Pro.local:8893_solr",
    "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
    "leader":"true"}}}},
 "patents":{
  "slice0":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard0":{
    "shard":"slice0", "roles":null, "state":"active",
    "core":"patents_shard0", "collection":"patents",
    "node_name":"Darins-MacBook-Pro.local:8893_solr",
    "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
    "leader":"true"}}},
  "slice1":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard1":{
    "shard":"slice1", "roles":null, "state":"active",
    "core":"patents_shard1", "collection":"patents",
    "node_name":"Darins-MacBook-Pro.local:8893_solr",
    "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
    "leader":"true"}}},
  "slice2":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard2":{
    "shard":"slice2", "roles":null, "state":"active",
    "core":"patents_shard2", "collection":"patents",
    "node_name":"Darins-MacBook-Pro.local:8893_solr",
    "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
    "leader":"true"}}}}}
{code}
[jira] [Resolved] (SOLR-3856) DIH: Better tests for SqlEntityProcessor
[ https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-3856.
Resolution: Fixed
Committed. Trunk: r1406231 (CHANGES.txt: r1406245) 4x: r1406246
[jira] [Updated] (LUCENE-4544) possible bug in ConcurrentMergeScheduler.merge(IndexWriter)
[ https://issues.apache.org/jira/browse/LUCENE-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4544:
Attachment: LUCENE-4544.patch
Patch. It was somewhat tricky to fix the off-by-one, because we only want to stall the thread(s) producing segments if the number of running merges is >= maxMergeCount AND another merge wants to kick off ... I made CMS.merge sync'd, and removed the synchronous IW.mergeInit (I think deterministic segment name assignment isn't important).
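[Editor's note: restated as a hedged sketch - not the attached patch's exact code - the corrected stall loop described above looks like the following. mergeThreadCount(), maxMergeCount, verbose() and message() are the CMS members quoted earlier in the issue; this fragment runs after a new merge has been pulled, which is what supplies the "another merge wants to kick off" half of the condition.]
{code}
synchronized (this) {
  long startStallTime = 0;
  // Stall the incoming (segment-producing) thread only while the number of
  // accepted merges has reached the cap; the off-by-one "1+" is gone.
  while (mergeThreadCount() >= maxMergeCount) {
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait(); // woken when a merge thread finishes
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
}
{code}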
[jira] [Updated] (SOLR-2045) DIH doesn't release jdbc connections in conjunction with DB2
[ https://issues.apache.org/jira/browse/SOLR-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2045:
Affects Version/s: 3.6, 4.0
Fix Version/s: 5.0, 4.1
Assignee: James Dyer

DIH doesn't release jdbc connections in conjunction with DB2
Key: SOLR-2045 URL: https://issues.apache.org/jira/browse/SOLR-2045 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.4.1, 3.6, 4.0 Environment: DB2 SQLLIB 9.5, 9.7 jdbc Driver Reporter: Fenlor Sebastia Assignee: James Dyer Fix For: 4.1, 5.0
Using the JDBCDatasource in conjunction with the DB2 JDBC drivers results in the following error when the DIH tries to close the connection due to active transactions. As a consequence, each delta import or full import opens a new connection without closing it, so the maximum number of connections will be reached soon. Setting the connection to readOnly or changing the transaction isolation level doesn't help either. The JDBC driver I used: com.ibm.db2.jcc.DB2Driver, residing in db2jcc4.jar shipped with DB2 Express 9.7, for example. Here is the stack trace...
{code}
14.08.2010 01:49:51 org.apache.solr.handler.dataimport.JdbcDataSource closeConnection
FATAL: Ignoring Error when closing connection
com.ibm.db2.jcc.am.SqlException: [jcc][10251][10308][4.8.87] java.sql.Connection.close() requested while a transaction is in progress on the connection. The transaction remains active, and the connection cannot be closed. ERRORCODE=-4471, SQLSTATE=null
at com.ibm.db2.jcc.am.gd.a(gd.java:660)
at com.ibm.db2.jcc.am.gd.a(gd.java:60)
at com.ibm.db2.jcc.am.gd.a(gd.java:120)
at com.ibm.db2.jcc.am.lb.u(lb.java:1202)
at com.ibm.db2.jcc.am.lb.x(lb.java:1225)
at com.ibm.db2.jcc.am.lb.v(lb.java:1211)
at com.ibm.db2.jcc.am.lb.close(lb.java:1195)
at com.ibm.db2.jcc.uw.UWConnection.close(UWConnection.java:838)
at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399)
at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390)
at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:173)
at org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:331)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:339)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
{code}
Well, the issue can be solved by invoking a commit or rollback directly before the connection.close() statement. Here is the code snippet of the changes I made in JdbcDataSource.java:
{code}
private void closeConnection() {
  try {
    if (conn != null) {
      if (conn.isReadOnly()) {
        LOG.info("connection is readonly, therefore rollback");
        conn.rollback();
      } else {
        LOG.info("connection is not readonly, therefore commit");
        conn.commit();
      }
      conn.close();
    }
  } catch (Exception e) {
    LOG.error("Ignoring Error when closing connection", e);
  }
}
{code}
[jira] [Updated] (SOLR-2045) DIH doesn't release jdbc connections in conjunction with DB2
[ https://issues.apache.org/jira/browse/SOLR-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2045:
Attachment: SOLR-2045.patch
This patch fixes the problem by issuing a commit before closing the connection, as suggested by Fenlor. I added Derby as a randomly-selected test db to have coverage for this bug. As Derby is only needed for testing, I configured Ivy to locate the derby jar in the same directory as the hsqldb jar, under the dih example. I also added the 2 db jars to the Eclipse dot.classpath and to the Idea config files so that you can easily run these tests from either ide. (this is my first exposure to Idea but with all the good words I've heard on this mailing list I thought this a good time to try it out...) I plan on committing this patch tomorrow.
[jira] [Commented] (LUCENE-4544) possible bug in ConcurrentMergeScheduler.merge(IndexWriter)
[ https://issues.apache.org/jira/browse/LUCENE-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491800#comment-13491800 ] Radim Kolar commented on LUCENE-4544: - Could it not be made simpler by not creating a thread for every new segment merge, but instead using a standard thread pool + queue scheme? Lots of libraries can do this easily.

possible bug in ConcurrentMergeScheduler.merge(IndexWriter) Key: LUCENE-4544 URL: https://issues.apache.org/jira/browse/LUCENE-4544 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0 Reporter: Radim Kolar Assignee: Michael McCandless Attachments: LUCENE-4544.patch

From the dev list: "I suspect that this code is broken. Lines 331 - 343 in org.apache.lucene.index.ConcurrentMergeScheduler.merge(IndexWriter): mergeThreadCount() is the number of currently active merges; they can be at most maxThreadCount. maxMergeCount is the number of queued merges, defaulting to maxThreadCount+2, and it can never be lower than maxThreadCount, which means that the condition in the while loop can never become true.

synchronized (this) {
  long startStallTime = 0;
  while (mergeThreadCount() >= 1 + maxMergeCount) {
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait();
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
}"

While confusing, I think the code is actually nearly correct... but I would love to find some simplifications of CMS's logic (it's really hairy). It turns out mergeThreadCount() is allowed to go higher than maxThreadCount; when this happens, Lucene pauses mergeThreadCount()-maxThreadCount of those merge threads, and resumes them once threads finish (see updateMergeThreads). I.e., CMS will accept up to maxMergeCount merges (and launch threads for them), but will only allow maxThreadCount of those threads to be running at once. So what that while loop is doing is preventing more than maxMergeCount+1 threads from starting, and then pausing the incoming thread to slow down the rate of segment creation (since merging cannot keep up). But ... I think the 1+ is wrong ... it seems like it should just be mergeThreadCount() >= maxMergeCount().

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
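[Editor's note] For readers unfamiliar with the alternative Radim suggests above: a bounded java.util.concurrent pool can approximate CMS's stall behavior, because CallerRunsPolicy makes the submitting thread run an overflow merge itself instead of queuing it. This is only a sketch under assumptions; the field defaults mirror CMS's settings, and the Runnable passed to merge() stands in for the actual merge work. It is not CMS's real implementation.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PooledMergeSchedulerSketch {
  private final int maxThreadCount = 3;                 // illustrative default
  private final int maxMergeCount = maxThreadCount + 2; // mirrors CMS default

  // Fixed pool of maxThreadCount merge threads with a bounded queue of
  // maxMergeCount pending merges. CallerRunsPolicy makes the thread that
  // submits an extra merge run it itself, which stalls the producer much
  // like the wait() loop in CMS.
  private final ThreadPoolExecutor mergePool = new ThreadPoolExecutor(
      maxThreadCount, maxThreadCount, 0L, TimeUnit.MILLISECONDS,
      new ArrayBlockingQueue<Runnable>(maxMergeCount),
      new ThreadPoolExecutor.CallerRunsPolicy());

  // Hypothetical submission site, replacing "spawn a MergeThread per merge".
  void merge(Runnable oneMerge) {
    mergePool.execute(oneMerge);
  }
}
{code}

Whether this is simpler in practice is exactly the open question: the pool handles queuing and stalling, but CMS's pause/resume of already-running threads (updateMergeThreads) has no direct ThreadPoolExecutor equivalent.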
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491845#comment-13491845 ] Chris Male commented on LUCENE-4542: +1 I absolutely agree we need to make this change. There is another issue (I can't remember which just yet, and I'm on a bad connection) where the recursion cap was causing analysis loops. Do you want to create a patch? We need to maintain backwards compatibility, so the default experience should be using RECURSION_CAP as it is today. However, users should be able to pass in a value as well (that includes the HunspellStemFilterFactory).

Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr

Currently there is "private static final int RECURSION_CAP = 2;" in the code of the class HunspellStemmer. It makes using Hunspell with several dictionaries almost unusable due to bad performance (e.g., it costs 36 ms to stem a long sentence in Latvian with recursion_cap=2, and 5 ms with recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (This is the first issue I've ever filed, so please forgive any mistakes.)

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
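[Editor's note] One backwards-compatible shape for the change Chris describes would be to keep the current default of 2 and add an overloaded constructor taking a caller-supplied cap. This is a sketch, not the committed patch; the constant, field, and parameter names are illustrative.

{code:java}
// Sketch only. HunspellDictionary is the real 4.x dictionary class
// (org.apache.lucene.analysis.hunspell); everything else is illustrative.
public class HunspellStemmer {
  public static final int DEFAULT_RECURSION_CAP = 2;  // today's behavior

  private final HunspellDictionary dictionary;
  private final int recursionCap;

  public HunspellStemmer(HunspellDictionary dictionary) {
    this(dictionary, DEFAULT_RECURSION_CAP);  // default experience unchanged
  }

  public HunspellStemmer(HunspellDictionary dictionary, int recursionCap) {
    this.dictionary = dictionary;
    this.recursionCap = recursionCap;         // used in place of RECURSION_CAP
  }
}
{code}

The factory would then accept an optional recursionCap attribute and pass it through, so existing configurations keep the old behavior.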
[jira] [Assigned] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male reassigned LUCENE-4542: -- Assignee: Chris Male

Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Chris Male

Currently there is "private static final int RECURSION_CAP = 2;" in the code of the class HunspellStemmer. It makes using Hunspell with several dictionaries almost unusable due to bad performance (e.g., it costs 36 ms to stem a long sentence in Latvian with recursion_cap=2, and 5 ms with recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (This is the first issue I've ever filed, so please forgive any mistakes.)

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
[ https://issues.apache.org/jira/browse/LUCENE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491890#comment-13491890 ] Simon Willnauer commented on LUCENE-4543: - +1, thanks Robert!

Bring back TFIDFSim.lengthNorm -- Key: LUCENE-4543 URL: https://issues.apache.org/jira/browse/LUCENE-4543 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4543.patch

We removed this before because of LUCENE-2828, but the problem there was the delegator (not the lengthNorm method). TFIDFSim requires byte[] norms today. So its computeNorm should be final, calling a lengthNorm() that returns a byte. This way there is no possibility for you to do something stupid.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
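[Editor's note] The shape Robert describes would look roughly like the sketch below: computeNorm becomes final on TFIDFSimilarity and funnels through an abstract lengthNorm() that must return a byte, so subclasses cannot produce anything but single-byte norms. This is simplified from the 4.x Similarity API and is not the actual patch.

{code:java}
// Sketch only; simplified from org.apache.lucene.search.similarities in 4.x.
public abstract class TFIDFSimilarity extends Similarity {
  @Override
  public final void computeNorm(FieldInvertState state, Norm norm) {
    norm.setByte(lengthNorm(state));  // byte[] norms are enforced here
  }

  /** Subclasses encode their length normalization into a single byte. */
  public abstract byte lengthNorm(FieldInvertState state);
}
{code}

Making computeNorm final is the whole point: the delegation bug from LUCENE-2828 lived in an overridable computeNorm, and the final method closes that door.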
[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-3589: -- Attachment: SOLR-3589.patch Back-port to the 3.6 branch.

Edismax parser does not honor mm parameter if analyzer splits a token - Key: SOLR-3589 URL: https://issues.apache.org/jira/browse/SOLR-3589 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.6, 4.0-BETA Reporter: Tom Burton-West Assignee: Robert Muir Attachments: SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589_test.patch, testSolr3589.xml.gz, testSolr3589.xml.gz

With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is ignored and the equivalent of an OR query for fire OR fly is produced. This is particularly a problem for languages that do not use whitespace to separate words, such as Chinese or Japanese. See these messages for more discussion: http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491922#comment-13491922 ] Tom Burton-West commented on SOLR-3589: --- I back-ported to the 3.6 branch. I forgot to change the name from SOLR-3589.patch, so the 6/Nov/12 patch is the 3.6 patch, against yesterday's svn version of 3.6. The main difference I saw between 3.6 and 4.0 is that Solr 4.0 uses DisMaxQParser.parseMinShouldMatch() to set the default at 0% if q.op=OR and 100% if q.op=AND. I just kept the 3.6 behavior, which uses the 3.6 default of 100% (if mm is not set). I'll test the 3.6 patch against a production index tomorrow.

Edismax parser does not honor mm parameter if analyzer splits a token - Key: SOLR-3589 URL: https://issues.apache.org/jira/browse/SOLR-3589 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.6, 4.0-BETA Reporter: Tom Burton-West Assignee: Robert Muir Attachments: SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589_test.patch, testSolr3589.xml.gz, testSolr3589.xml.gz

With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is ignored and the equivalent of an OR query for fire OR fly is produced. This is particularly a problem for languages that do not use whitespace to separate words, such as Chinese or Japanese. See these messages for more discussion: http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
Radim Kolar created SOLR-4041: - Summary: Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated SOLR-4041: -- Attachment: solr-monitormerge.txt Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491979#comment-13491979 ] Otis Gospodnetic commented on SOLR-4041: This looks nice and useful. Any reason this shouldn't be in 4.1? Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-4041: --- Fix Version/s: 4.1 Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491980#comment-13491980 ] Radim Kolar commented on SOLR-4041: --- it could be Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4536) Make PackedInts byte-aligned?
[ https://issues.apache.org/jira/browse/LUCENE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4536: - Attachment: LUCENE-4536.patch New patch including Mike's suggestions: - VERSION_LONG_ALIGNED renamed to VERSION_START, - VERSION_CURRENT aliased to VERSION_BYTE_ALIGNED, - added an assert that the sneaky direct reader impl can only be instantiated if the stream has been produced with VERSION_START. Make PackedInts byte-aligned? - Key: LUCENE-4536 URL: https://issues.apache.org/jira/browse/LUCENE-4536 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.1 Attachments: LUCENE-4536.patch, LUCENE-4536.patch PackedInts are more and more used to save/restore small arrays, but given that they are long-aligned, up to 63 bits are wasted per array. We should try to make PackedInts storage byte-aligned so that only 7 bits are wasted in the worst case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
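[Editor's note] To put concrete numbers on the alignment waste LUCENE-4536 describes (the values below are an example, not from the issue): storing 3 values of 7 bits each needs 21 bits of payload; long alignment rounds that up to 64 bits, byte alignment to 24.

{code:java}
// Illustrative only: waste for a small packed array of 3 x 7-bit values.
class PackedAlignmentExample {
  public static void main(String[] args) {
    int numValues = 3, bitsPerValue = 7;
    int payloadBits = numValues * bitsPerValue;        // 21 bits of data
    int longAligned = ((payloadBits + 63) / 64) * 64;  // 64 bits stored -> 43 wasted
    int byteAligned = ((payloadBits + 7) / 8) * 8;     // 24 bits stored -> 3 wasted
    System.out.println("wasted bits: long-aligned=" + (longAligned - payloadBits)
        + " byte-aligned=" + (byteAligned - payloadBits));
  }
}
{code}

For one small array the savings look trivial, but the issue's point is that Lucene stores very many such small arrays, so the per-array waste adds up.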
[jira] [Updated] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated SOLR-4030: -- Fix Version/s: 4.1 Use Lucene segment merge throttling --- Key: SOLR-4030 URL: https://issues.apache.org/jira/browse/SOLR-4030 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-mergeratelimit.txt add argument maxMergeWriteMBPerSec to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
[ https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1306: - Attachment: SOLR-1306.patch Fix for a problem when no CoreDescriptorProvider was supplied but a bunch of cores were specified with loadOnStartup=false. CoreContainer.getCoreNames was not returning these cores.

Support pluggable persistence/loading of solr.xml details - Key: SOLR-1306 URL: https://issues.apache.org/jira/browse/SOLR-1306 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Assignee: Erick Erickson Fix For: 4.1 Attachments: SOLR-1306.patch, SOLR-1306.patch, SOLR-1306.patch

Persisting and loading details from one XML file is fine if the number of cores is small and fixed. If there are tens of thousands of cores on a single box, adding a new core (with persistent=true) becomes very expensive, because every core creation has to rewrite this huge XML file. Moreover, there is a good chance that the file gets corrupted and all the cores become unusable. In that case I would prefer the details to be stored in a centralized DB which is backed up/replicated, so all the information is available in a centralized location. We may need to refactor CoreContainer to have a pluggable implementation which can load/persist the details. The default implementation should write/read from/to solr.xml, and the class should be pluggable as follows in solr.xml:

{code:xml}
<solr>
  <dataProvider class="com.foo.FooDataProvider" attr1="val1" attr2="val2"/>
</solr>
{code}

There will be a new interface (or abstract class) called SolrDataProvider which this class must implement.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
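[Editor's note] A minimal sketch of what the proposed provider contract could look like. The SolrDataProvider name comes from the issue itself; the method names and the use of CoreDescriptor/CoreContainer (real Solr classes) are my own illustration, not the actual patch.

{code:java}
// Sketch only; method names are illustrative.
public interface SolrDataProvider {
  /** Load the descriptors of all cores this node should know about. */
  java.util.List<CoreDescriptor> loadCoreDescriptors(CoreContainer container);

  /** Persist details for a created or updated core, replacing the
   *  current "rewrite all of solr.xml on every core creation" behavior. */
  void persist(CoreDescriptor descriptor);
}
{code}

A DB-backed implementation would then persist one row per core instead of rewriting the whole file, which is what makes the tens-of-thousands-of-cores case tractable.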
[jira] [Commented] (SOLR-3904) add package level javadocs to every package
[ https://issues.apache.org/jira/browse/SOLR-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492027#comment-13492027 ] Robert Muir commented on SOLR-3904: --- Thanks for starting this!

add package level javadocs to every package --- Key: SOLR-3904 URL: https://issues.apache.org/jira/browse/SOLR-3904 Project: Solr Issue Type: Improvement Components: documentation Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.1 Attachments: SOLR-3904_buildxml.patch

quoth rmuir on the mailing list... {quote} We've been working on this for the lucene side (3.6 was the first release where every package had docs, 4.0 will be the first where every class has docs, and we are now working towards methods/fields/ctors/enums). I think this would be valuable for solr too (especially solrj as a start). Besides users, it's really useful to developers as well. Of course we all think our code is self-documenting, but it's not always the case. A few extra seconds can save someone a ton of time trying to figure out your code. Additionally, at least in my IDE, when things are done as javadoc comments they are more easily accessible than code comments. I'm sure it's the case for some other development environments too. Filling in these package.html's to at least have a one-sentence description would be a really good start. It lets someone know where to go at the high level. If I was brand new to solr and wanted to write a java app that uses solrj, I wouldn't have a clue where to start (https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-solrj/index.html). 1-2 sentences could go a really long way. And for all new code, I hope we can all try harder for more complete javadocs. When you are working on something and it's fresh in your head, it's a lot easier to do this than for someone else to come back around and figure it out. {quote} I'm going to try and make it a priority for me to fill in package-level docs as we look towards 4.1

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492048#comment-13492048 ] Robert Muir commented on SOLR-3589: --- Hi Tom: thanks for working on the 3.6 backport! I'll commit the trunk/4.x patch first, then wait for your testing and review your patch before looking at 3.6!

Edismax parser does not honor mm parameter if analyzer splits a token - Key: SOLR-3589 URL: https://issues.apache.org/jira/browse/SOLR-3589 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.6, 4.0-BETA Reporter: Tom Burton-West Assignee: Robert Muir Attachments: SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589_test.patch, testSolr3589.xml.gz, testSolr3589.xml.gz

With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is ignored and the equivalent of an OR query for fire OR fly is produced. This is particularly a problem for languages that do not use whitespace to separate words, such as Chinese or Japanese. See these messages for more discussion: http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
[ https://issues.apache.org/jira/browse/LUCENE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4543. - Resolution: Fixed Fix Version/s: 5.0 4.1 Bring back TFIDFSim.lengthNorm -- Key: LUCENE-4543 URL: https://issues.apache.org/jira/browse/LUCENE-4543 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.1, 5.0 Attachments: LUCENE-4543.patch We removed this before because of LUCENE-2828, but the problem there was the delegator (not the lengthNorm method). TFIDFSim requires byte[] norms today. So its computeNorm should be final, calling lengthNorm() that returns a byte. This way there is no possibility for you to do something stupid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4040) SolrCloud deleteByQuery requires multiple commits
[ https://issues.apache.org/jira/browse/SOLR-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492082#comment-13492082 ] David Smiley commented on SOLR-4040: (I'm working with Darin on this) To add a little context, there is no soft committing, unlike the default config (if I recall). There might be an auto-commit enabled but it has a large ~10 minute time window, so it isn't going to play a role with this bug. Committing is generally explicit.

SolrCloud deleteByQuery requires multiple commits - Key: SOLR-4040 URL: https://issues.apache.org/jira/browse/SOLR-4040 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: OSX Reporter: Darin Plutchok Labels: SolrCloud, commit, delete Fix For: 4.0

I am using embedded zookeeper and my cloud layout is shown below (all actions are done on the 'patents' collection only). The first commit/delete works for a single shard only, dropping query results by about a third. The second commit/delete drops query results to zero.

http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop by a third)
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop to zero)

Note that a delete without a commit followed by a commit drops query results to zero, as it should:

http://127.0.0.1:8893/solr/patents/update/?stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (full count as no commit yet)
http://127.0.0.1:8893/solr/patents/update/?commit=true
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop to zero)

One workaround (produces zero hits in one shot):
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<outer><delete><query>sun</query></delete><commit/></outer>

The workaround I am using for now (produces zero hits in one shot):
http://127.0.0.1:8893/solr/patents/update?stream.body=<outer><delete><query>knee</query></delete><commit/><commit/></outer>

{code}
{"otherdocs":{"slice0":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_otherdocs_shard0":{
      "shard":"slice0",
      "roles":null,
      "state":"active",
      "core":"otherdocs_shard0",
      "collection":"otherdocs",
      "node_name":"Darins-MacBook-Pro.local:8893_solr",
      "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
      "leader":"true"}}}},
 "patents":{
   "slice0":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard0":{
      "shard":"slice0",
      "roles":null,
      "state":"active",
      "core":"patents_shard0",
      "collection":"patents",
      "node_name":"Darins-MacBook-Pro.local:8893_solr",
      "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
      "leader":"true"}}},
   "slice1":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard1":{
      "shard":"slice1",
      "roles":null,
      "state":"active",
      "core":"patents_shard1",
      "collection":"patents",
      "node_name":"Darins-MacBook-Pro.local:8893_solr",
      "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
      "leader":"true"}}},
   "slice2":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard2":{
      "shard":"slice2",
      "roles":null,
      "state":"active",
      "core":"patents_shard2",
      "collection":"patents",
      "node_name":"Darins-MacBook-Pro.local:8893_solr",
      "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
      "leader":"true"}}}}}
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4042) NullPointerException for query type 'query' without '{! ...}' syntax
Joel Nothman created SOLR-4042: -- Summary: NullPointerException for query type 'query' without '{! ...}' syntax Key: SOLR-4042 URL: https://issues.apache.org/jira/browse/SOLR-4042 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0, 5.0 Reporter: Joel Nothman Priority: Minor

The 'query' query type, corresponding to NestedQParserPlugin, expects a query of the form: {! ... } An empty q parameter, or a plain list of search terms, causes a NullPointerException, because NestedQParserPlugin.createParser receives localParams == null, which is then used without checking in NestedQParserPlugin.QParser.parse(). The correct behaviour is currently ambiguous: throw a syntax error, or execute the query with the default parser?

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
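[Editor's note] Whichever of the two behaviours is chosen, the immediate NPE could be avoided with a guard along these lines. This is a sketch only: the helper name and message text are made up, and the choice to throw rather than fall back to the default parser is exactly the question the issue leaves open.

{code:java}
// Sketch only: guard against missing {! ... } local params before parsing.
// SolrParams and SyntaxError are real Solr 4.x classes; requireLocalParams
// is a hypothetical helper, not part of the Solr API.
static void requireLocalParams(org.apache.solr.common.params.SolrParams localParams)
    throws org.apache.solr.search.SyntaxError {
  if (localParams == null) {
    throw new org.apache.solr.search.SyntaxError(
        "the 'query' (nested) parser requires {! ... } local-params syntax");
  }
}
{code}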
[jira] [Resolved] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4540. - Resolution: Fixed Fix Version/s: 5.0, 4.1

Allow packed ints norms --- Key: LUCENE-4540 URL: https://issues.apache.org/jira/browse/LUCENE-4540 Project: Lucene - Core Issue Type: Task Components: core/index Reporter: Robert Muir Fix For: 4.1, 5.0 Attachments: LUCENE-4540.patch

I was curious what the performance would be, because it might be a useful option to use packed ints for norms if you have lots of fields and still want good scoring: today the smallest norm per-field-per-doc you can use is a single byte, and if you have _f_ fields with norms enabled and _n_ docs, it uses _f_ * _n_ bytes of space in RAM. Especially if you aren't using index-time boosting (or even if you are, but not with ridiculous values), this could be wasting a ton of RAM. But then I noticed there was no clean way to allow you to do this in your Similarity: it's a trivial patch.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
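[Editor's note] To put numbers on the f * n estimate above (the values are illustrative, not from the issue): with 20 norm-enabled fields and 10 million documents, single-byte norms need about 190 MB of RAM, while 4-bit packed norms would need roughly half that.

{code:java}
// Illustrative only: RAM for norms at one byte vs. 4 packed bits per value.
class NormsRamExample {
  public static void main(String[] args) {
    long fields = 20, docs = 10000000L;              // f fields with norms, n docs
    long byteNormsBytes = fields * docs;             // 200,000,000 B, ~190 MB
    long packed4BitBytes = fields * docs * 4 / 8;    // 100,000,000 B, ~95 MB
    System.out.println(byteNormsBytes + " B vs " + packed4BitBytes + " B");
  }
}
{code}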
[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492095#comment-13492095 ] Robert Muir commented on SOLR-3589: --- Committed to trunk/4.x. Will look tomorrow at 3.6.

Edismax parser does not honor mm parameter if analyzer splits a token - Key: SOLR-3589 URL: https://issues.apache.org/jira/browse/SOLR-3589 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.6, 4.0-BETA Reporter: Tom Burton-West Assignee: Robert Muir Attachments: SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589_test.patch, testSolr3589.xml.gz, testSolr3589.xml.gz

With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is ignored and the equivalent of an OR query for fire OR fly is produced. This is particularly a problem for languages that do not use whitespace to separate words, such as Chinese or Japanese. See these messages for more discussion: http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492096#comment-13492096 ] Otis Gospodnetic commented on SOLR-4041: Does this automatically expose these numbers in JMX? It looks like it does, but if not, it would be good to have them there, too.

Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: concurrentmergescheduller
On Tue, Nov 6, 2012 at 6:32 AM, Michael McCandless luc...@mikemccandless.com wrote: While confusing, I think the code is actually nearly correct... My question is, who is going to create the MikeSays account? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4483) Make Term constructor javadoc refer to BytesRef.deepCopyOf
[ https://issues.apache.org/jira/browse/LUCENE-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4483. - Resolution: Fixed Fix Version/s: (was: 4.0.1) 5.0, 4.1 Thanks Paul! Good catch!

Make Term constructor javadoc refer to BytesRef.deepCopyOf -- Key: LUCENE-4483 URL: https://issues.apache.org/jira/browse/LUCENE-4483 Project: Lucene - Core Issue Type: Improvement Components: core/index Affects Versions: 4.1 Reporter: Paul Elschot Priority: Trivial Fix For: 4.1, 5.0 Attachments: LUCENE-4483.patch

The Term constructor from BytesRef javadoc indicates that a clone needs to be made of the BytesRef. But the clone() method of BytesRef is not what is meant; a deep copy needs to be made.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4043) Incorrect response for create/delete/reload collections
Raintung Li created SOLR-4043: - Summary: Incorrect response for create/delete/reload collections Key: SOLR-4043 URL: https://issues.apache.org/jira/browse/SOLR-4043 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0, 4.0-BETA, 4.0-ALPHA Environment: Solr cloud cluster Reporter: Raintung Li Attachments: patch-4043.txt

Create/delete/reload collection actions are asynchronous processes: the client can't get the real result, it can only be sure the request has been saved into the OverseerCollectionQueue. The client gets a response immediately, without waiting for the outcome of the action (create/delete/reload collection), whether or not it succeeds. The easy solution is for the client to wait until the asynchronous process finishes: the thread handling the collection action saves the response into the OverseerCollectionQueue and then notifies the client to fetch it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4043) Incorrect response for create/delete/reload collections
[ https://issues.apache.org/jira/browse/SOLR-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raintung Li updated SOLR-4043: -- Attachment: patch-4043.txt The patch fixes this bug.

Incorrect response for create/delete/reload collections - Key: SOLR-4043 URL: https://issues.apache.org/jira/browse/SOLR-4043 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0 Environment: Solr cloud cluster Reporter: Raintung Li Attachments: patch-4043.txt

Create/delete/reload collection actions are asynchronous processes: the client can't get the real result, it can only be sure the request has been saved into the OverseerCollectionQueue. The client gets a response immediately, without waiting for the outcome of the action (create/delete/reload collection), whether or not it succeeds. The easy solution is for the client to wait until the asynchronous process finishes: the thread handling the collection action saves the response into the OverseerCollectionQueue and then notifies the client to fetch it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org