Adding index epoch tracking to Lucene
Hi guys,

Lucene today tracks the index generation, which is incremented whenever changes are committed to the index. In LUCENE-4532 I needed to add epoch tracking, which is incremented whenever the index is re-created. An index is considered to be re-created (for the use case in LUCENE-4532) either when you open IndexWriter with OpenMode.CREATE or when you call IndexWriter.deleteAll(). In LUCENE-4532 I did that through index commit data.

I was wondering if others find this information useful, and whether we should add it to Lucene alongside the generation and version tracking. It's just another int/long, and is not supposed to complicate the code or add any runtime complexity.

Thoughts?

Shai
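To make the proposal concrete, here is a minimal sketch of epoch tracking through commit user data, the approach described above for LUCENE-4532. The key name "index.epoch" and the EpochTracker helper are illustrative inventions, not from the patch; setCommitData/getCommitData and the DirectoryReader commit APIs are assumed to match the Lucene 4.x line.

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

// Illustrative helper (not from the LUCENE-4532 patch): keep an epoch counter
// in the commit user data and bump it whenever the index is re-created.
public final class EpochTracker {

  private static final String INDEX_EPOCH = "index.epoch"; // hypothetical key

  /** Reads the epoch recorded in the latest commit; 0 if the index is new. */
  public static long readEpoch(Directory dir) throws Exception {
    if (!DirectoryReader.indexExists(dir)) {
      return 0;
    }
    List<IndexCommit> commits = DirectoryReader.listCommits(dir);
    Map<String, String> userData = commits.get(commits.size() - 1).getUserData();
    String epoch = userData.get(INDEX_EPOCH);
    return epoch == null ? 0 : Long.parseLong(epoch);
  }

  /** Call after opening with OpenMode.CREATE or after deleteAll(). */
  public static void bumpEpoch(IndexWriter writer, long previousEpoch) {
    Map<String, String> userData = new HashMap<String, String>(writer.getCommitData());
    userData.put(INDEX_EPOCH, Long.toString(previousEpoch + 1));
    writer.setCommitData(userData); // persisted by the next commit()
  }
}
{code}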
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.6.0_37) - Build # 1459 - Still Failing!
I committed a fix.

On 6 Nov 2012, at 07:28, Policeman Jenkins Server wrote:

Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/1459/
Java: 32bit/jdk1.6.0_37 -client -XX:+UseConcMarkSweepGC

2 tests failed.

FAILED: junit.framework.TestSuite.org.apache.solr.schema.TestBinaryField

Error Message: 2 threads leaked from SUITE scope at org.apache.solr.schema.TestBinaryField:
1) Thread[id=24, name=metrics-meter-tick-thread-2, state=TIMED_WAITING, group=TGRP-TestBinaryField]
2) Thread[id=23, name=metrics-meter-tick-thread-1, state=TIMED_WAITING, group=TGRP-TestBinaryField]

Both threads show the same stack:
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:662)

Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE scope at org.apache.solr.schema.TestBinaryField (same two threads and stacks as above)
at __randomizedtesting.SeedInfo.seed([28100922E87E2C5D]:0)

FAILED: junit.framework.TestSuite.org.apache.solr.schema.TestBinaryField

Error Message: There are still zombie threads that couldn't be terminated:
1) Thread[id=24, name=metrics-meter-tick-thread-2, state=TIMED_WAITING, group=TGRP-TestBinaryField]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:164) at
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491341#comment-13491341 ]

Simon Willnauer commented on LUCENE-4540:

+1 - should we also document that we don't have similarities that can make use of it at this point?

Allow packed ints norms
Key: LUCENE-4540
URL: https://issues.apache.org/jira/browse/LUCENE-4540
Project: Lucene - Core
Issue Type: Task
Components: core/index
Reporter: Robert Muir
Attachments: LUCENE-4540.patch

I was curious what the performance would be, because it might be a useful option to use packed ints for norms if you have lots of fields and still want good scoring: Today the smallest norm per-field-per-doc you can use is a single byte, and if you have _f_ fields with norms enabled and _n_ docs, it uses _f_ * _n_ bytes of space in RAM. Especially if you aren't using index-time boosting (or even if you are, but not with ridiculous values), this could be wasting a ton of RAM. But then I noticed there was no clean way to allow you to do this in your Similarity: it's a trivial patch.
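To put rough numbers on the space argument above (illustrative figures, not from the issue): with 50 norms-enabled fields and 100 million documents, single-byte norms occupy 50 × 100 million = 5 billion bytes, roughly 5 GB of heap. If a field's norms only ever take a handful of distinct values, a packed-ints representation could store each in 2-3 bits instead of 8, roughly a 3-4x saving.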
[jira] [Commented] (LUCENE-4537) Move RateLimiter up to Directory and make it IOContext aware
[ https://issues.apache.org/jira/browse/LUCENE-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491345#comment-13491345 ]

Simon Willnauer commented on LUCENE-4537:

bq. I think this is some premature optimization.

I am not sure if that is premature. But I do agree it would be great if we could just wrap the IndexOutput to do this kind of stuff entirely outside of Directory. Maybe we can have a flush callback on BufferedIndexOutput we can hook into the flush call. This would also enable us to do some flush statistics, which is independent of this issue. This could be an impl detail of BufferedIndexOutput, but it would enable us to:
1. do the optimization we do today
2. divorce the rate limiting entirely from Directory

Move RateLimiter up to Directory and make it IOContext aware
Key: LUCENE-4537
URL: https://issues.apache.org/jira/browse/LUCENE-4537
Project: Lucene - Core
Issue Type: Improvement
Components: core/store
Affects Versions: 4.0
Reporter: Simon Willnauer
Fix For: 4.1, 5.0
Attachments: LUCENE-4537.patch, LUCENE-4537.patch, LUCENE-4537.patch

Currently the RateLimiter only applies to FSDirectory, which is fine in general but always requires casts, and other dir impls (custom ones) could benefit from this too. We are also only able to rate limit merge operations, which limits the functionality here a lot. Since we have the context information about what the IndexOutput is used for, we can use that for rate limiting.
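A rough sketch of the flush-callback idea, assuming it would hang off BufferedIndexOutput's flushBuffer template method. The listener interface and class name here are hypothetical; only flushBuffer(byte[], int, int) is the real Lucene 4.x extension point.

{code}
import java.io.IOException;

import org.apache.lucene.store.BufferedIndexOutput;

// Hypothetical: intercept every buffer flush for rate limiting or statistics,
// keeping Directory itself out of the picture.
public abstract class CallbackBufferedIndexOutput extends BufferedIndexOutput {

  /** Hypothetical hook, invoked with the byte count of each flush. */
  public interface FlushListener {
    void onFlush(int bytes);
  }

  private final FlushListener listener;

  protected CallbackBufferedIndexOutput(FlushListener listener) {
    this.listener = listener;
  }

  @Override
  protected final void flushBuffer(byte[] b, int offset, int len) throws IOException {
    listener.onFlush(len); // rate-limit or record stats here
    doFlushBuffer(b, offset, len);
  }

  /** Subclasses perform the actual I/O here instead of in flushBuffer. */
  protected abstract void doFlushBuffer(byte[] b, int offset, int len) throws IOException;
}
{code}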
[jira] [Created] (SOLR-4037) Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException
Markus Jelsma created SOLR-4037:

Summary: Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException
Key: SOLR-4037
URL: https://issues.apache.org/jira/browse/SOLR-4037
Project: Solr
Issue Type: Bug
Affects Versions: 4.0
Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 12:37:38; Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Fix For: 4.1, 5.0

See: http://lucene.472066.n3.nabble.com/Continuous-Ping-query-caused-exception-java-util-concurrent-RejectedExecutionException-td4017470.html

Using this week's trunk we sometimes see nodes entering some funky state where they continuously report exceptions. Replication and query handling are still possible, but there is an increase in CPU time:

{code}
2012-11-01 09:24:28,337 INFO [solr.core.SolrCore] - [http-8080-exec-4] - : [openindex_f] webapp=/solr path=/admin/ping params={} status=500 QTime=21
2012-11-01 09:24:28,337 ERROR [solr.core.SolrCore] - [http-8080-exec-4] - : org.apache.solr.common.SolrException: Ping query caused exception: java.util.concurrent.RejectedExecutionException
  at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:259)
  at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:207)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
  at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
  at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: java.util.concurrent.RejectedExecutionException
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1674)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1330)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1265)
  at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:88)
  at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:214)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
  at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:250)
  ... 19 more
Caused by: java.util.concurrent.RejectedExecutionException
  at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
  at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
  at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
  at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1605)
  ... 27 more
{code}

This won't stop until I restart the servlet container, but it began in the first place after restarting the servlet container.
[jira] [Created] (SOLR-4038) SolrCloud indexing blocks if node is recovering
Markus Jelsma created SOLR-4038:

Summary: SolrCloud indexing blocks if node is recovering
Key: SOLR-4038
URL: https://issues.apache.org/jira/browse/SOLR-4038
Project: Solr
Issue Type: Bug
Affects Versions: 4.0
Reporter: Markus Jelsma
Fix For: 4.1, 5.0

See: http://lucene.472066.n3.nabble.com/SolrCloud-indexing-blocks-if-node-is-recovering-td4017827.html
[jira] [Updated] (SOLR-4038) SolrCloud indexing blocks if node is recovering
[ https://issues.apache.org/jira/browse/SOLR-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated SOLR-4038:

Description: See: http://lucene.472066.n3.nabble.com/SolrCloud-indexing-blocks-if-node-is-recovering-td4017827.html

While indexing (without CloudSolrServer at that time) one node dies with an OOME, perhaps because of the linked issue SOLR-4032. The OOME stack traces are varied, but here are some ZK-related logs between the OOME stack traces:

{code}
2012-11-02 14:14:37,126 INFO [solr.update.UpdateLog] - [RecoveryThread] - : Dropping buffered updates FSUpdateLog{state=BUFFERING, tlog=null}
2012-11-02 14:14:37,127 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... (2) core=shard_e
2012-11-02 14:14:37,127 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Wait 8.0 seconds before trying to recover again (3)
2012-11-02 14:14:45,328 INFO [solr.cloud.ZkController] - [RecoveryThread] - : numShards not found on descriptor - reading it from system property
2012-11-02 14:14:45,363 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Starting Replication Recovery. core=shard_e
2012-11-02 14:14:45,363 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - : Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2012-11-02 14:14:45,775 INFO [common.cloud.ZkStateReader] - [main-EventThread] - : A cluster state change has occurred - updating... (10)
2012-11-02 14:14:50,987 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Begin buffering updates. core=shard_e
2012-11-02 14:14:50,987 INFO [solr.update.UpdateLog] - [RecoveryThread] - : Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
2012-11-02 14:14:50,987 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Attempting to replicate from http://rot05.solrserver:8080/solr/shard_e/. core=shard_e
2012-11-02 14:14:50,987 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - : Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2012-11-02 14:15:03,303 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_f/data/index
2012-11-02 14:15:03,303 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : removing temporary index download directory files NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_f/data/index.20121102141424591 lockFactory=org.apache.lucene.store.SimpleFSLockFactory@1520a48c; maxCacheMB=48.0 maxMergeSizeMB=4.0)
2012-11-02 14:15:09,421 INFO [apache.zookeeper.ClientCnxn] - [main-SendThread(rot1.zkserver:2181)] - : Client session timed out, have not heard from server in 11873ms for sessionid 0x13abc504486000f, closing socket connection and attempting reconnect
2012-11-02 14:15:09,422 ERROR [solr.core.SolrCore] - [http-8080-exec-1] - : org.apache.solr.common.SolrException: Ping query caused exception: Java heap space
. .
2012-11-02 14:15:09,867 INFO [common.cloud.ConnectionManager] - [main-EventThread] - : Watcher org.apache.solr.common.cloud.ConnectionManager@305e7020 name:ZooKeeperConnection Watcher:rot1.zkserver:2181,rot2.zkserver:2181 got event WatchedEvent state:Disconnected type:None path:null path:null type:None
2012-11-02 14:15:09,867 INFO [common.cloud.ConnectionManager] - [main-EventThread] - : zkClient has disconnected
2012-11-02 14:15:09,869 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Error while trying to recover:java.lang.OutOfMemoryError: Java heap space
. .
2012-11-02 14:15:10,159 INFO [solr.update.UpdateLog] - [RecoveryThread] - : Dropping buffered updates FSUpdateLog{state=BUFFERING, tlog=null}
2012-11-02 14:15:10,159 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... (3) core=shard_e
2012-11-02 14:15:10,159 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Wait 16.0 seconds before trying to recover again (4)
2012-11-02 14:15:09,878 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_f/data/index.20121102141424591
2012-11-02 14:15:10,192 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_f_f/data/index
2012-11-02 14:15:10,192 ERROR [solr.handler.ReplicationHandler] - [RecoveryThread] - : SnapPull failed :org.apache.solr.common.SolrException: Unable to download _773.tvf completely. Downloaded 246415360!=562327645
. .
{code}

At this point indexing has already been blocked. Some nodes do not write anything to the logs, and the two surrounding nodes are still busy doing some replication. Most nodes show an increased number of state changes:

{code}
2012-11-02 14:16:47,768 INFO [common.cloud.ZkStateReader] - [main-EventThread] - : A cluster state change has occurred -
[jira] [Created] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
Piotr created LUCENE-4542:

Summary: Make RECURSION_CAP in HunspellStemmer configurable
Key: LUCENE-4542
URL: https://issues.apache.org/jira/browse/LUCENE-4542
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Priority: Minor

Currently there is "private static final int RECURSION_CAP = 2;" in the code. It makes using hunspell with several dictionaries almost unusable, due to bad performance. It would be nice to be able to tune this number as needed. (It's the first issue I've filed, so please forgive any mistakes.)
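A minimal sketch of what the requested change could look like, assuming a constructor parameter is the chosen mechanism (the overload and default-constant names below are illustrative, not from a committed patch):

{code}
// Sketch: replace the hard-coded cap with a configurable field.
public class HunspellStemmer {

  /** Matches the previously hard-coded RECURSION_CAP. */
  public static final int DEFAULT_RECURSION_CAP = 2;

  private final HunspellDictionary dictionary;
  private final int recursionCap;

  public HunspellStemmer(HunspellDictionary dictionary) {
    this(dictionary, DEFAULT_RECURSION_CAP);
  }

  public HunspellStemmer(HunspellDictionary dictionary, int recursionCap) {
    this.dictionary = dictionary;
    this.recursionCap = recursionCap; // e.g. 1 for faster, shallower affix stripping
  }

  // ... the recursive stemming methods would then test
  // `recursionLevel < recursionCap` instead of `recursionLevel < RECURSION_CAP` ...
}
{code}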
[jira] [Updated] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr updated LUCENE-4542:

Description:
Currently there is "private static final int RECURSION_CAP = 2;" in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance. It would be nice to be able to tune this number as needed. (It's the first issue I've filed, so please forgive any mistakes.)

(was: the same text, without "of the class HunspellStemmer")
[jira] [Commented] (LUCENE-4537) Move RateLimiter up to Directory and make it IOContext aware
[ https://issues.apache.org/jira/browse/LUCENE-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491390#comment-13491390 ]

Michael McCandless commented on LUCENE-4537 (Move RateLimiter up to Directory and make it IOContext aware):

I think rate limiting merge IO is important functionality: merges easily kill search performance if you index/search on one box (NRT app). But I agree: Directory is abstract and minimal and we should keep it that way.

A generic wrapper around any IO would be great ... but I'm not sure how we'd do it? E.g., would we have to tally up our own bytes in every write method (writeInt/Long/VInt/VLong/etc.)? Maybe that's acceptable? It's only for writing ...

Or maybe we only make a RateLimitingBufferedIO subclass? Though I had wanted to try this with RAMDirectory too (playing with Zing) ... I guess we could make a RateLimitingRAMOutputStream ...
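For illustration, the "tally bytes in every write method" concern is smaller than it sounds, because DataOutput funnels writeInt/writeVLong/etc. through writeByte/writeBytes, so a decorator over those two methods covers everything. This is a hedged sketch of the idea under discussion, not a committed design; it assumes the Lucene 4.x IndexOutput/RateLimiter APIs, and a real implementation would batch the pause() calls rather than pay one per byte.

{code}
import java.io.IOException;

import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.RateLimiter;

// Sketch: wrap any IndexOutput and throttle at the byte level. The inherited
// DataOutput methods (writeInt, writeVLong, ...) all bottom out in writeByte /
// writeBytes, so overriding these two is enough to count every written byte.
public class RateLimitedIndexOutput extends IndexOutput {

  private final IndexOutput delegate;
  private final RateLimiter limiter;

  public RateLimitedIndexOutput(IndexOutput delegate, RateLimiter limiter) {
    this.delegate = delegate;
    this.limiter = limiter;
  }

  @Override
  public void writeByte(byte b) throws IOException {
    limiter.pause(1); // naive; a real impl would accumulate and pause in chunks
    delegate.writeByte(b);
  }

  @Override
  public void writeBytes(byte[] b, int offset, int length) throws IOException {
    limiter.pause(length);
    delegate.writeBytes(b, offset, length);
  }

  @Override public void flush() throws IOException { delegate.flush(); }
  @Override public void close() throws IOException { delegate.close(); }
  @Override public long getFilePointer() { return delegate.getFilePointer(); }
  @Override public void seek(long pos) throws IOException { delegate.seek(pos); }
  @Override public long length() throws IOException { return delegate.length(); }
}
{code}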
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491391#comment-13491391 ]

Michael McCandless commented on LUCENE-4540 (Allow packed ints norms):

+1, very cool!
Re: Adding index epoch tracking to Lucene
I'm not sure, but I think something in Solr's replication needed this information? And maybe that's why it uses timestamps today instead...?

Mike McCandless
http://blog.mikemccandless.com

On Tue, Nov 6, 2012 at 3:09 AM, Shai Erera <ser...@gmail.com> wrote:
> Hi guys, Lucene today tracks the index generation, which is incremented
> whenever changes are committed to the index. [...]
[jira] [Updated] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr updated LUCENE-4542:

Priority: Major (was: Minor)
Description:
Currently there is "private static final int RECURSION_CAP = 2;" in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36 ms to stem a long sentence in Latvian with recursion_cap=2 vs. 5 ms with recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (It's the first issue I've filed, so please forgive any mistakes.)

(was: the previous description, without the timing figures)
Re: concurrentmergescheduller
Hi Radim,

While confusing, I think the code is actually nearly correct ... but I would love to find some simplifications of CMS's logic (it's really hairy).

It turns out mergeThreadCount() is allowed to go higher than maxThreadCount; when this happens, Lucene pauses mergeThreadCount() - maxThreadCount of those merge threads, and resumes them once threads finish (see updateMergeThreads). I.e., CMS will accept up to maxMergeCount merges (and launch threads for them), but will only allow maxThreadCount of those threads to be running at once. So what that while loop is doing is preventing more than maxMergeCount+1 threads from starting, and then pausing the incoming thread to slow down the rate of segment creation (since merging cannot keep up).

But ... I think the 1+ is wrong ... it seems like it should just be mergeThreadCount() >= maxMergeCount. Could you please open an issue for this? I'll fix ... and I'll add a comment explaining that confusing loop. Would really be nice to simplify the code too.

Thanks!

Mike McCandless
http://blog.mikemccandless.com

On Mon, Nov 5, 2012 at 1:42 PM, Radim Kolar <h...@filez.com> wrote:

i suspect that this code is broken: lines 331-343 in org.apache.lucene.index.ConcurrentMergeScheduler.merge(IndexWriter). mergeThreadCount() are the currently active merges; they can be at most maxThreadCount. maxMergeCount is the number of queued merges, defaulted to maxThreadCount+2, and it can never be lower than maxThreadCount, which means that the condition in the while loop can never become true.

synchronized (this) {
  long startStallTime = 0;
  while (mergeThreadCount() >= 1 + maxMergeCount) {
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait();
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
}
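For concreteness, here is the fix Mike describes, sketched against the loop quoted above (the committed change may differ; the method and field names are the existing ConcurrentMergeScheduler members):

{code}
// Inside ConcurrentMergeScheduler.merge(IndexWriter), sketched from the
// discussion above: drop the "1 +" so the incoming thread stalls as soon as
// maxMergeCount merges are in flight, instead of one merge too late.
synchronized (this) {
  long startStallTime = 0;
  while (mergeThreadCount() >= maxMergeCount) {  // was: >= 1 + maxMergeCount
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait(); // a finishing merge thread wakes us via notifyAll()
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
}
{code}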
Re: Source Control
On Sat, 2012-10-27 at 01:02 +0200, Mark Miller wrote:
> What are people's thoughts about moving to git?

Speaking as a contributor without commit rights and with very little interest in build systems and version control, my experience with Lucene hacking so far has had a few technical annoyances:

- It is hard to collaborate on a patch, internally in my organization as well as externally.
- Local backups, independent of SVN, are needed to be sure not to lose code in case of crashes.
- Revising a dormant patch is confusing, as it very quickly gets hard to keep track of which version is current: there is no version control for the patches, except for trivial timestamps.
- When a patch is created against trunk and the JIRA issue is revisited some time later, chances are that the interfaced Lucene/Solr code has changed. To apply the patch, one needs to hunt down the SVN tag that the patch was generated against.

What I would like to see is something like GitHub, where everyone can easily fork the code, share it and just point to it in the JIRA issue. A read-only git (or another distributed versioning system) repository alone would only solve the first two problems, and then only if I had a place to make a public repository (which admittedly is easy enough with GitHub et al).

- Toke Eskildsen, State and University Library, Denmark
[jira] [Commented] (LUCENE-4537) Move RateLimiter up to Directory and make it IOContext aware
[ https://issues.apache.org/jira/browse/LUCENE-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491413#comment-13491413 ]

Robert Muir commented on LUCENE-4537 (Move RateLimiter up to Directory and make it IOContext aware):

I didn't say it wasn't important. I guess, if it's really important, then we'll invest the time to figure out clean APIs to support it. Otherwise we can remove it :)
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491415#comment-13491415 ]

Robert Muir commented on LUCENE-4540 (Allow packed ints norms):

I don't understand the question, Simon: all the ones we provide happen to use Norm.setByte. I don't think we need to add documentation to Norm.setFloat/Norm.setDouble saying that we don't provide any similarities that call these methods: that's not important to anybody.
Re: Source Control
> - It is hard to collaborate on a patch, internally in my organization as well as externally

The Solr/Lucene git repo is on GitHub. You can use git within your organization without problems.
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491417#comment-13491417 ]

Simon Willnauer commented on LUCENE-4540 (Allow packed ints norms):

bq. I don't understand the question Simon: all the ones we provide happen to use Norm.setByte

Just to clarify: currently, if we write packed ints and a similarity calls Source#getArray, you get a UOE. I think we should document that our current impls won't handle this.
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491421#comment-13491421 ]

Robert Muir commented on LUCENE-4540 (Allow packed ints norms):

I don't see how it's relevant. Issues will happen if you use Norm.setFloat (as they expect a byte). I'm not going to confuse the documentation. The built-in Similarities at query time depend upon their index-time norm implementation: this is documented extensively everywhere!
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491422#comment-13491422 ]

Simon Willnauer commented on LUCENE-4540 (Allow packed ints norms):

Fair enough. I just wanted to mention it.
[jira] [Commented] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491423#comment-13491423 ]

Robert Muir commented on LUCENE-4540 (Allow packed ints norms):

If someone changes their similarity to use a different norm type at index-time than at query-time, then he or she is an idiot!
[jira] [Created] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
Robert Muir created LUCENE-4543:

Summary: Bring back TFIDFSim.lengthNorm
Key: LUCENE-4543
URL: https://issues.apache.org/jira/browse/LUCENE-4543
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir

We removed this before because of LUCENE-2828, but the problem there was the delegator (not the lengthNorm method). TFIDFSim requires byte[] norms today. So its computeNorm should be final, calling a lengthNorm() that returns a byte. This way there is no possibility for you to do something stupid.
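The shape of the API being proposed, sketched from the issue description (a sketch under stated assumptions, not the committed patch; FieldInvertState and Norm are the Lucene 4.x signatures, and the scoring methods are elided):

{code}
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.index.Norm;
import org.apache.lucene.search.similarities.Similarity;

// Sketch: pin the norm encoding to a single byte by making computeNorm final
// and delegating to an abstract lengthNorm() that returns the byte to store.
// Subclasses can then only customize the value, never the type.
public abstract class TFIDFSimilaritySketch extends Similarity {

  /** Computes the single-byte norm value for this field/document. */
  public abstract byte lengthNorm(FieldInvertState state);

  @Override
  public final void computeNorm(FieldInvertState state, Norm norm) {
    norm.setByte(lengthNorm(state)); // byte[] norms are required by TF/IDF
  }

  // ... tf(), idf(), and the query-time scoring methods elided ...
}
{code}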
[jira] [Commented] (LUCENE-4539) DocValues impls should read all headers up-front instead of per-directsource
[ https://issues.apache.org/jira/browse/LUCENE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491432#comment-13491432 ]

Adrien Grand commented on LUCENE-4539:

Only in tests. This is why I think that writing a full header (including the PackedInts codec name) is useless most of the time, if not always.

DocValues impls should read all headers up-front instead of per-directsource
Key: LUCENE-4539
URL: https://issues.apache.org/jira/browse/LUCENE-4539
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Reporter: Robert Muir
Attachments: LUCENE-4539.patch

Currently, when DocValues opens, it just opens files; it doesn't read codec headers etc. Instead we read these every single time a direct source opens. I think it should work like PostingsReaders: e.g. the PackedInts impl would read its versioning info and codec headers, and creating a new Direct impl should be an IndexInput.clone() + getDirectReaderNoHeader(). Today it's much more costly.
[jira] [Updated] (LUCENE-4538) Cache DocValues DirecSource
[ https://issues.apache.org/jira/browse/LUCENE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4538:

Attachment: LUCENE-4538.patch

Here is a new patch making loadSource / loadDirectSource protected. It is really confusing if you have two ways to get a Source instance and you need to keep track of whether it is cached or not. This really should not have been public at all. I will commit this soon.

Cache DocValues DirecSource
Key: LUCENE-4538
URL: https://issues.apache.org/jira/browse/LUCENE-4538
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs
Affects Versions: 4.0
Reporter: Simon Willnauer
Fix For: 4.1, 5.0
Attachments: LUCENE-4538.patch, LUCENE-4538.patch

Currently the user needs to make sure that a direct source is not shared between threads, and each time someone calls getDirectSource we create a new source, which has a reasonable overhead. We can certainly reduce the overhead (maybe a different issue) but it should be easier for the user to get a direct source and handle it. More than that, it should be consistent with getSource / loadSource.
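As background for why the caching is fiddly: a direct source is not thread-safe, so any cache has to be per-thread. A minimal sketch of that pattern, assuming the Lucene 4.0 DocValues#getDirectSource() API (illustrative, not Simon's patch):

{code}
import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.DocValues.Source;

// Sketch: cache one direct source per thread, since instances must not be
// shared across threads and creating one per call has noticeable overhead.
public final class PerThreadDirectSource {

  private final DocValues docValues;

  private final ThreadLocal<Source> perThread = new ThreadLocal<Source>() {
    @Override
    protected Source initialValue() {
      try {
        return docValues.getDirectSource();
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  };

  public PerThreadDirectSource(DocValues docValues) {
    this.docValues = docValues;
  }

  /** Returns this thread's cached direct source, creating it on first use. */
  public Source get() {
    return perThread.get();
  }
}
{code}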
[jira] [Commented] (LUCENE-4536) Make PackedInts byte-aligned?
[ https://issues.apache.org/jira/browse/LUCENE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491449#comment-13491449 ]

Michael McCandless commented on LUCENE-4536:

This patch only changes the on-disk format, right? The specialized in-memory readers are still backed by native arrays (short[]/int[]/long[], etc.)?

Instead of PackedInts.VERSION_CURRENT = 1, can we add VERSION_BYTE_ALIGNED = 1 and then set VERSION_CURRENT = VERSION_BYTE_ALIGNED? Also, can we leave VERSION_START = 0 (i.e., don't rename that to VERSION_LONG_ALIGNED)? But we should put a comment saying that one was long-aligned ... I.e., in general, I think the version constants should be created once and then not changed (write once), and VERSION_CURRENT changes to point to whichever is most recent.

That careful anonymous subclass in PackedInts to handle seeking to the end when the last value is read is sort of sneaky ... this should only kick in when reading the old (long-aligned) format, right? Can you add an assert checking that the version is VERSION_START? Or ... maybe ... we should not promise this (no trailing wasted bytes) in the API? Or maybe we expose a new explicit method to seek to the end of this packed ints or something (e.g. maybe skipTrailingBytes).

Make PackedInts byte-aligned?
Key: LUCENE-4536
URL: https://issues.apache.org/jira/browse/LUCENE-4536
Project: Lucene - Core
Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Fix For: 4.1
Attachments: LUCENE-4536.patch

PackedInts are more and more used to save/restore small arrays, but given that they are long-aligned, up to 63 bits are wasted per array. We should try to make PackedInts storage byte-aligned so that only 7 bits are wasted in the worst case.
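Mike's write-once convention for version constants, rendered as code (the constant names come from his comment; the byte-count comments are my arithmetic, assuming n values of b bits each):

{code}
public class PackedInts {

  /** Initial format: storage was 64-bit aligned, i.e. a stream of n b-bit
   *  values occupied ceil(n*b/64)*8 bytes, wasting up to 63 trailing bits. */
  public static final int VERSION_START = 0;

  /** Byte-aligned format: the same stream occupies ceil(n*b/8) bytes,
   *  wasting at most 7 trailing bits. */
  public static final int VERSION_BYTE_ALIGNED = 1;

  /** The write-once constants above never move; only this pointer advances. */
  public static final int VERSION_CURRENT = VERSION_BYTE_ALIGNED;
}
{code}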
[jira] [Commented] (LUCENE-4539) DocValues impls should read all headers up-front instead of per-directsource
[ https://issues.apache.org/jira/browse/LUCENE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491454#comment-13491454 ]

Robert Muir commented on LUCENE-4539 (DocValues impls should read all headers up-front instead of per-directsource):

I agree with you that it's bogus how it writes its header. But I see a downside (I hope we can come up with an idea to deal with it rather than keeping the header!).

One advantage of PackedInts writing its versioning (like FSTs) is that lots of things nest them in their own file. The problem with these two things is that they are themselves changing and versioned: they aren't like readVInt(), which is pretty much fixed in what it does. So having them write their own versions etc. today to some extent makes back-compat management of file formats easier: today it's just DocValues and term dictionaries using these things; tomorrow (4.1) it's also the postings lists: documents, frequencies, and positions, and maybe in the future even stored fields (LUCENE-4527). Who is keeping up with all the places that must be managed when a packed ints version change needs to happen? Today the header encapsulates in one place: if I backwards-break FSTs and it breaks a few suggester impls, I know anyone using those suggesters will get IndexFormatTooOldException without me doing anything. So that's very convenient.
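The convenience Robert describes comes from the standard codec-header pattern; here is a sketch of how a nested structure self-describes its version. CodecUtil.writeHeader/checkHeader are the real Lucene utilities; the codec name string and wrapper class are illustrative.

{code}
import java.io.IOException;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.DataOutput;

// Sketch: a nested structure that stamps its own name + version, so every
// consumer that embeds it gets back-compat checks for free.
public final class SelfVersionedExample {

  static final String CODEC_NAME = "PackedIntsExample"; // illustrative
  static final int VERSION_START = 0;
  static final int VERSION_CURRENT = 1;

  /** Writing: the structure records its own codec name and version. */
  public static void writeHeader(DataOutput out) throws IOException {
    CodecUtil.writeHeader(out, CODEC_NAME, VERSION_CURRENT);
  }

  /** Reading: checkHeader throws IndexFormatTooOld/TooNewException itself,
   *  so callers nesting this structure need no extra version bookkeeping. */
  public static int readHeader(DataInput in) throws IOException {
    return CodecUtil.checkHeader(in, CODEC_NAME, VERSION_START, VERSION_CURRENT);
  }
}
{code}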
[jira] [Commented] (LUCENE-4536) Make PackedInts byte-aligned?
[ https://issues.apache.org/jira/browse/LUCENE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491456#comment-13491456 ]

Adrien Grand commented on LUCENE-4536 (Make PackedInts byte-aligned?):

bq. This patch only changes the on-disk format right? The specialized in-memory readers are still backed by native arrays (short[]/int[]/long[], etc.)?

Exactly.

bq. Ie, in general, I think the version constants should be created once and then not changed (write once), and VERSION_CURRENT changes to point to whichever is most recent.

Ok, I'll change it.

bq. That careful anonymous subclass in PackedInts to handle seeking to the end when the last value is read is sort of sneaky ... this should only kick in when reading the old (long-aligned) format right?

This only happens when reading the old format AND the number of bytes used to serialize the array is not a multiple of 8. I'll add an assert to make sure that this condition can only be true with the old format.

bq. Or ... maybe ... we should not promise this (no trailing wasted bytes) in the API?
bq. Or maybe we expose a new explicit method to seek to the end of this packed ints or something (eg maybe skipTrailingBytes).

These were my first ideas, but the truth is that I was very scared to break something (for example, doc values rely on the assumption that after reading the last value of a direct array, the whole stream is consumed). Fixing PackedInts to make sure those assumptions are still true looked easier to me, as I was able to create fake long-aligned packed ints and make sure that the whole stream was consumed after reading the last value. But your option makes perfect sense to me and I will do it if you think it is cleaner. Thanks for the review!
[jira] [Commented] (LUCENE-4532) TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
[ https://issues.apache.org/jira/browse/LUCENE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491466#comment-13491466 ]

Gilad Barkai commented on LUCENE-4532:

Reviewed the patch - and it looks very good. A few comments:
1. In TestDirectoryTaxonomyWriter.java, the error string _index.create.time not found in commitData_ should be updated.
2. If the index creation time is in the commit data, it will not be removed, as the epoch is added to whatever commit data was read from the index. I think perhaps it should be removed?
3. Since the members related to the old 'timestamp' method are removed, no test could check the migration from the old method to the new one. Might be a good idea to add one, with a comment to remove it when backward compatibility is no longer required (Lucene 6?).
4. 'Epoch' is usually used in the context of time, or in relation to a period. Perhaps the name 'version' is more closely related to the implementation?

TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
Key: LUCENE-4532
URL: https://issues.apache.org/jira/browse/LUCENE-4532
Project: Lucene - Core
Issue Type: Bug
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-4532.patch

The following failure on Jenkins:

{noformat}
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/1404/
Java: 32bit/jdk1.6.0_37 -client -XX:+UseConcMarkSweepGC

1 tests failed.

REGRESSION: org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy

Error Message:

Stack Trace:
java.lang.ArrayIndexOutOfBoundsException
at __randomizedtesting.SeedInfo.seed([6AB10D3E4E956CFA:BFB2863DB7E077E0]:0)
at java.lang.System.arraycopy(Native Method)
at org.apache.lucene.facet.taxonomy.directory.ParentArray.refresh(ParentArray.java:99)
at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader.refresh(DirectoryTaxonomyReader.java:407)
at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader.doTestReadRecreatedTaxono(TestDirectoryTaxonomyReader.java:167)
at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy(TestDirectoryTaxonomyReader.java:130)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at
Re: Source Control
One issue is how to use git and github. One can certainly use it as if it were svn, but that misses a lot of the power of git, particularly the collaborative tools on github. For example, one approach is to create a branch for every Jira ticket and then, instead of posting raw patches on the Jira ticket, create git pull requests from the branch, which make it easy to comment on individual file changes, right down to comments on individual lines of code. Changes can be committed and pushed to the branch as work continues and new pull requests generated. Eventually, pull requests can be easily merged into the master, as desired. Users can selectively include pull requests as they see fit as well. But... can all of us, even non-committers, do that? Or would the better features of github be available only to committers? I don't know enough about github to know whether you can have one class of user able to create branches or comment on them but not merge into master or tagged branches such as releases.
-- Jack Krupansky
-----Original Message----- From: Mark Miller Sent: Friday, October 26, 2012 7:02 PM To: dev@lucene.apache.org Subject: Source Control
So, it's not everyone's favorite tool, but it sure seems to be the most popular tool. What are people's thoughts about moving to git? Distributed version control is where it's at :) I know some prefer mercurial, but git and github clearly are taking over the world. Also, the cmd line for git is a little eccentric - I use a GUI client called SmartGit. Some very clever Germans make it. A few Apache projects are already using git. I'd like to hear how people feel about this idea.
- Mark
[jira] [Commented] (LUCENE-4536) Make PackedInts byte-aligned?
[ https://issues.apache.org/jira/browse/LUCENE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491479#comment-13491479 ] Michael McCandless commented on LUCENE-4536:
bq. These were my first ideas, but the truth is that I was very scared to break something (for example doc values rely on the assumption that after reading the last value of a direct array, the whole stream is consumed).
It's hard to know what's best :) I like the explicitness / transparency / no sneaky code solution of .skipTrailingBytes(). But then I don't like that skipTrailingBytes would only be for back compat (ie, we will remove it eventually, unless somehow we go back to wasted trailing bytes) ... annoying to add an essentially deprecated API. But then really it's presumptuous of the consumers of PackedInts to expect all bytes are consumed after iterating all values ... like that's making a sometimes invalid assumption about the file format of PackedInts. And this is an internal API, so we are free to change things ... But net/net I think we should stick w/ your current patch?

Make PackedInts byte-aligned?
Key: LUCENE-4536 URL: https://issues.apache.org/jira/browse/LUCENE-4536 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.1 Attachments: LUCENE-4536.patch
PackedInts are more and more used to save/restore small arrays, but given that they are long-aligned, up to 63 bits are wasted per array. We should try to make PackedInts storage byte-aligned so that only 7 bits are wasted in the worst case.
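[Editor's note: the alignment trade-off described in the issue is simple arithmetic; here is a small self-contained sketch (the class and method names are illustrative, not Lucene code) showing the wasted trailing bits under each scheme.]
{code}
// Illustration only: how much trailing space long-aligned vs byte-aligned
// packed storage wastes for a small array. With 64-bit (long) alignment the
// last block can leave up to 63 unused bits; with byte alignment, at most 7.
public class PackedAlignmentWaste {
  static long wastedBits(int valueCount, int bitsPerValue, int blockBits) {
    long dataBits = (long) valueCount * bitsPerValue;
    long blocks = (dataBits + blockBits - 1) / blockBits; // round up to whole blocks
    return blocks * blockBits - dataBits;
  }

  public static void main(String[] args) {
    // e.g. 3 values at 7 bits each = 21 bits of payload
    System.out.println(wastedBits(3, 7, 64)); // long-aligned: 43 wasted bits
    System.out.println(wastedBits(3, 7, 8));  // byte-aligned: 3 wasted bits
  }
}
{code}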
[jira] [Created] (LUCENE-4544) possible bug in ConcurrentMergeScheduler.merge(IndexWriter)
Radim Kolar created LUCENE-4544:
Summary: possible bug in ConcurrentMergeScheduler.merge(IndexWriter) Key: LUCENE-4544 URL: https://issues.apache.org/jira/browse/LUCENE-4544 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0 Reporter: Radim Kolar
from dev list: "I suspect that this code is broken. Lines 331 - 343 in org.apache.lucene.index.ConcurrentMergeScheduler.merge(IndexWriter): mergeThreadCount() are the currently active merges; they can be at most maxThreadCount. maxMergeCount is the number of queued merges, defaulted to maxThreadCount+2, and it can never be lower than maxThreadCount, which means that the condition in the while can never become true.
{code}
synchronized(this) {
  long startStallTime = 0;
  while (mergeThreadCount() >= 1+maxMergeCount) {
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait();
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
{code}
"
While confusing, I think the code is actually nearly correct... but I would love to find some simplifications of CMS's logic (it's really hairy). It turns out mergeThreadCount() is allowed to go higher than maxThreadCount; when this happens, Lucene pauses mergeThreadCount()-maxThreadCount of those merge threads, and resumes them once threads finish (see updateMergeThreads). Ie, CMS will accept up to maxMergeCount merges (and launch threads for them), but will only allow maxThreadCount of those threads to be running at once. So what that while loop is doing is preventing more than maxMergeCount+1 threads from starting, and then pausing the incoming thread to slow down the rate of segment creation (since merging cannot keep up). But ... I think the 1+ is wrong ... it seems like it should just be mergeThreadCount() >= maxMergeCount().
[jira] [Commented] (SOLR-4037) Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException
[ https://issues.apache.org/jira/browse/SOLR-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491485#comment-13491485 ] Markus Jelsma commented on SOLR-4037:
We've also seen these exceptions when firing normal queries _after_ restarting all nodes in a sequence. Clearing ZK data and restarting again is a quick fix.
{code}
java.util.concurrent.RejectedExecutionException
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
at org.apache.solr.handler.component.HttpShardHandler.submit(HttpShardHandler.java:190)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code}

Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException
Key: SOLR-4037 URL: https://issues.apache.org/jira/browse/SOLR-4037 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 12:37:38 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Fix For: 4.1, 5.0
See: http://lucene.472066.n3.nabble.com/Continuous-Ping-query-caused-exception-java-util-concurrent-RejectedExecutionException-td4017470.html
Using this week's trunk we sometimes see nodes entering some funky state where they continuously report exceptions. Replication and query handling is still possible, but there is an increase in CPU time:
{code}
2012-11-01 09:24:28,337 INFO [solr.core.SolrCore] - [http-8080-exec-4] - : [openindex_f] webapp=/solr path=/admin/ping params={} status=500 QTime=21
2012-11-01 09:24:28,337 ERROR [solr.core.SolrCore] - [http-8080-exec-4] - : org.apache.solr.common.SolrException: Ping query caused exception: java.util.concurrent.RejectedExecutionException
at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:259)
at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:207)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
[jira] [Assigned] (LUCENE-4544) possible bug in ConcurrentMergeScheduler.merge(IndexWriter)
[ https://issues.apache.org/jira/browse/LUCENE-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-4544:
Assignee: Michael McCandless
[jira] [Assigned] (LUCENE-4538) Cache DocValues DirecSource
[ https://issues.apache.org/jira/browse/LUCENE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-4538:
Assignee: Simon Willnauer

Cache DocValues DirecSource
Key: LUCENE-4538 URL: https://issues.apache.org/jira/browse/LUCENE-4538 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.1, 5.0 Attachments: LUCENE-4538.patch, LUCENE-4538.patch
Currently the user needs to make sure that a direct source is not shared between threads, and each time someone calls getDirectSource we create a new source, which has a noticeable overhead. We can certainly reduce the overhead (maybe a different issue), but it should be easier for the user to get a direct source and handle it. More than that, it should be consistent with getSource / loadSource.
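[Editor's note: the attached patch is not shown in this thread, so the following is only one plausible shape of the caching the issue title asks for - a per-thread direct source. getDirectSource() and Source are from the DocValues API described above; the wrapper class is a hypothetical illustration, not the actual patch.]
{code}
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.DocValues.Source;

// Hypothetical sketch: hand each thread its own direct source so repeated
// getDirectSource() calls stop paying the per-call setup cost.
class PerThreadDirectSource {
  private final DocValues docValues;
  private final ThreadLocal<Source> cache = new ThreadLocal<Source>();

  PerThreadDirectSource(DocValues docValues) {
    this.docValues = docValues;
  }

  Source get() throws IOException {
    Source s = cache.get();
    if (s == null) {
      s = docValues.getDirectSource(); // expensive: opens/clones inputs, reads headers
      cache.set(s);
    }
    return s; // safe: never shared across threads
  }
}
{code}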
[jira] [Resolved] (LUCENE-4538) Cache DocValues DirecSource
[ https://issues.apache.org/jira/browse/LUCENE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-4538.
Resolution: Fixed
Committed to trunk in rev. 1406153; backported to 4.x in rev. 1406169.
[jira] [Commented] (LUCENE-4532) TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
[ https://issues.apache.org/jira/browse/LUCENE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491511#comment-13491511 ] Shai Erera commented on LUCENE-4532:
Thanks for the review.
bq. in TestDirectoryTaxonomyWriter.java, the error string index.create.time not found in commitData should be updated.
done (will upload an updated patch soon)
bq. if the index creation time is in the commit data, it will not be removed - as the epoch is added to whatever commit data was read from the index. I think perhaps it should be removed?
Not quite. On every commit, DirTaxoWriter writes a new commitData, combining whatever commitData is passed from the caller. But it does not merge it with the existing commitData. That's how IndexWriter works too, and it's the responsibility of the caller to pass the commitData on every commit(), if he'd like to persist it. But DirTaxoReader does let you read the commitData, so it is possible that someone will obtain the commitData from DirTaxoReader (with the old property), add his stuff to it and pass that to DirTaxoWriter. I don't think that it's critical though, and I doubt anyone does that.
bq. ...no test could check the migration from old to new methods...
Right, I'll add a test case.
bq. 'Epoch' is usually in the context of time
I don't think that it's critical. 'Version' is problematic since Lucene already uses 'version' and 'generation'. I think that 'epoch' is fine, but if anyone has a better suggestion, I don't mind changing it.
[jira] [Commented] (LUCENE-4532) TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
[ https://issues.apache.org/jira/browse/LUCENE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491517#comment-13491517 ] Shai Erera commented on LUCENE-4532:
bq. ...no test could check the migration from old to new methods...
Actually, there was such a test: TestDirTaxoWriter.testUndefinedCreateTime. I'll rename it to testBackwardsCompatibility though.
[jira] [Commented] (LUCENE-4539) DocValues impls should read all headers up-front instead of per-directsource
[ https://issues.apache.org/jira/browse/LUCENE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491520#comment-13491520 ] Adrien Grand commented on LUCENE-4539:
bq. Who is keeping up with all the places that must be managed when a packed ints version change needs to happen?
Sorry, I was not clear: I didn't mean to remove the version number, just the codec name. I think the Lucene41 postings format is a good example: it never writes PackedInts headers in the stream, writes the PackedInts version once at the beginning of the stream, and may then serialize thousands of arrays of 128 values with the number of bits per value as a byte in front of each of them.

DocValues impls should read all headers up-front instead of per-directsource
Key: LUCENE-4539 URL: https://issues.apache.org/jira/browse/LUCENE-4539 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Robert Muir Attachments: LUCENE-4539.patch
Currently, when DocValues opens, it just opens files; it doesn't read codec headers etc. Instead we read these every single time a directsource opens. I think it should work like PostingsReaders: e.g. the PackedInts impl would read its versioning info and codec headers once, and creating a new Direct impl would be an IndexInput.clone() + getDirectReaderNoHeader(). Today it's much more costly.
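[Editor's note: to make the Lucene41-style pattern Adrien describes concrete, here is a hedged sketch - not the postings format's actual code. DataOutput.writeByte/writeVInt are real Lucene APIs; the class, constant, and method names are illustrative.]
{code}
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

// Sketch of the headers-up-front pattern: write versioning info once per
// stream, then tag each small block only with its bits-per-value byte.
class BlockWriterSketch {
  static final int PACKED_VERSION = 0; // stand-in for the PackedInts version

  static void writeStreamHeader(DataOutput out) throws IOException {
    out.writeVInt(PACKED_VERSION); // versioning info, written exactly once
  }

  static void writeBlock(DataOutput out, long[] block, int bitsPerValue) throws IOException {
    out.writeByte((byte) bitsPerValue); // one byte of metadata per block...
    // ...followed by the packed values themselves (encoding elided here).
  }
}
{code}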
[jira] [Updated] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
[ https://issues.apache.org/jira/browse/LUCENE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4543:
Attachment: LUCENE-4543.patch
Here's the patch. The API bug was introduced when sim was expanded to use norms other than a single byte: at query-time TFIDFSim is limited to single-byte norms (the decode bits are final), but computeNorm is not final. I'll commit soon.

Bring back TFIDFSim.lengthNorm
Key: LUCENE-4543 URL: https://issues.apache.org/jira/browse/LUCENE-4543 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4543.patch
We removed this before because of LUCENE-2828, but the problem there was the delegator (not the lengthNorm method). TFIDFSim requires byte[] norms today. So its computeNorm should be final, calling a lengthNorm() that returns a byte. This way there is no possibility for you to do something stupid.
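[Editor's note: the shape of the API change described above - a final computeNorm funneling everything through a byte-returning lengthNorm - looks roughly like the following. This is a sketch built from the issue text, not the attached patch; FieldInvertState and Norm.setByte are real Lucene 4.x APIs, the class name is illustrative.]
{code}
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.index.Norm;

// Sketch: computeNorm is final and can only ever produce the single byte
// that TFIDFSim's query-time code can decode; subclasses customize lengthNorm.
public abstract class ByteNormTFIDFSimSketch /* would extend TFIDFSimilarity */ {

  /** The only extension point: the norm must fit into one byte. */
  public abstract byte lengthNorm(FieldInvertState state);

  /** Final, so subclasses cannot write norms the query side can't read. */
  public final void computeNorm(FieldInvertState state, Norm norm) {
    norm.setByte(lengthNorm(state));
  }
}
{code}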
[jira] [Commented] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
[ https://issues.apache.org/jira/browse/LUCENE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491528#comment-13491528 ] Michael McCandless commented on LUCENE-4543:
+1
[jira] [Commented] (LUCENE-4538) Cache DocValues DirecSource
[ https://issues.apache.org/jira/browse/LUCENE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491558#comment-13491558 ] Robert Muir commented on LUCENE-4538:
Thanks Simon!
[jira] [Updated] (LUCENE-4532) TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy failure
[ https://issues.apache.org/jira/browse/LUCENE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4532:
Attachment: LUCENE-4532.patch
Patch addresses the comments. For now, I kept the 'epoch' wording, unless there's another suggestion.
[jira] [Commented] (SOLR-3904) add package level javadocs to every package
[ https://issues.apache.org/jira/browse/SOLR-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491586#comment-13491586 ] Hoss Man commented on SOLR-3904:
Progress...
Committed revision 1406204. - trunk
Committed revision 1406209. - 4x

add package level javadocs to every package
Key: SOLR-3904 URL: https://issues.apache.org/jira/browse/SOLR-3904 Project: Solr Issue Type: Improvement Components: documentation Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.1 Attachments: SOLR-3904_buildxml.patch
quoth rmuir on the mailing list...
{quote}
We've been working on this for the lucene side (3.6 was the first release where every package had docs, 4.0 will be the first where every class had docs, and we are now working towards methods/fields/ctors/enums). I think this would be valuable for solr too (especially solrj as a start). Besides users, it's really useful to developers as well. Of course we all think our code is self-documenting, but that's not always the case. A few extra seconds can save someone a ton of time trying to figure out your code. Additionally, at least in my IDE, when things are done as javadoc comments they are more easily accessible than code comments. I'm sure it's the case for some other development environments too. Filling in these package.html's to at least have a one sentence description would be a really good start. It lets someone know where to go at the high level. If I was brand new to solr and wanted to write a java app that uses solrj, I wouldn't have a clue where to start (https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-solrj/index.html). 1-2 sentences could go a really long way. And for all new code, I hope we can all try harder for more complete javadocs. When you are working on something and it's fresh in your head, it's a lot easier to do this than for someone else to come back around and figure it out.
{quote}
I'm going to try and make it a priority for me to fill in package level docs as we look towards 4.1
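[Editor's note: as a concrete example of the "one sentence" bar being discussed, a minimal package-info.java is all it takes; the package and wording below are just an illustration, not from the committed docs.]
{code}
/**
 * Provides the SolrJ client API for indexing documents and running
 * searches against a remote Solr server from Java applications.
 */
package org.apache.solr.client.solrj;
{code}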
[jira] [Created] (SOLR-4039) MergeIndex on multiple cores impossible with SolrJ
Mathieu Gond created SOLR-4039:
Summary: MergeIndex on multiple cores impossible with SolrJ Key: SOLR-4039 URL: https://issues.apache.org/jira/browse/SOLR-4039 Project: Solr Issue Type: Bug Affects Versions: 3.6.1 Environment: Windows Reporter: Mathieu Gond
It is not possible to do a mergeIndexes action on multiple cores at the same time with SolrJ. Only the last core set in the srcCores parameter is used.
[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
[ https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1306:
Attachment: (was: SOLR-1306.patch)

Support pluggable persistence/loading of solr.xml details
Key: SOLR-1306 URL: https://issues.apache.org/jira/browse/SOLR-1306 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Assignee: Erick Erickson Fix For: 4.1 Attachments: SOLR-1306.patch
Persisting and loading details from one xml is fine if the number of cores is small and fixed. If there are tens of thousands of cores in a single box, adding a new core (with persistent=true) becomes very expensive because every core creation has to write this huge xml. Moreover, there is a good chance that the file gets corrupted and all the cores become unusable. In that case I would prefer it to be stored in a centralized DB which is backed up/replicated, so all the information is available in a centralized location. We may need to refactor CoreContainer to have a pluggable implementation which can load/persist the details. The default implementation should write/read from/to solr.xml. And the class should be pluggable as follows in solr.xml:
{code:xml}
<solr>
  <dataProvider class="com.foo.FooDataProvider" attr1="val1" attr2="val2"/>
</solr>
{code}
There will be a new interface (or abstract class ) called SolrDataProvider which this class must implement
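[Editor's note: the issue deliberately leaves the SolrDataProvider contract unspecified; the sketch below is entirely hypothetical, matching only the "load/persist the details" wording above. CoreContainer and CoreDescriptor are real Solr classes; the interface and its method names are mine.]
{code}
import java.util.List;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;

// Hypothetical sketch only: SOLR-1306 does not define this interface yet.
public interface SolrDataProvider {
  /** Load the descriptors for all cores at container startup. */
  List<CoreDescriptor> loadCoreDescriptors(CoreContainer container);

  /** Persist one core's details when it is created or changed. */
  void persistCoreDescriptor(CoreDescriptor descriptor);
}
{code}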
[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
[ https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1306:
Attachment: (was: SOLR-1306.patch)
[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
[ https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1306:
Attachment: SOLR-1306.patch
Took out some extraneous crud that made it into the last patch. When creating a custom core descriptor, the following changes need to be made to solr.xml:
1. add a sharedLib directive to the solr tag pointing to a directory containing your custom jar
2. add coreDescriptorProviderClass to the cores tag.
Here's an example:
{code:xml}
<solr persistent="false" sharedLib="../../../../../your/path/here/">
  <cores [all the other opts] coreDescriptorProviderClass="your.company.TheCoreDescriptorProvider"/>
</solr>
{code}
[jira] [Updated] (SOLR-3856) DIH: Better tests for SqlEntityProcessor
[ https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-3856:
Fix Version/s: 5.0

DIH: Better tests for SqlEntityProcessor
Key: SOLR-3856 URL: https://issues.apache.org/jira/browse/SOLR-3856 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 3.6, 4.0 Reporter: James Dyer Assignee: James Dyer Fix For: 4.1, 5.0 Attachments: SOLR-3856-3.5.patch, SOLR-3856.patch, SOLR-3856.patch, SOLR-3856.patch
The current tests for SqlEntityProcessor (& CachedSqlEntityProcessor), while many, do not reliably fail when bugs are introduced! They are also difficult to look at and understand. As we move Jenkins onto new environments, we have found several of them fail regularly, leading to @Ignore. My aim here is to write all new tests for (Cached)SqlEntityProcessor, and to document (hopefully fix) any bugs this reveals.
[jira] [Created] (SOLR-4040) SolrCloud deleteByQuery requires multiple commits
Darin Plutchok created SOLR-4040:
Summary: SolrCloud deleteByQuery requires multiple commits Key: SOLR-4040 URL: https://issues.apache.org/jira/browse/SOLR-4040 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: OSX Reporter: Darin Plutchok Fix For: 4.0
I am using embedded zookeeper and my cloud layout is shown below (all actions are done on the 'patents' collection only). The first commit/delete works for a single shard only, dropping query results by about a third. The second commit/delete drops query results to zero.
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop by a third)
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop to zero)
Note that a delete without a commit followed by a commit drops query results to zero, as it should:
http://127.0.0.1:8893/solr/patents/update/?stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (full count as no commit yet)
http://127.0.0.1:8893/solr/patents/update/?commit=true
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop to zero)
One workaround (produces zero hits in one shot):
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<outer><delete><query>sun</query></delete><commit/></outer>
The workaround I am using for now (produces zero hits in one shot):
http://127.0.0.1:8893/solr/patents/update?stream.body=<outer><delete><query>knee</query></delete><commit/><commit/></outer>
{code}
{"otherdocs":{"slice0":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_otherdocs_shard0":{
    "shard":"slice0", "roles":null, "state":"active",
    "core":"otherdocs_shard0", "collection":"otherdocs",
    "node_name":"Darins-MacBook-Pro.local:8893_solr",
    "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
    "leader":"true"}}}},
 "patents":{
  "slice0":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard0":{
    "shard":"slice0", "roles":null, "state":"active",
    "core":"patents_shard0", "collection":"patents",
    "node_name":"Darins-MacBook-Pro.local:8893_solr",
    "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
    "leader":"true"}}},
  "slice1":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard1":{
    "shard":"slice1", "roles":null, "state":"active",
    "core":"patents_shard1", "collection":"patents",
    "node_name":"Darins-MacBook-Pro.local:8893_solr",
    "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
    "leader":"true"}}},
  "slice2":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard2":{
    "shard":"slice2", "roles":null, "state":"active",
    "core":"patents_shard2", "collection":"patents",
    "node_name":"Darins-MacBook-Pro.local:8893_solr",
    "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
    "leader":"true"}}}}}
{code}
[jira] [Resolved] (SOLR-3856) DIH: Better tests for SqlEntityProcessor
[ https://issues.apache.org/jira/browse/SOLR-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-3856.
Resolution: Fixed
Committed. Trunk: r1406231 (CHANGES.txt: r1406245) 4x: r1406246
[jira] [Updated] (LUCENE-4544) possible bug in ConcurrentMergeScheduler.merge(IndexWriter)
[ https://issues.apache.org/jira/browse/LUCENE-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4544:
Attachment: LUCENE-4544.patch
Patch. It was somewhat tricky to fix the off-by-one, because we only want to stall the thread(s) producing segments if the number of running merges is >= maxMergeCount AND another merge wants to kick off ... I made CMS.merge sync'd, and removed the synchronous IW.mergeInit (I think deterministic segment name assignment isn't important).
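[Editor's note: restated as a hedged sketch - not the attached patch's exact code - the corrected stall loop described above looks like the following. mergeThreadCount(), maxMergeCount, verbose() and message() are the CMS members quoted earlier in the issue; this fragment runs after a new merge has been pulled, which is what supplies the "another merge wants to kick off" half of the condition.]
{code}
synchronized (this) {
  long startStallTime = 0;
  // Stall the incoming (segment-producing) thread only while the number of
  // accepted merges has reached the cap; the off-by-one "1+" is gone.
  while (mergeThreadCount() >= maxMergeCount) {
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait(); // woken when a merge thread finishes
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
}
{code}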
[jira] [Updated] (SOLR-2045) DIH doesn't release jdbc connections in conjunction with DB2
[ https://issues.apache.org/jira/browse/SOLR-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2045:
Affects Version/s: 3.6, 4.0
Fix Version/s: 5.0, 4.1
Assignee: James Dyer

DIH doesn't release jdbc connections in conjunction with DB2
Key: SOLR-2045 URL: https://issues.apache.org/jira/browse/SOLR-2045 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.4.1, 3.6, 4.0 Environment: DB2 SQLLIB 9.5, 9.7 jdbc Driver Reporter: Fenlor Sebastia Assignee: James Dyer Fix For: 4.1, 5.0
Using the JDBCDatasource in conjunction with the DB2 JDBC drivers results in the following error when the DIH tries to close the connection due to active transactions. As a consequence, each delta import or full import opens a new connection without closing it, so the maximum number of connections will be reached soon. Setting the connection to readOnly or changing the transaction isolation level doesn't help either. The JDBC driver I used: com.ibm.db2.jcc.DB2Driver, residing in db2jcc4.jar shipped with DB2 Express 9.7, for example. Here is the stack trace...
{code}
14.08.2010 01:49:51 org.apache.solr.handler.dataimport.JdbcDataSource closeConnection
FATAL: Ignoring Error when closing connection
com.ibm.db2.jcc.am.SqlException: [jcc][10251][10308][4.8.87] java.sql.Connection.close() requested while a transaction is in progress on the connection. The transaction remains active, and the connection cannot be closed. ERRORCODE=-4471, SQLSTATE=null
at com.ibm.db2.jcc.am.gd.a(gd.java:660)
at com.ibm.db2.jcc.am.gd.a(gd.java:60)
at com.ibm.db2.jcc.am.gd.a(gd.java:120)
at com.ibm.db2.jcc.am.lb.u(lb.java:1202)
at com.ibm.db2.jcc.am.lb.x(lb.java:1225)
at com.ibm.db2.jcc.am.lb.v(lb.java:1211)
at com.ibm.db2.jcc.am.lb.close(lb.java:1195)
at com.ibm.db2.jcc.uw.UWConnection.close(UWConnection.java:838)
at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399)
at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390)
at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:173)
at org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:331)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:339)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
{code}
Well, the issue can be solved by invoking a commit or rollback directly before the connection.close() statement. Here is the code snippet of the changes I made in JdbcDataSource.java:
{code}
private void closeConnection() {
  try {
    if (conn != null) {
      if (conn.isReadOnly()) {
        LOG.info("connection is readonly, therefore rollback");
        conn.rollback();
      } else {
        LOG.info("connection is not readonly, therefore commit");
        conn.commit();
      }
      conn.close();
    }
  } catch (Exception e) {
    LOG.error("Ignoring Error when closing connection", e);
  }
}
{code}
[jira] [Updated] (SOLR-2045) DIH doesn't release jdbc connections in conjunction with DB2
[ https://issues.apache.org/jira/browse/SOLR-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2045:
Attachment: SOLR-2045.patch
This patch fixes the problem by issuing a commit before closing the connection, as suggested by Fenlor. I added Derby as a randomly-selected test db to have coverage for this bug. As Derby is only needed for testing, I configured Ivy to locate the derby jar in the same directory as the hsqldb jar, under the dih example. I also added the 2 db jars to the Eclipse dot.classpath and to the Idea config files so that you can easily run these tests from either ide. (this is my first exposure to Idea but with all the good words I've heard on this mailing list I thought this a good time to try it out...) I plan on committing this patch tomorrow.
[jira] [Commented] (LUCENE-4544) possible bug in ConcurrentMergeScheduler.merge(IndexWriter)
[ https://issues.apache.org/jira/browse/LUCENE-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491800#comment-13491800 ] Radim Kolar commented on LUCENE-4544: - Could it not be made simpler by not creating a thread for every new segment merge, but instead using a standard thread pool + queue scheme? Lots of libraries can do this easily.

possible bug in ConcurrentMergeScheduler.merge(IndexWriter) Key: LUCENE-4544 URL: https://issues.apache.org/jira/browse/LUCENE-4544 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 5.0 Reporter: Radim Kolar Assignee: Michael McCandless Attachments: LUCENE-4544.patch

From the dev list: "I suspect that this code is broken. Lines 331 - 343 in org.apache.lucene.index.ConcurrentMergeScheduler.merge(IndexWriter): mergeThreadCount() is the number of currently active merges; they can be at most maxThreadCount. maxMergeCount is the number of queued merges, defaulting to maxThreadCount+2, and it can never be lower than maxThreadCount, which means that the condition in the while loop can never become true.

synchronized (this) {
  long startStallTime = 0;
  while (mergeThreadCount() >= 1 + maxMergeCount) {
    startStallTime = System.currentTimeMillis();
    if (verbose()) {
      message("too many merges; stalling...");
    }
    try {
      wait();
    } catch (InterruptedException ie) {
      throw new ThreadInterruptedException(ie);
    }
  }
}"

While confusing, I think the code is actually nearly correct... but I would love to find some simplifications of CMS's logic (it's really hairy). It turns out mergeThreadCount() is allowed to go higher than maxThreadCount; when this happens, Lucene pauses mergeThreadCount()-maxThreadCount of those merge threads, and resumes them once threads finish (see updateMergeThreads). I.e., CMS will accept up to maxMergeCount merges (and launch threads for them), but will only allow maxThreadCount of those threads to be running at once. So what that while loop is doing is preventing more than maxMergeCount+1 threads from starting, and then pausing the incoming thread to slow down the rate of segment creation (since merging cannot keep up). But ... I think the 1+ is wrong ... it seems like it should just be mergeThreadCount() >= maxMergeCount().

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
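[Editor's note] For readers unfamiliar with the alternative Radim suggests above: a bounded java.util.concurrent pool can approximate CMS's stall behavior, because CallerRunsPolicy makes the submitting thread run an overflow merge itself instead of queuing it. This is only a sketch under assumptions; the field defaults mirror CMS's settings, and the Runnable passed to merge() stands in for the actual merge work. It is not CMS's real implementation.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PooledMergeSchedulerSketch {
  private final int maxThreadCount = 3;                 // illustrative default
  private final int maxMergeCount = maxThreadCount + 2; // mirrors CMS default

  // Fixed pool of maxThreadCount merge threads with a bounded queue of
  // maxMergeCount pending merges. CallerRunsPolicy makes the thread that
  // submits an extra merge run it itself, which stalls the producer much
  // like the wait() loop in CMS.
  private final ThreadPoolExecutor mergePool = new ThreadPoolExecutor(
      maxThreadCount, maxThreadCount, 0L, TimeUnit.MILLISECONDS,
      new ArrayBlockingQueue<Runnable>(maxMergeCount),
      new ThreadPoolExecutor.CallerRunsPolicy());

  // Hypothetical submission site, replacing "spawn a MergeThread per merge".
  void merge(Runnable oneMerge) {
    mergePool.execute(oneMerge);
  }
}
{code}

Whether this is simpler in practice is exactly the open question: the pool handles queuing and stalling, but CMS's pause/resume of already-running threads (updateMergeThreads) has no direct ThreadPoolExecutor equivalent.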
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491845#comment-13491845 ] Chris Male commented on LUCENE-4542: +1 I absolutely agree we need to make this change. There is another issue (I can't remember which just yet, and I'm on a bad connection) where the recursion cap was causing analysis loops. Do you want to create a patch? We need to maintain backwards compatibility, so the default experience should be using RECURSION_CAP as it is today. However, users should be able to pass in a value as well (that includes the HunspellStemFilterFactory).

Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr

Currently there is "private static final int RECURSION_CAP = 2;" in the code of the class HunspellStemmer. It makes using Hunspell with several dictionaries almost unusable due to bad performance (e.g., it costs 36 ms to stem a long sentence in Latvian with recursion_cap=2, and 5 ms with recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (This is the first issue I've ever filed, so please forgive any mistakes.)

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
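[Editor's note] One backwards-compatible shape for the change Chris describes would be to keep the current default of 2 and add an overloaded constructor taking a caller-supplied cap. This is a sketch, not the committed patch; the constant, field, and parameter names are illustrative.

{code:java}
// Sketch only. HunspellDictionary is the real 4.x dictionary class
// (org.apache.lucene.analysis.hunspell); everything else is illustrative.
public class HunspellStemmer {
  public static final int DEFAULT_RECURSION_CAP = 2;  // today's behavior

  private final HunspellDictionary dictionary;
  private final int recursionCap;

  public HunspellStemmer(HunspellDictionary dictionary) {
    this(dictionary, DEFAULT_RECURSION_CAP);  // default experience unchanged
  }

  public HunspellStemmer(HunspellDictionary dictionary, int recursionCap) {
    this.dictionary = dictionary;
    this.recursionCap = recursionCap;         // used in place of RECURSION_CAP
  }
}
{code}

The factory would then accept an optional recursionCap attribute and pass it through, so existing configurations keep the old behavior.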
[jira] [Assigned] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male reassigned LUCENE-4542: -- Assignee: Chris Male

Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Chris Male

Currently there is "private static final int RECURSION_CAP = 2;" in the code of the class HunspellStemmer. It makes using Hunspell with several dictionaries almost unusable due to bad performance (e.g., it costs 36 ms to stem a long sentence in Latvian with recursion_cap=2, and 5 ms with recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (This is the first issue I've ever filed, so please forgive any mistakes.)

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
[ https://issues.apache.org/jira/browse/LUCENE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491890#comment-13491890 ] Simon Willnauer commented on LUCENE-4543: - +1, thanks Robert!

Bring back TFIDFSim.lengthNorm -- Key: LUCENE-4543 URL: https://issues.apache.org/jira/browse/LUCENE-4543 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4543.patch

We removed this before because of LUCENE-2828, but the problem there was the delegator (not the lengthNorm method). TFIDFSim requires byte[] norms today. So its computeNorm should be final, calling a lengthNorm() that returns a byte. This way there is no possibility for you to do something stupid.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
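[Editor's note] The shape Robert describes would look roughly like the sketch below: computeNorm becomes final on TFIDFSimilarity and funnels through an abstract lengthNorm() that must return a byte, so subclasses cannot produce anything but single-byte norms. This is simplified from the 4.x Similarity API and is not the actual patch.

{code:java}
// Sketch only; simplified from org.apache.lucene.search.similarities in 4.x.
public abstract class TFIDFSimilarity extends Similarity {
  @Override
  public final void computeNorm(FieldInvertState state, Norm norm) {
    norm.setByte(lengthNorm(state));  // byte[] norms are enforced here
  }

  /** Subclasses encode their length normalization into a single byte. */
  public abstract byte lengthNorm(FieldInvertState state);
}
{code}

Making computeNorm final is the whole point: the delegation bug from LUCENE-2828 lived in an overridable computeNorm, and the final method closes that door.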
[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-3589: -- Attachment: SOLR-3589.patch Back-port to the 3.6 branch.

Edismax parser does not honor mm parameter if analyzer splits a token - Key: SOLR-3589 URL: https://issues.apache.org/jira/browse/SOLR-3589 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.6, 4.0-BETA Reporter: Tom Burton-West Assignee: Robert Muir Attachments: SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589_test.patch, testSolr3589.xml.gz, testSolr3589.xml.gz

With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is ignored and the equivalent of an OR query for fire OR fly is produced. This is particularly a problem for languages that do not use whitespace to separate words, such as Chinese or Japanese. See these messages for more discussion: http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491922#comment-13491922 ] Tom Burton-West commented on SOLR-3589: --- I back-ported to the 3.6 branch. I forgot to change the name from SOLR-3589.patch, so the 6/Nov/12 patch is the 3.6 patch, against yesterday's svn version of 3.6. The main difference I saw between 3.6 and 4.0 is that Solr 4.0 uses DisMaxQParser.parseMinShouldMatch() to set the default at 0% if q.op=OR and 100% if q.op=AND. I just kept the 3.6 behavior, which uses the 3.6 default of 100% (if mm is not set). I'll test the 3.6 patch against a production index tomorrow.

Edismax parser does not honor mm parameter if analyzer splits a token - Key: SOLR-3589 URL: https://issues.apache.org/jira/browse/SOLR-3589 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.6, 4.0-BETA Reporter: Tom Burton-West Assignee: Robert Muir Attachments: SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589_test.patch, testSolr3589.xml.gz, testSolr3589.xml.gz

With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is ignored and the equivalent of an OR query for fire OR fly is produced. This is particularly a problem for languages that do not use whitespace to separate words, such as Chinese or Japanese. See these messages for more discussion: http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
Radim Kolar created SOLR-4041: - Summary: Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated SOLR-4041: -- Attachment: solr-monitormerge.txt Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491979#comment-13491979 ] Otis Gospodnetic commented on SOLR-4041: This looks nice and useful. Any reason this shouldn't be in 4.1? Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-4041: --- Fix Version/s: 4.1 Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491980#comment-13491980 ] Radim Kolar commented on SOLR-4041: --- it could be Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4536) Make PackedInts byte-aligned?
[ https://issues.apache.org/jira/browse/LUCENE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4536: - Attachment: LUCENE-4536.patch New patch including Mike's suggestions: - VERSION_LONG_ALIGNED renamed to VERSION_START, - VERSION_CURRENT aliased to VERSION_BYTE_ALIGNED, - added an assert that the sneaky direct reader impl can only be instantiated if the stream has been produced with VERSION_START. Make PackedInts byte-aligned? - Key: LUCENE-4536 URL: https://issues.apache.org/jira/browse/LUCENE-4536 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.1 Attachments: LUCENE-4536.patch, LUCENE-4536.patch PackedInts are more and more used to save/restore small arrays, but given that they are long-aligned, up to 63 bits are wasted per array. We should try to make PackedInts storage byte-aligned so that only 7 bits are wasted in the worst case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
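[Editor's note] To put concrete numbers on the alignment waste LUCENE-4536 describes (the values below are an example, not from the issue): storing 3 values of 7 bits each needs 21 bits of payload; long alignment rounds that up to 64 bits, byte alignment to 24.

{code:java}
// Illustrative only: waste for a small packed array of 3 x 7-bit values.
class PackedAlignmentExample {
  public static void main(String[] args) {
    int numValues = 3, bitsPerValue = 7;
    int payloadBits = numValues * bitsPerValue;        // 21 bits of data
    int longAligned = ((payloadBits + 63) / 64) * 64;  // 64 bits stored -> 43 wasted
    int byteAligned = ((payloadBits + 7) / 8) * 8;     // 24 bits stored -> 3 wasted
    System.out.println("wasted bits: long-aligned=" + (longAligned - payloadBits)
        + " byte-aligned=" + (byteAligned - payloadBits));
  }
}
{code}

For one small array the savings look trivial, but the issue's point is that Lucene stores very many such small arrays, so the per-array waste adds up.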
[jira] [Updated] (SOLR-4030) Use Lucene segment merge throttling
[ https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated SOLR-4030: -- Fix Version/s: 4.1 Use Lucene segment merge throttling --- Key: SOLR-4030 URL: https://issues.apache.org/jira/browse/SOLR-4030 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-mergeratelimit.txt add argument maxMergeWriteMBPerSec to Solr directory factories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
[ https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1306: - Attachment: SOLR-1306.patch Fix for a problem when no CoreDescriptorProvider was supplied but a bunch of cores were specified with loadOnStartup=false. CoreContainer.getCoreNames was not returning these cores.

Support pluggable persistence/loading of solr.xml details - Key: SOLR-1306 URL: https://issues.apache.org/jira/browse/SOLR-1306 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Assignee: Erick Erickson Fix For: 4.1 Attachments: SOLR-1306.patch, SOLR-1306.patch, SOLR-1306.patch

Persisting and loading details from one XML file is fine if the number of cores is small and fixed. If there are tens of thousands of cores on a single box, adding a new core (with persistent=true) becomes very expensive, because every core creation has to rewrite this huge XML file. Moreover, there is a good chance that the file gets corrupted and all the cores become unusable. In that case I would prefer the details to be stored in a centralized DB which is backed up/replicated, so all the information is available in a centralized location. We may need to refactor CoreContainer to have a pluggable implementation which can load/persist the details. The default implementation should write/read from/to solr.xml, and the class should be pluggable as follows in solr.xml:

{code:xml}
<solr>
  <dataProvider class="com.foo.FooDataProvider" attr1="val1" attr2="val2"/>
</solr>
{code}

There will be a new interface (or abstract class) called SolrDataProvider which this class must implement.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
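[Editor's note] A minimal sketch of what the proposed provider contract could look like. The SolrDataProvider name comes from the issue itself; the method names and the use of CoreDescriptor/CoreContainer (real Solr classes) are my own illustration, not the actual patch.

{code:java}
// Sketch only; method names are illustrative.
public interface SolrDataProvider {
  /** Load the descriptors of all cores this node should know about. */
  java.util.List<CoreDescriptor> loadCoreDescriptors(CoreContainer container);

  /** Persist details for a created or updated core, replacing the
   *  current "rewrite all of solr.xml on every core creation" behavior. */
  void persist(CoreDescriptor descriptor);
}
{code}

A DB-backed implementation would then persist one row per core instead of rewriting the whole file, which is what makes the tens-of-thousands-of-cores case tractable.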
[jira] [Commented] (SOLR-3904) add package level javadocs to every package
[ https://issues.apache.org/jira/browse/SOLR-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492027#comment-13492027 ] Robert Muir commented on SOLR-3904: --- Thanks for starting this!

add package level javadocs to every package --- Key: SOLR-3904 URL: https://issues.apache.org/jira/browse/SOLR-3904 Project: Solr Issue Type: Improvement Components: documentation Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.1 Attachments: SOLR-3904_buildxml.patch

quoth rmuir on the mailing list... {quote} We've been working on this for the lucene side (3.6 was the first release where every package had docs, 4.0 will be the first where every class has docs, and we are now working towards methods/fields/ctors/enums). I think this would be valuable for solr too (especially solrj as a start). Besides users, it's really useful to developers as well. Of course we all think our code is self-documenting, but it's not always the case. A few extra seconds can save someone a ton of time trying to figure out your code. Additionally, at least in my IDE, when things are done as javadoc comments they are more easily accessible than code comments. I'm sure it's the case for some other development environments too. Filling in these package.html's to at least have a one-sentence description would be a really good start. It lets someone know where to go at the high level. If I was brand new to solr and wanted to write a java app that uses solrj, I wouldn't have a clue where to start (https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-solrj/index.html). 1-2 sentences could go a really long way. And for all new code, I hope we can all try harder for more complete javadocs. When you are working on something and it's fresh in your head, it's a lot easier to do this than for someone else to come back around and figure it out. {quote} I'm going to try and make it a priority for me to fill in package-level docs as we look towards 4.1

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492048#comment-13492048 ] Robert Muir commented on SOLR-3589: --- Hi Tom: thanks for working on the 3.6 backport! I'll commit the trunk/4.x patch first, then wait for your testing and review your patch before looking at 3.6!

Edismax parser does not honor mm parameter if analyzer splits a token - Key: SOLR-3589 URL: https://issues.apache.org/jira/browse/SOLR-3589 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.6, 4.0-BETA Reporter: Tom Burton-West Assignee: Robert Muir Attachments: SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589_test.patch, testSolr3589.xml.gz, testSolr3589.xml.gz

With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is ignored and the equivalent of an OR query for fire OR fly is produced. This is particularly a problem for languages that do not use whitespace to separate words, such as Chinese or Japanese. See these messages for more discussion: http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4543) Bring back TFIDFSim.lengthNorm
[ https://issues.apache.org/jira/browse/LUCENE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4543. - Resolution: Fixed Fix Version/s: 5.0 4.1 Bring back TFIDFSim.lengthNorm -- Key: LUCENE-4543 URL: https://issues.apache.org/jira/browse/LUCENE-4543 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.1, 5.0 Attachments: LUCENE-4543.patch We removed this before because of LUCENE-2828, but the problem there was the delegator (not the lengthNorm method). TFIDFSim requires byte[] norms today. So its computeNorm should be final, calling lengthNorm() that returns a byte. This way there is no possibility for you to do something stupid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4040) SolrCloud deleteByQuery requires multiple commits
[ https://issues.apache.org/jira/browse/SOLR-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492082#comment-13492082 ] David Smiley commented on SOLR-4040: (I'm working with Darin on this) To add a little context, there is no soft committing, unlike the default config (if I recall). There might be an auto-commit enabled but it has a large ~10 minute time window, so it isn't going to play a role with this bug. Committing is generally explicit.

SolrCloud deleteByQuery requires multiple commits - Key: SOLR-4040 URL: https://issues.apache.org/jira/browse/SOLR-4040 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: OSX Reporter: Darin Plutchok Labels: SolrCloud, commit, delete Fix For: 4.0

I am using embedded zookeeper and my cloud layout is shown below (all actions are done on the 'patents' collection only). The first commit/delete works for a single shard only, dropping query results by about a third. The second commit/delete drops query results to zero.

http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop by a third)
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop to zero)

Note that a delete without a commit followed by a commit drops query results to zero, as it should:

http://127.0.0.1:8893/solr/patents/update/?stream.body=<delete><query>dogs</query></delete>
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (full count as no commit yet)
http://127.0.0.1:8893/solr/patents/update/?commit=true
http://localhost:8893/solr/patents/select?q=dogs&rows=0 (results drop to zero)

One workaround (produces zero hits in one shot):
http://127.0.0.1:8893/solr/patents/update?commit=true&stream.body=<outer><delete><query>sun</query></delete><commit/></outer>

The workaround I am using for now (produces zero hits in one shot):
http://127.0.0.1:8893/solr/patents/update?stream.body=<outer><delete><query>knee</query></delete><commit/><commit/></outer>

{code}
{"otherdocs":{"slice0":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_otherdocs_shard0":{
      "shard":"slice0",
      "roles":null,
      "state":"active",
      "core":"otherdocs_shard0",
      "collection":"otherdocs",
      "node_name":"Darins-MacBook-Pro.local:8893_solr",
      "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
      "leader":"true"}}}},
 "patents":{
   "slice0":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard0":{
      "shard":"slice0",
      "roles":null,
      "state":"active",
      "core":"patents_shard0",
      "collection":"patents",
      "node_name":"Darins-MacBook-Pro.local:8893_solr",
      "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
      "leader":"true"}}},
   "slice1":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard1":{
      "shard":"slice1",
      "roles":null,
      "state":"active",
      "core":"patents_shard1",
      "collection":"patents",
      "node_name":"Darins-MacBook-Pro.local:8893_solr",
      "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
      "leader":"true"}}},
   "slice2":{"replicas":{"Darins-MacBook-Pro.local:8893_solr_patents_shard2":{
      "shard":"slice2",
      "roles":null,
      "state":"active",
      "core":"patents_shard2",
      "collection":"patents",
      "node_name":"Darins-MacBook-Pro.local:8893_solr",
      "base_url":"http://Darins-MacBook-Pro.local:8893/solr",
      "leader":"true"}}}}}
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4042) NullPointerException for query type 'query' without '{! ...}' syntax
Joel Nothman created SOLR-4042: -- Summary: NullPointerException for query type 'query' without '{! ...}' syntax Key: SOLR-4042 URL: https://issues.apache.org/jira/browse/SOLR-4042 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0, 5.0 Reporter: Joel Nothman Priority: Minor

The 'query' query type, corresponding to NestedQParserPlugin, expects a query of the form: {! ... } An empty q parameter, or a plain list of search terms, causes a NullPointerException, because NestedQParserPlugin.createParser receives localParams == null, which is then used without checking in NestedQParserPlugin.QParser.parse(). The correct behaviour is currently ambiguous: throw a syntax error, or execute the query with the default parser?

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
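[Editor's note] Whichever of the two behaviours is chosen, the immediate NPE could be avoided with a guard along these lines. This is a sketch only: the helper name and message text are made up, and the choice to throw rather than fall back to the default parser is exactly the question the issue leaves open.

{code:java}
// Sketch only: guard against missing {! ... } local params before parsing.
// SolrParams and SyntaxError are real Solr 4.x classes; requireLocalParams
// is a hypothetical helper, not part of the Solr API.
static void requireLocalParams(org.apache.solr.common.params.SolrParams localParams)
    throws org.apache.solr.search.SyntaxError {
  if (localParams == null) {
    throw new org.apache.solr.search.SyntaxError(
        "the 'query' (nested) parser requires {! ... } local-params syntax");
  }
}
{code}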
[jira] [Resolved] (LUCENE-4540) Allow packed ints norms
[ https://issues.apache.org/jira/browse/LUCENE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4540. - Resolution: Fixed Fix Version/s: 5.0, 4.1

Allow packed ints norms --- Key: LUCENE-4540 URL: https://issues.apache.org/jira/browse/LUCENE-4540 Project: Lucene - Core Issue Type: Task Components: core/index Reporter: Robert Muir Fix For: 4.1, 5.0 Attachments: LUCENE-4540.patch

I was curious what the performance would be, because it might be a useful option to use packed ints for norms if you have lots of fields and still want good scoring: today the smallest norm per-field-per-doc you can use is a single byte, and if you have _f_ fields with norms enabled and _n_ docs, it uses _f_ * _n_ bytes of space in RAM. Especially if you aren't using index-time boosting (or even if you are, but not with ridiculous values), this could be wasting a ton of RAM. But then I noticed there was no clean way to allow you to do this in your Similarity: it's a trivial patch.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
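[Editor's note] To put numbers on the f * n estimate above (the values are illustrative, not from the issue): with 20 norm-enabled fields and 10 million documents, single-byte norms need about 190 MB of RAM, while 4-bit packed norms would need roughly half that.

{code:java}
// Illustrative only: RAM for norms at one byte vs. 4 packed bits per value.
class NormsRamExample {
  public static void main(String[] args) {
    long fields = 20, docs = 10000000L;              // f fields with norms, n docs
    long byteNormsBytes = fields * docs;             // 200,000,000 B, ~190 MB
    long packed4BitBytes = fields * docs * 4 / 8;    // 100,000,000 B, ~95 MB
    System.out.println(byteNormsBytes + " B vs " + packed4BitBytes + " B");
  }
}
{code}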
[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492095#comment-13492095 ] Robert Muir commented on SOLR-3589: --- Committed to trunk/4.x. Will look tomorrow at 3.6.

Edismax parser does not honor mm parameter if analyzer splits a token - Key: SOLR-3589 URL: https://issues.apache.org/jira/browse/SOLR-3589 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.6, 4.0-BETA Reporter: Tom Burton-West Assignee: Robert Muir Attachments: SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589.patch, SOLR-3589_test.patch, testSolr3589.xml.gz, testSolr3589.xml.gz

With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is ignored and the equivalent of an OR query for fire OR fly is produced. This is particularly a problem for languages that do not use whitespace to separate words, such as Chinese or Japanese. See these messages for more discussion: http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui
[ https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492096#comment-13492096 ] Otis Gospodnetic commented on SOLR-4041: Does this automatically expose these numbers in JMX? It looks like it does, but if not, it would be good to have them there, too.

Allow segment merge monitoring in Solr Admin gui Key: SOLR-4041 URL: https://issues.apache.org/jira/browse/SOLR-4041 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.0 Reporter: Radim Kolar Labels: patch Fix For: 4.1 Attachments: solr-monitormerge.txt add solrMbean for ConcurrentMergeScheduler

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: concurrentmergescheduller
On Tue, Nov 6, 2012 at 6:32 AM, Michael McCandless luc...@mikemccandless.com wrote: While confusing, I think the code is actually nearly correct... My question is, who is going to create the MikeSays account? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4483) Make Term constructor javadoc refer to BytesRef.deepCopyOf
[ https://issues.apache.org/jira/browse/LUCENE-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4483. - Resolution: Fixed Fix Version/s: (was: 4.0.1) 5.0, 4.1 Thanks Paul! Good catch!

Make Term constructor javadoc refer to BytesRef.deepCopyOf -- Key: LUCENE-4483 URL: https://issues.apache.org/jira/browse/LUCENE-4483 Project: Lucene - Core Issue Type: Improvement Components: core/index Affects Versions: 4.1 Reporter: Paul Elschot Priority: Trivial Fix For: 4.1, 5.0 Attachments: LUCENE-4483.patch

The Term constructor from BytesRef javadoc indicates that a clone needs to be made of the BytesRef. But the clone() method of BytesRef is not what is meant; a deep copy needs to be made.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4043) Incorrect response for create/delete/reload collections
Raintung Li created SOLR-4043: - Summary: Incorrect response for create/delete/reload collections Key: SOLR-4043 URL: https://issues.apache.org/jira/browse/SOLR-4043 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0, 4.0-BETA, 4.0-ALPHA Environment: Solr cloud cluster Reporter: Raintung Li Attachments: patch-4043.txt

Create/delete/reload collection actions are asynchronous processes: the client can't get the real result, it can only be sure the request has been saved into the OverseerCollectionQueue. The client gets a response immediately, without waiting for the outcome of the action (create/delete/reload collection), whether or not it succeeds. The easy solution is for the client to wait until the asynchronous process finishes: the thread handling the collection action saves the response into the OverseerCollectionQueue and then notifies the client to fetch it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4043) Incorrect response for create/delete/reload collections
[ https://issues.apache.org/jira/browse/SOLR-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raintung Li updated SOLR-4043: -- Attachment: patch-4043.txt The patch fixes this bug.

Incorrect response for create/delete/reload collections - Key: SOLR-4043 URL: https://issues.apache.org/jira/browse/SOLR-4043 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0 Environment: Solr cloud cluster Reporter: Raintung Li Attachments: patch-4043.txt

Create/delete/reload collection actions are asynchronous processes: the client can't get the real result, it can only be sure the request has been saved into the OverseerCollectionQueue. The client gets a response immediately, without waiting for the outcome of the action (create/delete/reload collection), whether or not it succeeds. The easy solution is for the client to wait until the asynchronous process finishes: the thread handling the collection action saves the response into the OverseerCollectionQueue and then notifies the client to fetch it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org