[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-10-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189-4x.patch

Patch covers the work from all the issues, ported to 4x (created w/ 
--show-copies-as-adds). I think it's ready (tests pass several times).

If there are no objections, I will add a CHANGES entry and commit it.
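
For anyone catching up on the thread, here is a minimal sketch of what the feature enables at the API level (assuming the method lands as IndexWriter.updateNumericDocValue(Term, String, long), as in the patch; the field and term names below are illustrative):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NumericDVUpdateSketch {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46)));

    // Index a document carrying a numeric DV field.
    Document doc = new Document();
    doc.add(new StringField("id", "doc1", Field.Store.NO));
    doc.add(new NumericDocValuesField("price", 10L));
    writer.addDocument(doc);
    writer.commit();

    // Update only the DV value for documents matching the term,
    // without re-indexing the document itself.
    writer.updateNumericDocValue(new Term("id", "doc1"), "price", 20L);
    writer.close();
  }
}
{code}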

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
 LUCENE-5189-no-lost-updates.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
 LUCENE-5189_process_events.patch, LUCENE-5189-segdv.patch, 
 LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch


 In LUCENE-4258 we started to work on incremental field updates, however the 
 amount of changes is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get-go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update, if e.g. we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet it requires many changes to core code which will also be useful 
 for updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have a working patch already which I'll upload next, explaining the 
 changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5331) SolrCloud 4.5 bulk add errors

2013-10-25 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805167#comment-13805167
 ] 

Chris commented on SOLR-5331:
-

Closing this... With the above changes, and changes to my Tomcat connector 
limits, this issue has gone away.

 SolrCloud 4.5 bulk add errors
 -

 Key: SOLR-5331
 URL: https://issues.apache.org/jira/browse/SOLR-5331
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5
Reporter: Chris
 Fix For: 4.5.1


 Since Solr 4.5, bulk adding documents via SolrJ (at least) causes errors.
 // build an array list of SolrInputDocuments
 List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
 // ... populate docs, then send them in one request ...
 server.add(docs);
 I've tried with CUSS (which swallows exceptions as expected, though they are 
 shown in the logs on the server) and with CloudSolrServer, which returns the 
 errors in addition to logging them on the server.
 I've tried downgrading my SolrJ to 4.4 and still get errors, so this looks 
 like a regression in the server code. Reverting to Solr 4.4 on the server, I 
 don't get the errors (but run into deadlock issues instead).
 I raised this issue in IRC (not the mailing list), and elyorag suggested 
 opening a ticket and mentioning that it has now been discussed in IRC.
 The exceptions would indicate that I'm attempting multiple operations in a 
 single malformed request. I am not; I am only attempting to add documents.
 Stack traces seen here:
 14:57:13 ERROR SolrCmdDistributor shard update error RetryNode: 
 http://X.X.X.X:8080/solr/collection1_shard16_replica2/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
  Illegal to have multiple roots (start tag in epilog?).
  
 shard update error RetryNode: 
 http://X.X.X.X:8080/solr/collection1_shard16_replica2/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
  Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [18,327]
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
 at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 
 org.apache.solr.common.SolrException: Illegal to have multiple roots 
 (start tag in epilog?).
 at [row,col {unknown-source}]: [7,6314]
 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
 at 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
 at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
 at 
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
 at 
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
 at 
 

[jira] [Closed] (SOLR-5331) SolrCloud 4.5 bulk add errors

2013-10-25 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris closed SOLR-5331.
---

Resolution: Invalid

 SolrCloud 4.5 bulk add errors
 -

 Key: SOLR-5331
 URL: https://issues.apache.org/jira/browse/SOLR-5331
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5
Reporter: Chris
 Fix For: 4.5.1


 Since Solr 4.5, bulk adding documents via SolrJ (at least) causes errors.
 // build an array list of SolrInputDocuments
 List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
 // ... populate docs, then send them in one request ...
 server.add(docs);
 I've tried with CUSS (which swallows exceptions as expected, though they are 
 shown in the logs on the server) and with CloudSolrServer, which returns the 
 errors in addition to logging them on the server.
 I've tried downgrading my SolrJ to 4.4 and still get errors, so this looks 
 like a regression in the server code. Reverting to Solr 4.4 on the server, I 
 don't get the errors (but run into deadlock issues instead).
 I raised this issue in IRC (not the mailing list), and elyorag suggested 
 opening a ticket and mentioning that it has now been discussed in IRC.
 The exceptions would indicate that I'm attempting multiple operations in a 
 single malformed request. I am not; I am only attempting to add documents.
 Stack traces seen here:
 14:57:13 ERROR SolrCmdDistributor shard update error RetryNode: 
 http://X.X.X.X:8080/solr/collection1_shard16_replica2/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
  Illegal to have multiple roots (start tag in epilog?).
  
 shard update error RetryNode: 
 http://X.X.X.X:8080/solr/collection1_shard16_replica2/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
  Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [18,327]
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
 at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 
 org.apache.solr.common.SolrException: Illegal to have multiple roots 
 (start tag in epilog?).
 at [row,col {unknown-source}]: [7,6314]
 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
 at 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
 at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
 at 
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
 at 
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
 at 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 

[jira] [Commented] (SOLR-5364) SolrCloud stops accepting updates

2013-10-25 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805172#comment-13805172
 ] 

Chris commented on SOLR-5364:
-

Mark, thanks for taking the time to test this.

It turns out it is a garbage collection issue. I was running the default Java 
args too, but my machines have 128GB RAM and had been allocated a 20GB heap. 
It seems this really needs tuning with Java 7. Since updating my GC settings, 
I have not had any issues. For reference, these are the GC settings I'm 
running with:

JAVA_OPTS="$JAVA_OPTS -server -XX:NewRatio=1 -XX:SurvivorRatio=6 \
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
-XX:CMSIncrementalDutyCycleMin=0 \
-XX:CMSIncrementalDutyCycle=10 -XX:+CMSIncrementalPacing \
-XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC \
-XX:ConcGCThreads=10 \
-XX:ParallelGCThreads=10 \
-XX:MaxGCPauseMillis=3"

I've also set the heap to 12g and eden to 5g: -Xmx12g -Xms12g -Xmn5g


 SolrCloud stops accepting updates
 -

 Key: SOLR-5364
 URL: https://issues.apache.org/jira/browse/SOLR-5364
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Chris
Priority: Blocker

 I'm attempting to import data into a SolrCloud cluster. After a certain 
 amount of time, the cluster stops accepting updates.
 I have tried numerous suggestions in IRC from Elyorag and others without 
 resolution.
 I have had this issue with 4.4, and understood there was a deadlock issue 
 fixed in 4.5, but that hasn't resolved the issue; neither have the 4.6 
 snapshots.
 I've tried with Tomcat, various Tomcat configuration changes to threading, 
 and with Jetty. I tried various index merging configurations, as I initially 
 thought there was a deadlock with the concurrent merge scheduler, but I see 
 the same issue with SerialMergeScheduler.
 The cluster stops accepting updates after some amount of time; this seems to 
 vary and is inconsistent. Sometimes I manage to index 400k docs, other times 
 ~1 million. Querying the cluster continues to work. I can reproduce the 
 issue consistently, and it is currently blocking our transition to Solr.
 I can provide stack traces, thread dumps, jstack dumps as required.
 Here are two jstacks thus far:
 http://pastebin.com/1ktjBYbf
 http://pastebin.com/8JiQc3rb
 I have got these jstacks from the latest 4.6 snapshot, also running the SolrJ 
 snapshot. The issue is also consistently reproducible with the 
 BinaryRequestWriter.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805179#comment-13805179
 ] 

Marco Wong commented on SOLR-5379:
--

Excuse me, for the synonym-expander.patch: when I have a ShingleFilter in my 
query-time analyzer which emits bigrams like Term(a b), will the updated 
SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)), making my existing 
tokenization logic fail?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by whitespace, so 
 it splits a multi-word term into separate terms before feeding them to the 
 synonym filter; the synonym filter therefore can't recognize the multi-word 
 term in order to expand it.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have 
 different numbers of words.
 For the first problem, we can quote all multi-word synonyms in the user query 
 so that the Lucene query parser doesn't split them. There is a related JIRA 
 task: https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
 clauses containing multiple PhraseQuery clauses when the token stream has a 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805179#comment-13805179
 ] 

Marco Wong edited comment on SOLR-5379 at 10/25/13 8:56 AM:


Excuse me, for the synonym-expander.patch: when I have a ShingleFilter in my 
query-time analyzer which emits bigram terms like Term(a b), will the updated 
SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)), making my existing 
tokenization logic fail?


was (Author: marcowong):
Excuse me, for the synonym-expander.patch: will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in my query-time 
analyzer which emits bigrams like Term(a b), making my existing tokenization 
logic fail?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by whitespace, so 
 it splits a multi-word term into separate terms before feeding them to the 
 synonym filter; the synonym filter therefore can't recognize the multi-word 
 term in order to expand it.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have 
 different numbers of words.
 For the first problem, we can quote all multi-word synonyms in the user query 
 so that the Lucene query parser doesn't split them. There is a related JIRA 
 task: https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
 clauses containing multiple PhraseQuery clauses when the token stream has a 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805179#comment-13805179
 ] 

Marco Wong edited comment on SOLR-5379 at 10/25/13 8:55 AM:


Excuse me, for the synonym-expander.patch: will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in my query-time 
analyzer which emits bigrams like Term(a b), making my existing tokenization 
logic fail?


was (Author: marcowong):
Excuse me, for the synonym-expander.patch: will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in my query-time 
analyzer which emits bigrams like Term(a b), making my existing tokenization 
logic fail?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by whitespace, so 
 it splits a multi-word term into separate terms before feeding them to the 
 synonym filter; the synonym filter therefore can't recognize the multi-word 
 term in order to expand it.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have 
 different numbers of words.
 For the first problem, we can quote all multi-word synonyms in the user query 
 so that the Lucene query parser doesn't split them. There is a related JIRA 
 task: https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
 clauses containing multiple PhraseQuery clauses when the token stream has a 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Nguyen Manh Tien (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805188#comment-13805188
 ] 

Nguyen Manh Tien commented on SOLR-5379:


Yes, it will emit PhraseQuery(Term(a), Term(b)).
We need an additional check to only tokenize a term when it is a synonym.
I will change the patch.
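
To make the intent concrete, here is a rough sketch (hypothetical code, not the patch itself) of how such a check could tell synonym-injected tokens apart from shingles: Lucene's SynonymFilter marks the tokens it injects with type SYNONYM, while a ShingleFilter bigram keeps the default "shingle" type, so only synonym tokens would be candidates for splitting into a PhraseQuery. The class and method names here are illustrative.

{code}
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

// Hypothetical sketch: walk the analyzed query text and decide, per token,
// whether it came from the synonym filter (type == "SYNONYM") and is
// therefore a candidate for splitting into a PhraseQuery. A ShingleFilter
// bigram like "a b" keeps type "shingle" and would stay a single term.
public class SynonymTokenCheck {
  public static void dump(Analyzer analyzer, String field, String text) throws IOException {
    TokenStream ts = analyzer.tokenStream(field, text);
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    TypeAttribute typeAtt = ts.addAttribute(TypeAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      boolean fromSynonym = SynonymFilter.TYPE_SYNONYM.equals(typeAtt.type());
      System.out.println(termAtt + " -> "
          + (fromSynonym ? "split into PhraseQuery" : "keep as one term"));
    }
    ts.end();
    ts.close();
  }
}
{code}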

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by whitespace, so 
 it splits a multi-word term into separate terms before feeding them to the 
 synonym filter; the synonym filter therefore can't recognize the multi-word 
 term in order to expand it.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have 
 different numbers of words.
 For the first problem, we can quote all multi-word synonyms in the user query 
 so that the Lucene query parser doesn't split them. There is a related JIRA 
 task: https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
 clauses containing multiple PhraseQuery clauses when the token stream has a 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Nguyen Manh Tien (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Manh Tien updated SOLR-5379:
---

Attachment: synonym-expander.patch

Patch checks whether a term is a synonym before tokenizing it.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by whitespace, so 
 it splits a multi-word term into separate terms before feeding them to the 
 synonym filter; the synonym filter therefore can't recognize the multi-word 
 term in order to expand it.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have 
 different numbers of words.
 For the first problem, we can quote all multi-word synonyms in the user query 
 so that the Lucene query parser doesn't split them. There is a related JIRA 
 task: https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
 clauses containing multiple PhraseQuery clauses when the token stream has a 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Nguyen Manh Tien (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Manh Tien updated SOLR-5379:
---

Attachment: (was: synonym-expander.patch)

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by whitespace, so 
 it splits a multi-word term into separate terms before feeding them to the 
 synonym filter; the synonym filter therefore can't recognize the multi-word 
 term in order to expand it.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have 
 different numbers of words.
 For the first problem, we can quote all multi-word synonyms in the user query 
 so that the Lucene query parser doesn't split them. There is a related JIRA 
 task: https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
 clauses containing multiple PhraseQuery clauses when the token stream has a 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5393) Solr Cloud Polygon insert over non leader shard produces error

2013-10-25 Thread Mahledivic Stronza (JIRA)
Mahledivic Stronza created SOLR-5393:


 Summary: Solr Cloud Polygon insert over non leader shard produces 
error
 Key: SOLR-5393
 URL: https://issues.apache.org/jira/browse/SOLR-5393
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5
Reporter: Mahledivic Stronza


We get the following error when trying to load polygon data of type 
solr.SpatialRecursivePrefixTreeFieldType into the Solr cloud.

When we insert the polygon data directly into the leader shard, everything 
(including replication) works fine, but we get the following error whenever we 
try to insert the data via a non-leader shard:

ERROR SolrCmdDistributor forwarding update to 
http://192.168.14.68:7460/geodata/ failed - retrying 

Without polygons we can insert into any shard (leader or replica) without 
problems.

Code to reproduce:

SolrInputDocument input = new SolrInputDocument();
input.setField("id", "test");
String poly = "POLYGON((13.994591177713277 51.002523790166705, 
13.999693765863235 51.002226426778684, 14.001913786758868 51.003267620336686, 
14.009460032984512 51.00507616830811, 14.014352741862453 51.005397015805684, 
14.015753050939129 51.006165440252694, 14.01713711596325 51.006907316716564, 
14.018964454037809 51.006716998902526, 14.021343903143388 51.006464920150016, 
14.02502734579799 51.00552477067982, 14.024643222840908 51.00516708013014, 
14.023995094883304 51.004529130427706, 14.02693059874708 51.00426911242828, 
14.029893872882688 51.00399916991763, 14.03384714224316 51.00248305935758, 
14.03569145591514 51.00176950378165, 14.037017581967358 51.00193640923721, 
14.039078347499816 51.00218906817236, 14.041385544697533 51.001182106955575, 
14.042961142804325 51.00050362709656, 14.045466411981872 51.00171549577248, 
14.047436055944628 51.00170955402223, 14.047501462719008 51.000698678987334, 
14.0512755199 51.00113313322361, 14.05199253668852 50.99931867381475, 
14.049908557072259 50.99764370517098, 14.048323570941262 50.99633182825351, 
14.048756259187261 50.995084559520684, 14.049083460379496 50.9941377646161, 
14.048559547400519 50.99251429812778, 14.048031545244314 50.990836907379865, 
14.049200640514442 50.988396160957755, 14.048752386129843 50.98645511274733, 
14.049899403692196 50.985032920490475, 14.050433093991956 50.982683593518296, 
14.049663453585083 50.98008576559324, 14.053447525957514 50.979934262898006, 
14.054101699616101 50.979905280314895, 14.054541876032507 50.979504496321006, 
14.056883382560684 50.97728906687162, 14.061461001521177 50.977626495499514, 
14.065739069155171 50.977036091326006, 14.070237468930674 50.976528847851064, 
14.072737969004711 50.9763890539482, 14.073435256940234 50.97580914575129, 
14.073656252079921 50.97497361816587, 14.073507842972424 50.97415844208817, 
14.073477076544268 50.97357386295706, 14.073497624562199 50.97328497403231, 
14.07353309846212 50.973004634793355, 14.073629355769953 50.97258730796079, 
14.074031133526418 50.97206150901968, 14.074173161574096 50.9718679773403, 
14.074953685738512 50.97125847625354, 14.076318122528281 50.970450862964476, 
14.077419495985641 50.968822576119045, 14.07938471092451 50.96785232072157, 
14.079935466625953 50.965376163972905, 14.075928740820487 50.965769620044185, 
14.076477656611141 50.96493301242208, 14.07715569856788 50.96392127776921, 
14.074518652583494 50.96468694014318, 14.070947374058479 50.965733442336116, 
14.070067128285528 50.964490320180744, 14.073638108045227 50.962885351295846, 
14.074800325552822 50.96204792730229, 14.07578512690655 50.96131503249083, 
14.072648032067532 50.96077183886253, 14.069990748256624 50.96108762107119, 
14.068779840845313 50.962034585317255, 14.067957389649735 50.96265434173164, 
14.067324872520413 50.96203416740502, 14.066893256314694 50.96161501555103, 
14.063746581910717 50.96094576617525, 14.060359780703521 50.96104946306251, 
14.05747465371602 50.96025493987205, 14.05373788209221 50.960071883898436, 
14.048310513394146 50.95997631668494, 14.048472867689442 50.96098025221684, 
14.048526758248931 50.96131190166739, 14.047891683599207 50.961214167562254, 
14.043110704367722 50.96043205625402, 14.04200268876 50.96254661748835, 
14.039611197301197 50.962610392016906, 14.039496995984026 50.962983189010785, 
14.039153009982101 50.96408360520823, 14.036527043440328 50.964253532621974, 
14.03671903618213 50.962842471755415, 14.037626765889286 50.962400512989845, 
14.039081773666089 50.96165364817774, 14.039871682812647 50.96060271915765, 
14.040057039055357 50.95948010265582, 14.037002440491422 50.9541537619, 
14.035134869565928 50.9605846093134, 14.033106236231959 50.96185328415876, 
14.03250868364643 50.962997407710056, 14.03011455632278 50.963214199168796, 
14.030856865387815 50.96219178096697, 14.0308016783549 50.96188314726897, 
14.02902553543847 50.96211457060064, 14.028183254277486 50.961264041726984, 
14.025263665203713 50.962073167844004, 14.021542191222775 

[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 946 - Still Failing!

2013-10-25 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9765 lines...]
   [junit4] JVM J0: stdout was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20131025_122106_052.sysout
   [junit4]  JVM J0: stdout (verbatim) 
   [junit4] #
   [junit4] # A fatal error has been detected by the Java Runtime Environment:
   [junit4] #
   [junit4] #  SIGSEGV (0xb) at pc=0x00010feeba2b, pid=185, tid=134147
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (7.0_45-b18) (build 
1.7.0_45-b18)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode 
bsd-amd64 )
   [junit4] # Problematic frame:
   [junit4] # C  [libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3
   [junit4] #
   [junit4] # Failed to write core dump. Core dumps have been disabled. To 
enable core dumping, try ulimit -c unlimited before starting Java again
   [junit4] #
   [junit4] # An error report file with more information is saved as:
   [junit4] # 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/J0/hs_err_pid185.log
   [junit4] [thread 139779 also had an error]
   [junit4] #
   [junit4] # If you would like to submit a bug report, please visit:
   [junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
   [junit4] # The crash happened outside the Java Virtual Machine in native 
code.
   [junit4] # See problematic frame for where to report the bug.
   [junit4] #
   [junit4]  JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=F31293EB573940CD -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 
-classpath 

[jira] [Resolved] (SOLR-5387) Multi-Term analyser not working

2013-10-25 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5387.
--

Resolution: Not A Problem

Please raise questions like this on the user's list before raising a JIRA to 
see if it's a real bug or just a misunderstanding on your part.

In this case, the code is functioning as expected. A multiterm analysis chain 
may NOT break an incoming token up into more than one token. Here, anything 
with an @ symbol is broken up into more than one term by the 
StandardTokenizer, which is an illegal condition for multiterm queries.
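
For reference, one common way to make wildcard searches work on values like this (a sketch adapting the schema quoted below, not a drop-in fix for every use case) is to give the multiterm analyzer a tokenizer that never splits its input, e.g. solr.KeywordTokenizerFactory:

{code}
<analyzer type="multiterm">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
{code}

With that, "european@unio" stays a single (lowercased) token, so prefix queries like manu:(european@unio*) no longer trip the too-many-terms check.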

 Multi-Term analyser not working
 ---

 Key: SOLR-5387
 URL: https://issues.apache.org/jira/browse/SOLR-5387
 Project: Solr
  Issue Type: Bug
Reporter: Claire Chan
 Fix For: 4.5


 I tried the Solr 4.5 example schema, modified by changing a field, say 
 'manu', to the following fieldType:
 <fieldType name="text_general_mt" class="solr.TextField" 
 positionIncrementGap="100">
 ...
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="multiterm">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 After indexing a document with manu value europ...@union.de, 
 the following search throws an exception:
 manu:(european@unio*)
 The exception:
 analyzer returned too many terms for multiTerm term: european@unio
 org.apache.solr.common.SolrException: analyzer returned too many terms for 
 multiTerm term: european@unio
 at 
 org.apache.solr.schema.TextField.analyzeMultiTerm(TextField.java:157)
 at 
 org.apache.solr.parser.SolrQueryParserBase.analyzeIfMultitermTermText(SolrQueryParserBase.java:936)
 at 
 org.apache.solr.parser.SolrQueryParserBase.getPrefixQuery(SolrQueryParserBase.java:981)
 at 
 org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:746)
 at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
 at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
 at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
 I thought I did exactly as instructed by various multiterm blogs and wiki 
 pages, so please take a look to see whether this is a bug.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b109) - Build # 8036 - Failure!

2013-10-25 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/8036/
Java: 32bit/jdk1.8.0-ea-b109 -server -XX:+UseSerialGC

1 tests failed.
REGRESSION:  org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT

Error Message:
expected:<3> but was:<2>

Stack Trace:
java.lang.AssertionError: expected:<3> but was:<2>
at 
__randomizedtesting.SeedInfo.seed([8EB9DFDAC3456D7E:3B3FBE5D7C84DF8A]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.core.TestNonNRTOpen.assertNotNRT(TestNonNRTOpen.java:133)
at 
org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT(TestNonNRTOpen.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:491)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Updated] (LUCENE-5285) FastVectorHighlighter copies segments scores when splitting segments across multi-valued fields

2013-10-25 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5285:


Attachment: LUCENE-5285.patch

New patch fixes my broken WeightedFragList change and expands 
WeightedFragListBuilderTest to catch the broken implementation.

 FastVectorHighlighter copies segments scores when splitting segments across 
 multi-valued fields
 ---

 Key: LUCENE-5285
 URL: https://issues.apache.org/jira/browse/LUCENE-5285
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Priority: Minor
 Attachments: LUCENE-5285.patch, LUCENE-5285.patch


 FastVectorHighlighter copies segments scores when splitting segments across 
 multi-valued fields.  This is only a problem when you want to sort the 
 fragments by score. Technically BaseFragmentsBuilder (line 261 in my copy of 
 the source) does the copying.
 Rather than copying the score, I _think_ it'd be more correct to pull that 
 copying logic into a protected method that child classes (such as 
 ScoreOrderFragmentsBuilder) can override to do more intelligent things. 
 Exactly what that means isn't clear to me at the moment.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



New JVM bug on MacOSX ?!?

2013-10-25 Thread Uwe Schindler
Hi Rory, hi Lucene/Solr committers,

this is a JVM crash with 7u45 on MacOSX, but it also happened with u40. I set 
the broken build to be sticky; the hs_err file is also available (it was 
archived as a build artifact):

Build:
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/

hs_err:
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/artifact/solr/build/solr-core/test/J0/hs_err_pid185.log

SIGSEGV happens here:
[libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3

We have seen this bug on Oct 10, too. It's also mentioned in this issue: 
https://issues.apache.org/jira/browse/SOLR-4593
It only reproduces on MacOSX computers (sometimes). It happens in line with 
other malloc/free bugs: MacOSX crashes the JVM quite often, complaining about 
a double free() on pointers malloc'ed before. MacOSX seems to be very picky 
about double-freeing pointers in its libc. If I get a new failure about that 
one, I will post it, too. The double free() one reproduces from time to time 
on all MacOSX machines. The one above has only been seen on this virtual 
machine (VirtualBox 4.2.18 and 4.3.0, stock OSX 10.8.5 EFI64 guest) up to now.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
 Sent: Friday, October 25, 2013 2:31 PM
 To: dev@lucene.apache.org
 Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 946 -
 Still Failing!
 
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/
 Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC
 
 All tests passed
 
 Build Log:
 [...truncated 9765 lines...]
[junit4] JVM J0: stdout was not empty, see:
 /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-
 core/test/temp/junit4-J0-20131025_122106_052.sysout
[junit4]  JVM J0: stdout (verbatim) 
[junit4] #
[junit4] # A fatal error has been detected by the Java Runtime
 Environment:
[junit4] #
[junit4] #  SIGSEGV (0xb) at pc=0x00010feeba2b, pid=185, tid=134147
[junit4] #
[junit4] # JRE version: Java(TM) SE Runtime Environment (7.0_45-b18)
 (build 1.7.0_45-b18)
[junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed
 mode bsd-amd64 )
[junit4] # Problematic frame:
[junit4] # C  [libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3
[junit4] #
[junit4] # Failed to write core dump. Core dumps have been disabled. To
 enable core dumping, try ulimit -c unlimited before starting Java again
[junit4] #
[junit4] # An error report file with more information is saved as:
[junit4] # /Users/jenkins/workspace/Lucene-Solr-trunk-
 MacOSX/solr/build/solr-core/test/J0/hs_err_pid185.log
[junit4] [thread 139779 also had an error]
[junit4] #
[junit4] # If you would like to submit a bug report, please visit:
[junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
[junit4] # The crash happened outside the Java Virtual Machine in native
 code.
[junit4] # See problematic frame for where to report the bug.
[junit4] #
[junit4]  JVM J0: EOF 
 
 [...truncated 1 lines...]
[junit4] ERROR: JVM J0 ended with an exception, command line:
 /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/
 java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -
 XX:+HeapDumpOnOutOfMemoryError -
 XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-
 MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=F31293EB573940CD -
 Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -
 Dtests.codec=random -Dtests.postingsformat=random -
 Dtests.docvaluesformat=random -Dtests.locale=random -
 Dtests.timezone=random -Dtests.directory=random -
 Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -
 Dtests.cleanthreads=perClass -
 Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-
 MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -
 Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -
 Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -
 Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-
 MacOSX/solr/build/solr-core/test/temp -
 Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-
 MacOSX/lucene/build/clover/db -
 Djava.security.manager=org.apache.lucene.util.TestSecurityManager -
 Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-
 MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -
 Djetty.testMode=1 -Djetty.insecurerandom=1 -
 Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -
 Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 -
 classpath /Users/jenkins/workspace/Lucene-Solr-trunk-
 MacOSX/solr/build/solr-
 core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-
 MacOSX/solr/build/solr-test-
 

[jira] [Created] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-5307:


 Summary: Inconsistency between Weight.scorer documentation and 
ConstantScoreQuery on top of a Filter
 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor


{{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} will 
be called and that otherwise {{Scorer.nextDoc/advance}} will be called.

This is a problem when {{ConstantScoreQuery}} is used on top of a 
{{QueryWrapperFilter}}:
 1. {{ConstantScoreWeight}} calls {{getDocIdSet}} on the filter to know which 
documents to collect.
 2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
{{topScorer == false}} so that {{nextDoc/advance}} are supported.
 3. But then {{ConstantScorer.score(Collector)}} has the following optimization:
{code}
// this optimization allows out of order scoring as top scorer!
@Override
public void score(Collector collector) throws IOException {
  if (docIdSetIterator instanceof Scorer) {
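    // this iterator may itself be a Scorer handed out by
    // QueryWrapperFilter.getDocIdSet, created with topScorer == false;
    // calling score(Collector) on it is the contract violation described above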
((Scorer) docIdSetIterator).score(wrapCollector(collector));
  } else {
super.score(collector);
  }
}
{code}

So the filter iterator is a scorer which was created with {{topScorer = false}}, 
but {{ParentScorer}} ends up using its {{score(Collector)}} method, which is 
illegal. (I found this out because AssertingSearcher has some checks to make 
sure Scorers are used according to the value of topScorer.)

I can imagine several fixes, including:
 - removing this optimization when working on top of a filter
 - relaxing Weight documentation to allow for using {{score(Collector)}} when 
{{topScorer == false}}

but I'm not sure which one is the best one. What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-10-25 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805485#comment-13805485
 ] 

Kranti Parisa commented on SOLR-4787:
-

I have recently extended the hjoin further to support multiple FQs separated by 
a comma (,):

{code}
/masterCore/select?q=*:*&fq=({!hjoin fromIndex=ACore from=parentid to=id v=$aQ 
*fq=$BJoinQ,$AlocalFQ*})&aQ=(f1:false)&BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc)&AlocalFQ=(fieldName:value)
{code}

This will allow using the filter caches for multiple nested queries with the 
hjoin, similar to how Solr supports multiple fq params within the same request.

Any feedback on the syntax? Do comma-separated FQs sound OK?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.6

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 than the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost* > 99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then you can avoid hashset resizing, which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather than join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1&qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1, applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the 

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-10-25 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805485#comment-13805485
 ] 

Kranti Parisa edited comment on SOLR-4787 at 10/25/13 5:29 PM:
---

I have recently extended the hjoin further to support multiple FQs separated by 
a comma (,):

{code}
/masterCore/select?q=*:*&fq=({!hjoin fromIndex=ACore from=parentid to=id v=$aQ 
fq=$BJoinQ,$AlocalFQ})&aQ=(f1:false)&BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc)&AlocalFQ=(fieldName:value)
{code}

This will allow using the filter caches for multiple nested queries with the 
hjoin, similar to how Solr supports multiple fq params within the same request.

Any feedback on the syntax? Do comma-separated FQs (e.g. 
*fq=$BJoinQ,$AlocalFQ*) sound OK?


was (Author: krantiparisa):
I have recently extended the hjoin further to support multiple FQs separated by 
comma (,)

{code}
/masterCore/select?q=*:*&fq=({!hjoin fromIndex=ACore from=parentid to=id v=$aQ 
*fq=$BJoinQ,$AlocalFQ*})&aQ=(f1:false)&BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc)&AlocalFQ=(fieldName:value)
{code}

This will allow using the filter caches for multiple nested queries while using 
the hjoin like how solr supports multiple FQ params within the same request.

Any feedback on the syntax? Do comma-separated FQs sound ok?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.6

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 than the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost* > 99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will set up a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then you can avoid hashset resizing, which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is 

[jira] [Assigned] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-5307:
-

Assignee: Uwe Schindler  (was: Adrien Grand)

 Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
 top of a Filter
 ---

 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Minor

 {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
 will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
 This is a problem when {{ConstantScoreQuery}} is used on top of a 
 {{QueryWrapperFilter}}:
  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
 which documents to collect.
  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
 {{topScorer == false}} so that {{nextDoc/advance}} are supported.
  3. But then {{ConstantScorer.score(Collector)}} has the following 
 optimization:
 {code}
 // this optimization allows out of order scoring as top scorer!
 @Override
 public void score(Collector collector) throws IOException {
   if (docIdSetIterator instanceof Scorer) {
 ((Scorer) docIdSetIterator).score(wrapCollector(collector));
   } else {
 super.score(collector);
   }
 }
 {code}
 So the filter iterator is a scorer which was created with {{topScorer = 
 false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
 which is illegal. (I found this out because AssertingSearcher has some checks 
 to make sure Scorers are used according to the value of topScorer.)
 I can imagine several fixes, including:
  - removing this optimization when working on top of a filter
  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
 {{topScorer == false}}
 but I'm not sure which one is the best one. What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805506#comment-13805506
 ] 

Uwe Schindler commented on LUCENE-5307:
---

Hi Adrien,
This is actually my fault. The following fix would be most correct and makes 
the optimization still work for the not really useful combination: 
ConstantScoreQuery(QueryWrapperFilter(Query)).

Lots of old code uses this pattern; it should directly wrap the Query 
instead of creating a filter wrapper.

I would fix:
- change the instanceof check to a query != null check and assert that the 
iterator is a Scorer
- add another special case in rewrite to prevent the old-style stupidity: 
rewrite the above combination to a simple ConstantScoreQuery with the Query 
that was wrapped by the filter, ignoring the inner boost.

I'll upload a patch later.
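
A minimal sketch of the first bullet (names as in ConstantScoreQuery's inner 
scorer; a sketch, not the final patch):

{code}
// hypothetical sketch: take the score(Collector) shortcut only when the
// ConstantScoreQuery wraps a Query (query != null), never a Filter, and
// assert that the iterator really is a Scorer in that case
@Override
public void score(Collector collector) throws IOException {
  if (query != null) {
    assert docIdSetIterator instanceof Scorer;
    ((Scorer) docIdSetIterator).score(wrapCollector(collector));
  } else {
    super.score(collector);
  }
}
{code}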

 Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
 top of a Filter
 ---

 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Minor

 {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
 will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
 This is a problem when {{ConstantScoreQuery}} is used on top of a 
 {{QueryWrapperFilter}}:
  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
 which documents to collect.
  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
 {{topScorer == false}} so that {{nextDoc/advance}} are supported.
  3. But then {{ConstantScorer.score(Collector)}} has the following 
 optimization:
 {code}
 // this optimization allows out of order scoring as top scorer!
 @Override
 public void score(Collector collector) throws IOException {
   if (docIdSetIterator instanceof Scorer) {
 ((Scorer) docIdSetIterator).score(wrapCollector(collector));
   } else {
 super.score(collector);
   }
 }
 {code}
 So the filter iterator is a scorer which was created with {{topScorer = 
 false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
 which is illegal. (I found this out because AssertingSearcher has some checks 
 to make sure Scorers are used according to the value of topScorer.)
 I can imagine several fixes, including:
  - removing this optimization when working on top of a filter
  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
 {{topScorer == false}}
 but I'm not sure which one is the best one. What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5294) Suggester Dictionary implementation that takes expressions as term weights

2013-10-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805513#comment-13805513
 ] 

ASF subversion and git services commented on LUCENE-5294:
-

Commit 1535797 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1535797 ]

LUCENE-5294: IntelliJ config

 Suggester Dictionary implementation that takes expressions as term weights
 --

 Key: LUCENE-5294
 URL: https://issues.apache.org/jira/browse/LUCENE-5294
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Reporter: Areek Zillur
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5294.patch


 It would be nice to have a Suggester Dictionary implementation that could 
 compute the weights of the terms consumed by the suggester based on a 
 user-defined expression (using lucene's expression module).
 It could be an extension of the existing DocumentDictionary (which takes 
 terms, weights and (optionally) payloads from the stored documents in the 
 index). The only exception being that instead of taking the weights for the 
 terms from the specified weight fields, it could compute the weights using a 
 user-defined expression that uses one or more NumericDocValuesFields from the 
 document.
 Example:
   let the document have
  - product_id
  - product_name
  - product_popularity
  - product_profit
   Then this implementation could be used with an expression of 
 0.2*product_popularity + 0.8*product_profit to determine the weights of the 
 terms for the corresponding documents (optionally along with a payload 
 (product_id))
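 A hedged sketch of evaluating such an expression with lucene's expressions 
 module (field names from the example above; the dictionary wiring itself is 
 what this issue proposes):
 {code}
 // sketch using org.apache.lucene.expressions; assumes the two fields are
 // stored as numeric doc values
 Expression expr = JavascriptCompiler.compile(
     "0.2*product_popularity + 0.8*product_profit");
 SimpleBindings bindings = new SimpleBindings();
 bindings.add(new SortField("product_popularity", SortField.Type.LONG));
 bindings.add(new SortField("product_profit", SortField.Type.LONG));
 // per-document weight source the proposed dictionary could read from
 ValueSource weights = expr.getValueSource(bindings);
 {code}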



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5294) Suggester Dictionary implementation that takes expressions as term weights

2013-10-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805519#comment-13805519
 ] 

ASF subversion and git services commented on LUCENE-5294:
-

Commit 1535798 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1535798 ]

LUCENE-5294: IntelliJ config (merged trunk r1535797)

 Suggester Dictionary implementation that takes expressions as term weights
 --

 Key: LUCENE-5294
 URL: https://issues.apache.org/jira/browse/LUCENE-5294
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Reporter: Areek Zillur
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5294.patch


 It would be nice to have a Suggester Dictionary implementation that could 
 compute the weights of the terms consumed by the suggester based on a 
 user-defined expression (using lucene's expression module).
 It could be an extension of the existing DocumentDictionary (which takes 
 terms, weights and (optionally) payloads from the stored documents in the 
 index). The only exception being that instead of taking the weights for the 
 terms from the specified weight fields, it could compute the weights using a 
 user-defined expression that uses one or more NumericDocValuesFields from the 
 document.
 Example:
   let the document have
  - product_id
  - product_name
  - product_popularity
  - product_profit
   Then this implementation could be used with an expression of 
 0.2*product_popularity + 0.8*product_profit to determine the weights of the 
 terms for the corresponding documents (optionally along with a payload 
 (product_id))



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: New JVM bug on MacOSX ?!?

2013-10-25 Thread Rory O'Donnell Oracle, Dublin Ireland

Hi Uwe,

I am adding Sean Coffey to this thread; I am out next week until Friday, 
1st of November.
Could you log a webbug (http://bugreport.sun.com/bugreport)? Sean will try to 
help move it along.

Rgds, Rory


On 25/10/2013 17:42, Uwe Schindler wrote:

Hi,

From the hs_err file it looks like this one happens only if an I/O error 
occurs while reading from a socket (the HTTP socket). The native socket read 
code tries to throw an IOException, creates the message string from the errno 
macro of the libc, and segfaults while doing this. So it could be a simple 
network stack problem on OSX where some errno/LastError has no valid message.

This is why we might not see this bug on other MacOSX machines: this machine 
is not the fastest one and we know that read errors occur much more often 
(this is why Solr tests fail on this OSX machine). So it might affect other 
OSX machines if you make them very busy with a parallel SETI@home or 
whatever :-)

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, October 25, 2013 6:15 PM
To: dev@lucene.apache.org
Cc: rory.odonn...@oracle.com
Subject: New JVM bug on MacOSX ?!?

Hi Rory, hi Lucene/Solr committers,

this is a JVM crash with 7u45 on MacOSX, but also happened with u40. I set
the broken build to be sticky, the hs_err file is also available (was archived 
as
build artifact):

Build:
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/

hs_err:
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-
MacOSX/946/artifact/solr/build/solr-core/test/J0/hs_err_pid185.log

SIGSEGV happens here:
[libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3

We have seen this bug on Oct 10, too. It's also mentioned in this issue:
https://issues.apache.org/jira/browse/SOLR-4593
It only reproduces on MacOSX computers (sometimes). It happens in line with
other malloc/free bugs: MacOSX crashes the JVM quite often, complaining
about double free() on pointers malloc'ed before. MacOSX
seems to be very picky about double freeing pointers in their libc. If I have a
new failure about that one, I will post it, too. The double free() one
reproduces from time to time on all MacOSX machines. This one above was
only seen on this virtual machine (VirtualBOX 4.2.18 and 4.3.0, stock OSX
10.8.5 EFI64 Guest) up to now.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-Original Message-
From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
Sent: Friday, October 25, 2013 2:31 PM
To: dev@lucene.apache.org
Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build #
946 - Still Failing!

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -

XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9765 lines...]
[junit4] JVM J0: stdout was not empty, see:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-
core/test/temp/junit4-J0-20131025_122106_052.sysout
[junit4]  JVM J0: stdout (verbatim) 
[junit4] #
[junit4] # A fatal error has been detected by the Java Runtime
Environment:
[junit4] #
[junit4] #  SIGSEGV (0xb) at pc=0x00010feeba2b, pid=185, tid=134147
[junit4] #
[junit4] # JRE version: Java(TM) SE Runtime Environment
(7.0_45-b18) (build 1.7.0_45-b18)
[junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08
mixed mode bsd-amd64 )
[junit4] # Problematic frame:
[junit4] # C  [libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3
[junit4] #
[junit4] # Failed to write core dump. Core dumps have been
disabled. To enable core dumping, try ulimit -c unlimited before starting

Java again

[junit4] #
[junit4] # An error report file with more information is saved as:
[junit4] # /Users/jenkins/workspace/Lucene-Solr-trunk-
MacOSX/solr/build/solr-core/test/J0/hs_err_pid185.log
[junit4] [thread 139779 also had an error]
[junit4] #
[junit4] # If you would like to submit a bug report, please visit:
[junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
[junit4] # The crash happened outside the Java Virtual Machine in
native code.
[junit4] # See problematic frame for where to report the bug.
[junit4] #
[junit4]  JVM J0: EOF 

[...truncated 1 lines...]
[junit4] ERROR: JVM J0 ended with an exception, command line:
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bi
n/ java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -
XX:+HeapDumpOnOutOfMemoryError -
XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-
MacOSX/heapdumps -Dtests.prefix=tests -

Dtests.seed=F31293EB573940CD -

Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false
- Dtests.codec=random -Dtests.postingsformat=random -
Dtests.docvaluesformat=random 

[jira] [Commented] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805547#comment-13805547
 ] 

Adrien Grand commented on LUCENE-5307:
--

Thanks Uwe!

 Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
 top of a Filter
 ---

 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-5307-test.patch


 {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
 will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
 This is a problem when {{ConstantScoreQuery}} is used on top of a 
 {{QueryWrapperFilter}}:
  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
 which documents to collect.
  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
 {{topScorer == false}} so that {{nextDoc/advance}} are supported.
  3. But then {{ConstantScorer.score(Collector)}} has the following 
 optimization:
 {code}
 // this optimization allows out of order scoring as top scorer!
 @Override
 public void score(Collector collector) throws IOException {
   if (docIdSetIterator instanceof Scorer) {
 ((Scorer) docIdSetIterator).score(wrapCollector(collector));
   } else {
 super.score(collector);
   }
 }
 {code}
 So the filter iterator is a scorer which was created with {{topScorer = 
 false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
 which is illegal. (I found this out because AssertingSearcher has some checks 
 to make sure Scorers are used according to the value of topScorer.)
 I can imagine several fixes, including:
  - removing this optimization when working on top of a filter
  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
 {{topScorer == false}}
 but I'm not sure which one is the best one. What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5307:
-

Attachment: LUCENE-5307-test.patch

Here is a test that fails (feel free to not reuse it, this is just to 
demonstrate the problem).

 Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
 top of a Filter
 ---

 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-5307-test.patch


 {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
 will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
 This is a problem when {{ConstantScoreQuery}} is used on top of a 
 {{QueryWrapperFilter}}:
  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
 which documents to collect.
  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
 {{topScorer == false}} so that {{nextDoc/advance}} are supported.
  3. But then {{ConstantScorer.score(Collector)}} has the following 
 optimization:
 {code}
 // this optimization allows out of order scoring as top scorer!
 @Override
 public void score(Collector collector) throws IOException {
   if (docIdSetIterator instanceof Scorer) {
 ((Scorer) docIdSetIterator).score(wrapCollector(collector));
   } else {
 super.score(collector);
   }
 }
 {code}
 So the filter iterator is a scorer which was created with {{topScorer = 
 false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
 which is illegal. (I found this out because AssertingSearcher has some checks 
 to make sure Scorers are used according to the value of topScorer.)
 I can imagine several fixes, including:
  - removing this optimization when working on top of a filter
  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
 {{topScorer == false}}
 but I'm not sure which one is the best one. What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5307:
--

Attachment: LUCENE-5307.patch

Here is the patch. Thanks for the test, I was about to write a similar one!

 Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
 top of a Filter
 ---

 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Minor
 Attachments: LUCENE-5307.patch, LUCENE-5307-test.patch


 {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
 will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
 This is a problem when {{ConstantScoreQuery}} is used on top of a 
 {{QueryWrapperFilter}}:
  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
 which documents to collect.
  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
 {{topScorer == false}} so that {{nextDoc/advance}} are supported.
  3. But then {{ConstantScorer.score(Collector)}} has the following 
 optimization:
 {code}
 // this optimization allows out of order scoring as top scorer!
 @Override
 public void score(Collector collector) throws IOException {
   if (docIdSetIterator instanceof Scorer) {
 ((Scorer) docIdSetIterator).score(wrapCollector(collector));
   } else {
 super.score(collector);
   }
 }
 {code}
 So the filter iterator is a scorer which was created with {{topScorer = 
 false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
 which is illegal. (I found this out because AssertingSearcher has some checks 
 to make sure Scorers are used according to the value of topScorer.)
 I can imagine several fixes, including:
  - removing this optimization when working on top of a filter
  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
 {{topScorer == false}}
 but I'm not sure which one is the best one. What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5302) Make StemmerOverrideMap methods public

2013-10-25 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805609#comment-13805609
 ] 

Alan Woodward commented on LUCENE-5302:
---

Hm, this patch fails ant precommit with a javadocs warning:

lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.java:111:
 warning - Tag @link: can't find get(char[], int, Arc, BytesReader) in 
org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.StemmerOverrideMap

...even though that method's javadoc is definitely there.  Maybe because it's 
not defining the generic parameter on Arc?  Anybody have any ideas, apart from 
changing the javadoc from a link tag to a code tag?

 Make StemmerOverrideMap methods public
 --

 Key: LUCENE-5302
 URL: https://issues.apache.org/jira/browse/LUCENE-5302
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Alan Woodward
Priority: Minor
 Attachments: LUCENE-5302.patch


 StemmerOverrideFilter is configured with an FST-based map that you can build 
 at construction time from a list of entries.  Building this FST offline and 
 loading it directly as a bytestream makes construction a lot quicker, but you 
 can't do that conveniently at the moment as all the methods of 
 StemmerOverrideMap are package-private.
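 For context, a rough sketch of the construction-time path whose cost this 
 issue wants to let callers bypass ({{tokens}} stands for any TokenStream):
 {code}
 // build the FST-backed map from entries at construction time (the slow path);
 // the request here is to also allow loading a prebuilt FST directly
 StemmerOverrideFilter.Builder builder = new StemmerOverrideFilter.Builder(true); // ignoreCase
 builder.add("dogs", "dog");
 builder.add("cities", "city");
 StemmerOverrideMap map = builder.build();
 TokenStream stems = new StemmerOverrideFilter(tokens, map);
 {code}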



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-10-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805658#comment-13805658
 ] 

Michael McCandless commented on LUCENE-5189:


+1 to backport.

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
 LUCENE-5189-no-lost-updates.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
 LUCENE-5189_process_events.patch, LUCENE-5189-segdv.patch, 
 LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch


 In LUCENE-4258 we started to work on incremental field updates, however the 
 amount of changes are immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons to that:
 * NumericDV fields should be easier to update, if e.g. we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and on itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have some working patch already which I'll upload next, explaining the 
 changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5307:
--

  Component/s: core/search
Fix Version/s: 5.0
   4.6

 Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
 top of a Filter
 ---

 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5307.patch, LUCENE-5307-test.patch


 {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
 will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
 This is a problem when {{ConstantScoreQuery}} is used on top of a 
 {{QueryWrapperFilter}}:
  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
 which documents to collect.
  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
 {{topScorer == false}} so that {{nextDoc/advance}} are supported.
  3. But then {{ConstantScorer.score(Collector)}} has the following 
 optimization:
 {code}
 // this optimization allows out of order scoring as top scorer!
 @Override
 public void score(Collector collector) throws IOException {
   if (docIdSetIterator instanceof Scorer) {
 ((Scorer) docIdSetIterator).score(wrapCollector(collector));
   } else {
 super.score(collector);
   }
 }
 {code}
 So the filter iterator is a scorer which was created with {{topScorer = 
 false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
 which is illegal. (I found this out because AssertingSearcher has some checks 
 to make sure Scorers are used according to the value of topScorer.)
 I can imagine several fixes, including:
  - removing this optimization when working on top of a filter
  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
 {{topScorer == false}}
 but I'm not sure which one is the best one. What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805673#comment-13805673
 ] 

Adrien Grand commented on LUCENE-5307:
--

+1!

 Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
 top of a Filter
 ---

 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5307.patch, LUCENE-5307-test.patch


 {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
 will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
 This is a problem when {{ConstantScoreQuery}} is used on top of a 
 {{QueryWrapperFilter}}:
  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
 which documents to collect.
  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
 {{topScorer == false}} so that {{nextDoc/advance}} are supported.
  3. But then {{ConstantScorer.score(Collector)}} has the following 
 optimization:
 {code}
 // this optimization allows out of order scoring as top scorer!
 @Override
 public void score(Collector collector) throws IOException {
   if (docIdSetIterator instanceof Scorer) {
 ((Scorer) docIdSetIterator).score(wrapCollector(collector));
   } else {
 super.score(collector);
   }
 }
 {code}
 So the filter iterator is a scorer which was created with {{topScorer = 
 false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
 which is illegal. (I found this out because AssertingSearcher has some checks 
 to make sure Scorers are used according to the value of topScorer.)
 I can imagine several fixes, including:
  - removing this optimization when working on top of a filter
  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
 {{topScorer == false}}
 but I'm not sure which one is the best one. What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5239) Create and edit cwiki page for clustering search results

2013-10-25 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805702#comment-13805702
 ] 

Cassandra Targett commented on SOLR-5239:
-

[~dweiss] I finally got a chance to read this over, and I think it's really 
good the way you wrote it. The only thing I changed is the borders around your 
code examples - just to standardize them within the page and across all the 
pages. 

 Create and edit cwiki page for clustering search results
 

 Key: SOLR-5239
 URL: https://issues.apache.org/jira/browse/SOLR-5239
 Project: Solr
  Issue Type: Bug
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor

 Essentially pull out the information from:
 https://wiki.apache.org/solr/ClusteringComponent
 skipping any information about ancient versions?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5239) Create and edit cwiki page for clustering search results

2013-10-25 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805708#comment-13805708
 ] 

Dawid Weiss commented on SOLR-5239:
---

Thanks Cassandra!

 Create and edit cwiki page for clustering search results
 

 Key: SOLR-5239
 URL: https://issues.apache.org/jira/browse/SOLR-5239
 Project: Solr
  Issue Type: Bug
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor

 Essentially pull out the information from:
 https://wiki.apache.org/solr/ClusteringComponent
 skipping any information about ancient versions?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5392) extend solrj apis to cover collection management

2013-10-25 Thread Roman Shaposhnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated SOLR-5392:
---

Attachment: 0001-SOLR-5392.-extend-solrj-apis-to-cover-collection-man.patch

Please consider the following patch against trunk.

What I did here is: I completely aped CoreAdminRequest and CoreAdminResponse, 
keeping up with all the stylistic idiosyncrasies of the two. Hope this was the 
right thing to do.

Either way, please let me know what you guys think.
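
A hedged sketch of the kind of call this enables (class and method names are 
hypothetical, by analogy with CoreAdminRequest; the authoritative names are in 
the attached patch):

{code}
// hypothetical usage sketch, not the committed API
CloudSolrServer server = new CloudSolrServer("zkhost1:2181");
CollectionAdminRequest.Create create = new CollectionAdminRequest.Create();
create.setCollectionName("collection1");
create.setNumShards(2);
CollectionAdminResponse rsp = create.process(server);
System.out.println("success: " + rsp.isSuccess());
{code}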

 extend solrj apis to cover collection management
 

 Key: SOLR-5392
 URL: https://issues.apache.org/jira/browse/SOLR-5392
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.5
Reporter: Roman Shaposhnik
 Attachments: 
 0001-SOLR-5392.-extend-solrj-apis-to-cover-collection-man.patch


 It would be useful to extend solrj APIs to cover collection management calls: 
 https://cwiki.apache.org/confluence/display/solr/Collections+API 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5307:
--

Attachment: LUCENE-5307.patch

Simplified patch. API of ConstantScorer does not change!

Thanks Adrien for review, will commit tomorrow!

 Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
 top of a Filter
 ---

 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5307.patch, LUCENE-5307.patch, 
 LUCENE-5307-test.patch


 {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
 will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
 This is a problem when {{ConstantScoreQuery}} is used on top of a 
 {{QueryWrapperFilter}}:
  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
 which documents to collect.
  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
 {{topScorer == false}} so that {{nextDoc/advance}} are supported.
  3. But then {{ConstantScorer.score(Collector)}} has the following 
 optimization:
 {code}
 // this optimization allows out of order scoring as top scorer!
 @Override
 public void score(Collector collector) throws IOException {
   if (docIdSetIterator instanceof Scorer) {
 ((Scorer) docIdSetIterator).score(wrapCollector(collector));
   } else {
 super.score(collector);
   }
 }
 {code}
 So the filter iterator is a scorer which was created with {{topScorer = 
 false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
 which is illegal. (I found this out because AssertingSearcher has some checks 
 to make sure Scorers are used according to the value of topScorer.)
 I can imagine several fixes, including:
  - removing this optimization when working on top of a filter
  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
 {{topScorer == false}}
 but I'm not sure which one is the best one. What do you think?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5392) extend solrj apis to cover collection management

2013-10-25 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805743#comment-13805743
 ] 

Mark Miller commented on SOLR-5392:
---

bq.  I completely aped CoreAdminRequest and CoreAdminResponse keeping up with 
all the stylistic idiosyncrasies of the two

+1 - until someone is willing to clean up the whole shebang.

 extend solrj apis to cover collection management
 

 Key: SOLR-5392
 URL: https://issues.apache.org/jira/browse/SOLR-5392
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.5
Reporter: Roman Shaposhnik
 Attachments: 
 0001-SOLR-5392.-extend-solrj-apis-to-cover-collection-man.patch


 It would be useful to extend solrj APIs to cover collection management calls: 
 https://cwiki.apache.org/confluence/display/solr/Collections+API 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: What is recommended version of jdk 1.7?

2013-10-25 Thread Israel Ekpo
Also,

It is sometimes difficult to find this specific version.

You can download it here

http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jre-7u25-oth-JPR




On Wed, Oct 23, 2013 at 8:46 AM, Uwe Schindler u...@thetaphi.de wrote:

 Use u25, this is the latest stable version and works fine.

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de

 *From:* Danil ŢORIN [mailto:torin...@gmail.com]
 *Sent:* Wednesday, October 23, 2013 1:40 PM
 *To:* lucene-...@apache.org
 *Subject:* What is recommended version of jdk 1.7?

 We had some problems with u45.
 I know there are several jiras, and a bug report for oracle.

 But my question is more pragmatic: when running tests for a release like the
 latest 4.5.1, which JVM (preferably 1.7) did you use?

 What is the latest but safe version to use with Lucene?




-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Lucene40TermVectorsReader TVTermsEnum totalTermFreq() is not a total

2013-10-25 Thread Tom Burton-West
Hi all,

I was reading some code that calls Lucene40TermVectorsReader's
TVTermsEnum.

The method totalTermFreq() actually returns the freq and the method docFreq()
returns 1.
Once you think about the context this sort of makes sense, but I found this
confusing.
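
Here's a minimal sketch of what I mean (4.x API; "body" is a hypothetical
field with stored term vectors):

{code}
// iterate one document's term vector
Terms vector = reader.getTermVector(docID, "body");
TermsEnum te = vector.iterator(null);
BytesRef term;
while ((term = te.next()) != null) {
  // within a single-document term vector, totalTermFreq() is the
  // within-document frequency and docFreq() is always 1
  System.out.println(term.utf8ToString() + " tf=" + te.totalTermFreq());
}
{code}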

I'm guessing there is a good reason for the method to be called
totalTermFreq(), but I would like to know what that is.  Also is there
documentation somewhere in the javadocs that explains this?

Better yet, is there a good example of how to use the Lucene 4.x
TermVectors API?


Tom


[jira] [Commented] (LUCENE-5302) Make StemmerOverrideMap methods public

2013-10-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806001#comment-13806001
 ] 

Robert Muir commented on LUCENE-5302:
-

The \@link was broken before, javadocs were just never generated because it 
only had package visibility.

I think in this case the \@link just has to be qualified as 
FST.Arc/FST.BytesReader or fully-qualify or whatever.
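
For example, roughly (a sketch; signature taken from the warning above):

{code}
{@link StemmerOverrideFilter.StemmerOverrideMap#get(char[], int, FST.Arc, FST.BytesReader)}
{code}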

 Make StemmerOverrideMap methods public
 --

 Key: LUCENE-5302
 URL: https://issues.apache.org/jira/browse/LUCENE-5302
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Alan Woodward
Priority: Minor
 Attachments: LUCENE-5302.patch


 StemmerOverrideFilter is configured with an FST-based map that you can build 
 at construction time from a list of entries.  Building this FST offline and 
 loading it directly as a bytestream makes construction a lot quicker, but you 
 can't do that conveniently at the moment as all the methods of 
 StemmerOverrideMap are package-private.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org