Re: subtle Solr classloader problem

2013-05-27 Thread Robert Muir
This is not a bug. It's a broken config.
On May 26, 2013 11:51 AM, Shawn Heisey s...@elyograg.org wrote:

 While looking into SOLR-4852 and testing every conceivable lib
 permutation, I ran across a second problem. I'd like to know whether it
 should be considered a bug.


https://issues.apache.org/jira/browse/SOLR-4852?focusedCommentId=13667025#comment-13667025

 What I was trying to do here was split my required jars between
 ${solr.solr.home}/lib and ${solr.solr.home}/foo ... the former directory
 is automatically used for libraries, the latter was added by
 sharedLib="foo" in my solr.xml.  Should this be a valid configuration?
 If not, perhaps we need to stop automatically including
 ${solr.solr.home}/lib.

 I run into the same problem (unable to find the ICUTokenizer class)
 whenever I split my jars, even though the icu analysis jar was not the
 jar that I moved.  When I first tried it, I moved the icu4j jar, but the
 exact same problem occurs when I move the mysql jar, which has
 nothing at all to do with ICU.

 Here's a Solr log (on an unpatched branch_4x) from when I moved the
 mysql jar from lib to foo.  You can see the jars that get loaded, so
 this should not be happening:

 http://apaste.info/6aK5

 If all the jars are in either lib or foo, everything works.

 Is this behavior a bug?  I am starting to think that this problem and
 the original SOLR-4852 issue are actually the same problem, and that it
 may not be a duplicate jar problem, but rather something specific and
 subtle with the ICU analysis components that happens when the
 classloader is replaced.

 Thanks,
 Shawn

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2013-05-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667673#comment-13667673
 ] 

Per Steffensen commented on SOLR-4470:
--

bq. Could make an implementation which reads a system-basic-auth.properties 
from ZK (given your ZK is encrypted). That would be useful once the other issue 
about securing ZK is solved.

Guess you are talking about SOLR-4580. It is not about encryption (neither 
storage- nor transport-level). It is about authentication and authorization at 
application/API-level. But you are right, it is an option to build on top of 
this issue and allow for those internal credentials to live in ZK - just make 
sure the security issues involved in doing that are dealt with.

bq. Q: Would it make sense to make CurrentInternalRequestFactory or 
InterSolrNodeAuthCredentialsFactory pluggable through solr.xml? Currently you 
need to patch and build Solr to change it, right?

Yes, you need to patch and rebuild. But that is because I did not include as 
much code in the patch as I wanted to, and as I did for SOLR-4580. In the patch 
for SOLR-4580 I've included code so that you, through JVM params, can specify 
the name of a class which will be used as credentials/ACL-provider. The same 
should have been done in the patch for this SOLR-4470: I ought to have included 
code so that you, through JVM params, can point out the classes you want to be 
used as credentials-providers. Basically, JVM params that control 
which implementations of InterSolrNodeAuthCredentialsFactory.SubRequestFactory 
and InterSolrNodeAuthCredentialsFactory.InternalRequestFactory are to be used by 
default for InterSolrNodeAuthCredentialsFactory.CurrentSubRequestFactory and 
InterSolrNodeAuthCredentialsFactory.CurrentInternalRequestFactory.

JVM params are the simplest way to control which implementations are used 
behind the interfaces. That is, in my opinion, what should have been included 
here. Going from control through JVM params to adding support for control 
through solr.xml or something else should be another issue, but it is certainly 
a good and valid idea.
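
For illustration, here is a minimal sketch of the kind of JVM-param-driven wiring 
being described; the property name and the default class below are hypothetical 
placeholders, not names taken from the patch:

{code}
// Illustrative sketch only: resolve a pluggable factory implementation from a
// JVM param. The property name and default class are hypothetical placeholders.
public class FactoryResolver {
  public static Object resolveInternalRequestFactory() throws Exception {
    String className = System.getProperty(
        "solr.internalRequestFactory",                 // hypothetical JVM param
        "org.example.DefaultInternalRequestFactory");  // hypothetical default impl
    // Load the named class and instantiate it via its no-arg constructor.
    return Class.forName(className).newInstance();
  }
}
{code}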

{quote}
bq. Solr is not enforcing security - the webcontainer or something else is. 

A comment right in the patch explains this is not the case - it says the web 
container authorizes any role and then new Solr code is responsible for dealing 
with role authorization. This is Solr code that can introduce security bugs. 
This is the slippery slope, this is the fuzzy line, this is the creep.
{quote}

Well, you did not point out the exact comment as I asked you to, so I will have 
to guess a little. The code going into the real non-test part of Solr does not 
do anything to enforce security. It only enables a Solr admin to configure 
Solr-nodes so that their inter-communication will still work if the 
Solr admin chooses to set up e.g. container-managed security.

In order to claim that my solution solves the problem, I want to test that it 
does. Test strategy: set up container-managed security and verify that all 
inter-Solr-node communication works if you use my solution. So the test-code 
sets up container-managed security, and in there, there is a comment about just 
letting the container manage authentication and handle authorization in a 
filter. But this is all a simulation of what the Solr admin decided to do to 
set up security. This is test only!

{quote}
bq. Personally I do not understand why a serious project would stay out of 
security

It's simply the current stance of the project
{quote}

Well, I haven't been in the meetings or whatever where this stance was 
established, but I would imagine that this stance is about Solr not going down 
the path of enforcing or controlling or ... security. I couldn't imagine that 
this stance means we would not want a SolrCloud cluster to work if a 
Solr admin chooses to activate third-party security in a very common way.

 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.4

 Attachments: SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, 
 SOLR-4470.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes 
 also make 

[jira] [Comment Edited] (SOLR-4470) Support for basic http auth in internal solr requests

2013-05-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667673#comment-13667673
 ] 

Per Steffensen edited comment on SOLR-4470 at 5/27/13 10:05 AM:


bq. Could make an implementation which reads a system-basic-auth.properties 
from ZK (given your ZK is encrypted). That would be useful once the other issue 
about securing ZK is solved.

Guess you are talking about SOLR-4580. It is not about encryption (neither 
storage- nor transport-level). It is about authentication and authorization at 
application/API-level. But you are right, it is an option to build on top of 
this issue and allow for those internal credentials to live in ZK - just make 
sure the security issues involved in doing that are dealt with.

bq. Q: Would it make sense to make CurrentInternalRequestFactory or 
InterSolrNodeAuthCredentialsFactory pluggable through solr.xml? Currently you 
need to patch and build Solr to change it, right?

Yes, you need to patch and rebuild. But that is because I did not include as 
much code in the patch as I wanted to, and as I did for SOLR-4580. In the patch 
for SOLR-4580 I've included code so that you, through JVM params, can specify 
the name of a class which will be used as credentials/ACL-provider. The same 
should have been done in the patch for this SOLR-4470: I ought to have included 
code so that you, through JVM params, can point out the classes you want to be 
used as credentials-providers. Basically, JVM params that control 
which implementations of InterSolrNodeAuthCredentialsFactory.SubRequestFactory 
and InterSolrNodeAuthCredentialsFactory.InternalRequestFactory are to be used by 
default for InterSolrNodeAuthCredentialsFactory.CurrentSubRequestFactory and 
InterSolrNodeAuthCredentialsFactory.CurrentInternalRequestFactory.

JVM params are the simplest way to control which implementations are used 
behind the interfaces. That is, in my opinion, what should have been included 
here. Going from control through JVM params to adding support for control 
through solr.xml or something else should be another issue, but it is certainly 
a good and valid idea.

{quote}
bq. Solr is not enforcing security - the webcontainer or something else is. 

A comment right in the patch explains this is not the case - it says the web 
container authorizes any role and then new Solr code is responsible for dealing 
with role authorization. This is Solr code that can introduce security bugs. 
This is the slippery slope, this is the fuzzy line, this is the creep.
{quote}

Well, you did not point out the exact comment as I asked you to, so I will have 
to guess a little. The code going into the real non-test part of Solr does not 
do anything to enforce security. It only enables a Solr admin to configure 
Solr-nodes so that their inter-communication will still work if the 
Solr admin chooses to set up e.g. container-managed security.

In order to claim that my solution solves the problem, I want to test that it 
does. Test strategy: set up container-managed security and verify that all 
inter-Solr-node communication works if you use my solution. So the test-code 
sets up container-managed security, and in there, there is a comment about just 
letting the container manage authentication and handle authorization in a 
filter. But this is all a simulation of what the Solr admin decided to do to 
set up security. This is test only!

{quote}
bq. Personally I do not understand why a serious project would stay out of 
security

It's simply the current stance of the project
{quote}

Well, I haven't been in the meetings or whatever where this stance was 
established, but I would imagine that this stance is about Solr not going down 
the path of enforcing or controlling or ... security. I couldn't imagine that 
this stance means we would not want a SolrCloud cluster to work if a 
Solr admin chooses to activate third-party security in a very common way.

  was (Author: steff1193):
bq. Could make an implementation which reads a system-basic-auth.properties 
from ZK (given your ZK is encrypted). That would be useful once the other issue 
about securing ZK is solved.

Guess you are talking about SOLR-4580. It is not about encryption (neither 
storage- nor transport-level). It is about authentication and authorization at 
application/API-level. But you are right, it is an option to build on top of 
this issue and allow for those internal credentials to live in ZK - just make 
sure the security issues doing that is dealt with.

bq. Q: Would it make sense to make CurrentInternalRequestFactory or 
InterSolrNodeAuthCredentialsFactory pluggable through solr.xml? Currently you 
need to patch and build Solr to change it, right?

Yes you need to patch and rebuild. But that is because it did not include as 
much code in the patch as I wanted to, and as I did for 

[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 496 - Still Failing!

2013-05-27 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/496/
Java: 64bit/jdk1.6.0 -XX:-UseCompressedOops -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic

Error Message:
Connection to http://localhost:51513 refused

Stack Trace:
org.apache.http.conn.HttpHostConnectException: Connection to 
http://localhost:51513 refused
at 
__randomizedtesting.SeedInfo.seed([D61BF819C5EE930E:7DE1E50C1A321520]:0)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.lucene.replicator.http.HttpClientBase.executeGET(HttpClientBase.java:178)
at 
org.apache.lucene.replicator.http.HttpReplicator.checkForUpdate(HttpReplicator.java:51)
at 
org.apache.lucene.replicator.ReplicationClient.doUpdate(ReplicationClient.java:196)
at 
org.apache.lucene.replicator.ReplicationClient.updateNow(ReplicationClient.java:402)
at 
org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic(HttpReplicatorTest.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 

[jira] [Commented] (SOLR-4744) Version conflict error during shard split test

2013-05-27 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667683#comment-13667683
 ] 

Anshum Gupta commented on SOLR-4744:


Looks fine to me, other than one small change which I don't think is part of 
your patch but would be good to fix.

DistributedUpdateProcessor.updateAdd(): Line 404

{quote}
 if (isLeader) {
   params.set(distrib.from, ZkCoreNodeProps.getCoreUrl(
   zkController.getBaseUrl(), req.getCore().getName()));
   }

   params.set(distrib.from, ZkCoreNodeProps.getCoreUrl(
   zkController.getBaseUrl(), req.getCore().getName()));
{quote}
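
One way to read the suggestion above is that the call inside the if-block duplicates 
the unconditional call that follows, so a single call would suffice. A minimal sketch 
of that reading, using the names from the quoted snippet (the parameter name is written 
as a plain string here; the actual class may use a constant for it):

{code}
import org.apache.solr.common.cloud.ZkCoreNodeProps;
import org.apache.solr.common.params.ModifiableSolrParams;

// Hedged sketch only, not a verified excerpt of DistributedUpdateProcessor:
// distrib.from is set exactly once instead of in both places.
class DistribFromSketch {
  static void setDistribFrom(ModifiableSolrParams params,
                             String baseUrl, String coreName) {
    params.set("distrib.from", ZkCoreNodeProps.getCoreUrl(baseUrl, coreName));
  }
}
{code}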

 Version conflict error during shard split test
 --

 Key: SOLR-4744
 URL: https://issues.apache.org/jira/browse/SOLR-4744
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.3
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4744.patch


 ShardSplitTest fails sometimes with the following error:
 {code}
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.861; 
 org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state 
 invoked for collection: collection1
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.861; 
 org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state shard1 
 to inactive
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.861; 
 org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state 
 shard1_0 to active
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.861; 
 org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state 
 shard1_1 to active
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.873; 
 org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= 
 path=/update params={wt=javabinversion=2} {add=[169 (1432319507166134272)]} 
 0 2
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.877; 
 org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
 WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.877; 
 org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
 WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.877; 
 org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
 WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.877; 
 org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
 WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.877; 
 org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
 WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.877; 
 org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
 WatchedEvent state:SyncConnected type:NodeDataChanged 
 path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.884; 
 org.apache.solr.update.processor.LogUpdateProcessor; 
 [collection1_shard1_1_replica1] webapp= path=/update 
 params={distrib.from=http://127.0.0.1:41028/collection1/update.distrib=FROMLEADERwt=javabindistrib.from.parent=shard1version=2}
  {} 0 1
 [junit4:junit4]   1 INFO  - 2013-04-14 19:05:26.885; 
 org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= 
 path=/update 
 params={distrib.from=http://127.0.0.1:41028/collection1/update.distrib=FROMLEADERwt=javabindistrib.from.parent=shard1version=2}
  {add=[169 (1432319507173474304)]} 0 2
 [junit4:junit4]   1 ERROR - 2013-04-14 19:05:26.885; 
 org.apache.solr.common.SolrException; shard update error StdNode: 
 http://127.0.0.1:41028/collection1_shard1_1_replica1/:org.apache.solr.common.SolrException:
  version conflict for 169 expected=1432319507173474304 actual=-1
 [junit4:junit4]   1  at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:404)
 [junit4:junit4]   1  at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 [junit4:junit4]   1  at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
 [junit4:junit4]   1  at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
 [junit4:junit4]   1  at 
 

A strange RemoteSolrException

2013-05-27 Thread Hans-Peter Stricker
Hello,

I'm writing my first little Solrj program, but I don't get it running because of 
a RemoteSolrException: Server at http://localhost:8983/solr returned non ok 
status:404

The server is definitely running and the url works in the browser.

I am working with Solr 4.3.0.

This is my source code:

public static void main(String[] args) {

    String url = "http://localhost:8983/solr";
    SolrServer server;

    try {
        server = new HttpSolrServer(url);
        server.ping();
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}

with the stack trace:

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at 
http://localhost:8983/solr returned non ok status:404, message:Not Found
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at org.apache.solr.client.solrj.request.SolrPing.process(SolrPing.java:62)
 at org.apache.solr.client.solrj.SolrServer.ping(SolrServer.java:293)
 at de.epublius.blogindexer.App.main(App.java:47)

If I call server.shutdown(), there is no such exception, but it occurs for almost 
all other SolrServer methods.

What am I doing wrong?

Thanks in advance

Hans-Peter

[jira] [Updated] (LUCENE-5013) ScandinavianInterintelligableASCIIFoldingFilter

2013-05-27 Thread Karl Wettin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-5013:


Attachment: LUCENE-5013-6.txt

It's all good now.

Thanks for the help and input, everybody. Have fun, and I hope someone else but 
me finds this useful.

 ScandinavianInterintelligableASCIIFoldingFilter
 ---

 Key: LUCENE-5013
 URL: https://issues.apache.org/jira/browse/LUCENE-5013
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Karl Wettin
Priority: Trivial
 Attachments: LUCENE-5013-2.txt, LUCENE-5013-3.txt, LUCENE-5013-4.txt, 
 LUCENE-5013-5.txt, LUCENE-5013-6.txt, LUCENE-5013.txt


 This filter is an augmentation of the output from ASCIIFoldingFilter;
 it discriminates against the double vowels aa, ae, ao, oe and oo, leaving just the 
 first one.
 blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej == blabarsyltetoj
 räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas == raksmorgas
 Caveats:
 Since this is a filtering on top of ASCIIFoldingFilter, äöåøæ has already been 
 folded down to a, o, a, o, ae by the time this filter handles it, which will cause 
 effects such as:
 bøen - boen - bon
 åene - aene - ane
 I find this to be a trivial problem compared to not finding anything at all.
 Background:
 Swedish åäö are in fact the same letters as Norwegian and Danish åæø and thus 
 interchangeable when used between these languages. They are however folded 
 differently when people type them on a keyboard lacking these characters, and 
 ASCIIFoldingFilter handles ä and æ differently.
 When a Swedish person is lacking umlauted characters on the keyboard they 
 consistently type a, a, o instead of å, ä, ö. Foreigners also tend to use a, 
 a, o.
 In Norway people tend to type aa, ae and oe instead of å, æ and ø. Some use 
 a, a, o. I've also seen oo, ao, etc. And permutations. Not sure about Denmark, 
 but the pattern is probably the same.
 This filter solves that problem, but might also cause new ones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5013) ScandinavianInterintelligableASCIIFoldingFilter

2013-05-27 Thread Karl Wettin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667708#comment-13667708
 ] 

Karl Wettin edited comment on LUCENE-5013 at 5/27/13 11:45 AM:
---

It's all good now.

Thanks for the help and input, everybody. Have fun, and I hope someone else but 
me finds this useful.

  was (Author: karl.wettin):
It's all good now.

Thanks for the help and input, everybody. Have fun, and I hope someone else buy 
me finds this useful.
  
 ScandinavianInterintelligableASCIIFoldingFilter
 ---

 Key: LUCENE-5013
 URL: https://issues.apache.org/jira/browse/LUCENE-5013
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Karl Wettin
Priority: Trivial
 Attachments: LUCENE-5013-2.txt, LUCENE-5013-3.txt, LUCENE-5013-4.txt, 
 LUCENE-5013-5.txt, LUCENE-5013-6.txt, LUCENE-5013.txt


 This filter is an augmentation of the output from ASCIIFoldingFilter;
 it discriminates against the double vowels aa, ae, ao, oe and oo, leaving just the 
 first one.
 blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej == blabarsyltetoj
 räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas == raksmorgas
 Caveats:
 Since this is a filtering on top of ASCIIFoldingFilter, äöåøæ has already been 
 folded down to a, o, a, o, ae by the time this filter handles it, which will cause 
 effects such as:
 bøen - boen - bon
 åene - aene - ane
 I find this to be a trivial problem compared to not finding anything at all.
 Background:
 Swedish åäö are in fact the same letters as Norwegian and Danish åæø and thus 
 interchangeable when used between these languages. They are however folded 
 differently when people type them on a keyboard lacking these characters, and 
 ASCIIFoldingFilter handles ä and æ differently.
 When a Swedish person is lacking umlauted characters on the keyboard they 
 consistently type a, a, o instead of å, ä, ö. Foreigners also tend to use a, 
 a, o.
 In Norway people tend to type aa, ae and oe instead of å, æ and ø. Some use 
 a, a, o. I've also seen oo, ao, etc. And permutations. Not sure about Denmark, 
 but the pattern is probably the same.
 This filter solves that problem, but might also cause new ones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4470) Support for basic http auth in internal solr requests

2013-05-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667712#comment-13667712
 ] 

Jan Høydahl edited comment on SOLR-4470 at 5/27/13 12:02 PM:
-

bq. JVM params is the simplest way to control which implementations to be used 
behind the interfaces. That is, in my opinion, what should have been included 
here. Going from control through JVM params and adding support for control 
through solr.xml or something else should be another issue, but it is certainly 
a good and valid idea.

The way we normally set up {{solr.xml}} configs is with 
<mytag>$\{my.jvm.param\}</mytag> style, so the admin can choose whether to pass the 
option as a JVM param or include it in solr.xml directly. Something like

{code:xml}
<solr>
  ...
  <subRequestFactory class="${solr.subRequestFactory}" />
  <internalRequestFactory class="${solr.internalRequestFactory}" />
</solr>
{code}


Regarding [~markrmil...@gmail.com]'s concern with authorization creep, I agree to 
some extent. But since, as you say, this is test-code only, let's move 
the class {{RegExpAuthorizationFilter}} from the runtime codebase into the test 
framework. That way, it is clear that it is only used for realistic test 
coverage. And if anyone wishes to set up something similar in their production 
they may borrow code from the test class, but it will be a manual step 
reinforcing that this is not a supported feature of the project as such.

  was (Author: janhoy):
bq. JVM params is the simplest way to control which implementations to be 
used behind the interfaces. That is, in my opinion, what should have been 
included here. Going from control through JVM params and adding support for 
control through solr.xml or something else should be another issue, but it is 
certainly a good and valid idea.

The way we normally set up {{solr.xml}} configs is with 
<mytag>$\{my.jvm.param\}</mytag> style, so the admin can choose whether to pass the 
option as a JVM param or include it in solr.xml directly. Something like

{code:xml}
<solr>
  ...
  <subRequestFactory class="${solr.subRequestFactory}" />
  <internalRequestFactory class="${solr.internalRequestFactory}" />
</solr>
{code}


Regarding [~markrmil...@gmail.com]'s concern with authorization creep, I to 
some extent agree. But since, as you say, this is test-code only, let's move 
the class {{RegExpAuthorizationFilter}} from runtime codebase and into the test 
framework. In that way, it is clear that it is only used for realistic test 
coverage. And if anyone wishes to setup a similar setup in their production 
they may borrow code from the test class, but it will be a manual step 
reinforcing that this is not something that the project *supports* - but *if* 
someone sets up this in the container, 
  
 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.4

 Attachments: SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, 
 SOLR-4470.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes 
 also make internal requests to other Solr-nodes, and for those to work 
 credentials need to be provided as well.
 Ideally we would like to forward credentials from a particular request to 
 all the internal sub-requests it triggers, e.g. for search and update 
 requests.
 But there are also internal requests
 * that are only indirectly/asynchronously triggered from outside requests (e.g. 
 shard creation/deletion/etc based on calls to the Collection API)
 * that do not in any way have a relation to an outside super-request (e.g. 
 replica synching stuff)
 We would like to aim at a solution where the original credentials are 
 forwarded when a request directly/synchronously triggers a subrequest, with a 
 fallback to configured internal credentials for the 
 asynchronous/non-rooted requests.
 In our solution we aim at only supporting basic http auth, but we would 
 like to make a framework around it, so that not too much refactoring is 
 needed if you later want to add support for other kinds of auth (e.g. digest).
 We will work on a solution but create this JIRA issue early in order to get 
 input/comments from the community as early as 

[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2013-05-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667712#comment-13667712
 ] 

Jan Høydahl commented on SOLR-4470:
---

bq. JVM params is the simplest way to control which implementations to be used 
behind the interfaces. That is, in my opinion, what should have been included 
here. Going from control through JVM params and adding support for control 
through solr.xml or something else should be another issue, but it is certainly 
a good and valid idea.

The way we normally set up {{solr.xml}} configs is with 
<mytag>$\{my.jvm.param\}</mytag> style, so the admin can choose whether to pass the 
option as a JVM param or include it in solr.xml directly. Something like

{code:xml}
<solr>
  ...
  <subRequestFactory class="${solr.subRequestFactory}" />
  <internalRequestFactory class="${solr.internalRequestFactory}" />
</solr>
{code}


Regarding [~markrmil...@gmail.com]'s concern with authorization creep, I agree to 
some extent. But since, as you say, this is test-code only, let's move 
the class {{RegExpAuthorizationFilter}} from the runtime codebase into the test 
framework. That way, it is clear that it is only used for realistic test 
coverage. And if anyone wishes to set up something similar in their production 
they may borrow code from the test class, but it will be a manual step 
reinforcing that this is not something that the project *supports* - but *if* 
someone sets up this in the container, 

 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.4

 Attachments: SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, 
 SOLR-4470.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes 
 also make internal requests to other Solr-nodes, and for those to work 
 credentials need to be provided as well.
 Ideally we would like to forward credentials from a particular request to 
 all the internal sub-requests it triggers, e.g. for search and update 
 requests.
 But there are also internal requests
 * that are only indirectly/asynchronously triggered from outside requests (e.g. 
 shard creation/deletion/etc based on calls to the Collection API)
 * that do not in any way have a relation to an outside super-request (e.g. 
 replica synching stuff)
 We would like to aim at a solution where the original credentials are 
 forwarded when a request directly/synchronously triggers a subrequest, with a 
 fallback to configured internal credentials for the 
 asynchronous/non-rooted requests.
 In our solution we aim at only supporting basic http auth, but we would 
 like to make a framework around it, so that not too much refactoring is 
 needed if you later want to add support for other kinds of auth (e.g. digest).
 We will work on a solution but create this JIRA issue early in order to get 
 input/comments from the community as early as possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5013) ScandinavianInterintelligableASCIIFoldingFilter

2013-05-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667713#comment-13667713
 ] 

Jan Høydahl commented on LUCENE-5013:
-

Can you upload the patch as LUCENE-5013.patch? That's the standard naming 
convention around here :)

 ScandinavianInterintelligableASCIIFoldingFilter
 ---

 Key: LUCENE-5013
 URL: https://issues.apache.org/jira/browse/LUCENE-5013
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Karl Wettin
Priority: Trivial
 Attachments: LUCENE-5013-2.txt, LUCENE-5013-3.txt, LUCENE-5013-4.txt, 
 LUCENE-5013-5.txt, LUCENE-5013-6.txt, LUCENE-5013.txt


 This filter is an augmentation of the output from ASCIIFoldingFilter;
 it discriminates against the double vowels aa, ae, ao, oe and oo, leaving just the 
 first one.
 blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej == blabarsyltetoj
 räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas == raksmorgas
 Caveats:
 Since this is a filtering on top of ASCIIFoldingFilter, äöåøæ has already been 
 folded down to a, o, a, o, ae by the time this filter handles it, which will cause 
 effects such as:
 bøen - boen - bon
 åene - aene - ane
 I find this to be a trivial problem compared to not finding anything at all.
 Background:
 Swedish åäö are in fact the same letters as Norwegian and Danish åæø and thus 
 interchangeable when used between these languages. They are however folded 
 differently when people type them on a keyboard lacking these characters, and 
 ASCIIFoldingFilter handles ä and æ differently.
 When a Swedish person is lacking umlauted characters on the keyboard they 
 consistently type a, a, o instead of å, ä, ö. Foreigners also tend to use a, 
 a, o.
 In Norway people tend to type aa, ae and oe instead of å, æ and ø. Some use 
 a, a, o. I've also seen oo, ao, etc. And permutations. Not sure about Denmark, 
 but the pattern is probably the same.
 This filter solves that problem, but might also cause new ones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5013) ScandinavianInterintelligableASCIIFoldingFilter

2013-05-27 Thread Karl Wettin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-5013:


Attachment: LUCENE-5013.patch

Patch blessed with ASL2

 ScandinavianInterintelligableASCIIFoldingFilter
 ---

 Key: LUCENE-5013
 URL: https://issues.apache.org/jira/browse/LUCENE-5013
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Karl Wettin
Priority: Trivial
 Attachments: LUCENE-5013-2.txt, LUCENE-5013-3.txt, LUCENE-5013-4.txt, 
 LUCENE-5013-5.txt, LUCENE-5013-6.txt, LUCENE-5013.patch, LUCENE-5013.txt


 This filter is an augmentation of the output from ASCIIFoldingFilter;
 it discriminates against the double vowels aa, ae, ao, oe and oo, leaving just the 
 first one.
 blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej == blabarsyltetoj
 räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas == raksmorgas
 Caveats:
 Since this is a filtering on top of ASCIIFoldingFilter, äöåøæ has already been 
 folded down to a, o, a, o, ae by the time this filter handles it, which will cause 
 effects such as:
 bøen - boen - bon
 åene - aene - ane
 I find this to be a trivial problem compared to not finding anything at all.
 Background:
 Swedish åäö are in fact the same letters as Norwegian and Danish åæø and thus 
 interchangeable when used between these languages. They are however folded 
 differently when people type them on a keyboard lacking these characters, and 
 ASCIIFoldingFilter handles ä and æ differently.
 When a Swedish person is lacking umlauted characters on the keyboard they 
 consistently type a, a, o instead of å, ä, ö. Foreigners also tend to use a, 
 a, o.
 In Norway people tend to type aa, ae and oe instead of å, æ and ø. Some use 
 a, a, o. I've also seen oo, ao, etc. And permutations. Not sure about Denmark, 
 but the pattern is probably the same.
 This filter solves that problem, but might also cause new ones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5013) ScandinavianInterintelligableASCIIFoldingFilter

2013-05-27 Thread Karl Wettin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-5013:


Attachment: (was: LUCENE-5013.patch)

 ScandinavianInterintelligableASCIIFoldingFilter
 ---

 Key: LUCENE-5013
 URL: https://issues.apache.org/jira/browse/LUCENE-5013
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Karl Wettin
Priority: Trivial
 Attachments: LUCENE-5013-2.txt, LUCENE-5013-3.txt, LUCENE-5013-4.txt, 
 LUCENE-5013-5.txt, LUCENE-5013-6.txt, LUCENE-5013.patch, LUCENE-5013.txt


 This filter is an augmentation of the output from ASCIIFoldingFilter;
 it discriminates against the double vowels aa, ae, ao, oe and oo, leaving just the 
 first one.
 blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej == blabarsyltetoj
 räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas == raksmorgas
 Caveats:
 Since this is a filtering on top of ASCIIFoldingFilter, äöåøæ has already been 
 folded down to a, o, a, o, ae by the time this filter handles it, which will cause 
 effects such as:
 bøen - boen - bon
 åene - aene - ane
 I find this to be a trivial problem compared to not finding anything at all.
 Background:
 Swedish åäö are in fact the same letters as Norwegian and Danish åæø and thus 
 interchangeable when used between these languages. They are however folded 
 differently when people type them on a keyboard lacking these characters, and 
 ASCIIFoldingFilter handles ä and æ differently.
 When a Swedish person is lacking umlauted characters on the keyboard they 
 consistently type a, a, o instead of å, ä, ö. Foreigners also tend to use a, 
 a, o.
 In Norway people tend to type aa, ae and oe instead of å, æ and ø. Some use 
 a, a, o. I've also seen oo, ao, etc. And permutations. Not sure about Denmark, 
 but the pattern is probably the same.
 This filter solves that problem, but might also cause new ones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5013) ScandinavianInterintelligableASCIIFoldingFilter

2013-05-27 Thread Karl Wettin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-5013:


Attachment: LUCENE-5013.patch

Patch blessed with ASL2.

 ScandinavianInterintelligableASCIIFoldingFilter
 ---

 Key: LUCENE-5013
 URL: https://issues.apache.org/jira/browse/LUCENE-5013
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.3
Reporter: Karl Wettin
Priority: Trivial
 Attachments: LUCENE-5013-2.txt, LUCENE-5013-3.txt, LUCENE-5013-4.txt, 
 LUCENE-5013-5.txt, LUCENE-5013-6.txt, LUCENE-5013.patch, LUCENE-5013.txt


 This filter is an augmentation of the output from ASCIIFoldingFilter;
 it discriminates against the double vowels aa, ae, ao, oe and oo, leaving just the 
 first one.
 blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej == blabarsyltetoj
 räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas == raksmorgas
 Caveats:
 Since this is a filtering on top of ASCIIFoldingFilter, äöåøæ has already been 
 folded down to a, o, a, o, ae by the time this filter handles it, which will cause 
 effects such as:
 bøen - boen - bon
 åene - aene - ane
 I find this to be a trivial problem compared to not finding anything at all.
 Background:
 Swedish åäö are in fact the same letters as Norwegian and Danish åæø and thus 
 interchangeable when used between these languages. They are however folded 
 differently when people type them on a keyboard lacking these characters, and 
 ASCIIFoldingFilter handles ä and æ differently.
 When a Swedish person is lacking umlauted characters on the keyboard they 
 consistently type a, a, o instead of å, ä, ö. Foreigners also tend to use a, 
 a, o.
 In Norway people tend to type aa, ae and oe instead of å, æ and ø. Some use 
 a, a, o. I've also seen oo, ao, etc. And permutations. Not sure about Denmark, 
 but the pattern is probably the same.
 This filter solves that problem, but might also cause new ones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2013-05-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667754#comment-13667754
 ] 

Jan Høydahl commented on SOLR-4470:
---

I tried to move AuthCredentialsSource to test scope, but there is a 
compile-time dependency in the JettySolrRunner method lifeCycleStarted(). Can we 
refactor this piece of code into test-scope as well, e.g. by exposing a 
Filter setter on JettySolrRunner?
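
For illustration, a rough sketch of the kind of hook being proposed; the method and 
field names below are hypothetical and do not exist on the current JettySolrRunner:

{code}
import javax.servlet.Filter;

// Hypothetical sketch of a Filter setter on JettySolrRunner; it only illustrates
// the proposal and is not existing API.
public class JettySolrRunnerSketch {
  private Filter extraFilter; // would be registered during lifeCycleStarted()

  // Tests could inject e.g. a RegExpAuthorizationFilter here, keeping the
  // filter class itself inside the test framework.
  public void setExtraFilter(Filter filter) {
    this.extraFilter = filter;
  }
}
{code}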

 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.4

 Attachments: SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, 
 SOLR-4470.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes 
 also make internal requests to other Solr-nodes, and for those to work 
 credentials need to be provided as well.
 Ideally we would like to forward credentials from a particular request to 
 all the internal sub-requests it triggers, e.g. for search and update 
 requests.
 But there are also internal requests
 * that are only indirectly/asynchronously triggered from outside requests (e.g. 
 shard creation/deletion/etc based on calls to the Collection API)
 * that do not in any way have a relation to an outside super-request (e.g. 
 replica synching stuff)
 We would like to aim at a solution where the original credentials are 
 forwarded when a request directly/synchronously triggers a subrequest, with a 
 fallback to configured internal credentials for the 
 asynchronous/non-rooted requests.
 In our solution we aim at only supporting basic http auth, but we would 
 like to make a framework around it, so that not too much refactoring is 
 needed if you later want to add support for other kinds of auth (e.g. digest).
 We will work on a solution but create this JIRA issue early in order to get 
 input/comments from the community as early as possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4470) Support for basic http auth in internal solr requests

2013-05-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667754#comment-13667754
 ] 

Jan Høydahl edited comment on SOLR-4470 at 5/27/13 1:59 PM:


I tried to move {{RegExpAuthorizationFilter}} to test scope, but there is a 
compile-time dependency in the JettySolrRunner method lifeCycleStarted(). Can we 
refactor this piece of code into test-scope as well, e.g. by exposing a 
Filter setter on JettySolrRunner?

  was (Author: janhoy):
I tried to move AuthCredentialsSource to test scope, but there is a 
compile-time dependency in JettySolrRunner method lifeCycleStarted(). Can we 
refactor this piece of code into test-scope as well, e.g. by exposing some a 
Filter setter on JettySolrRunner?
  
 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.4

 Attachments: SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, 
 SOLR-4470.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
 http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes 
 also make internal requests to other Solr-nodes, and for those to work 
 credentials need to be provided as well.
 Ideally we would like to forward credentials from a particular request to 
 all the internal sub-requests it triggers, e.g. for search and update 
 requests.
 But there are also internal requests
 * that are only indirectly/asynchronously triggered from outside requests (e.g. 
 shard creation/deletion/etc based on calls to the Collection API)
 * that do not in any way have a relation to an outside super-request (e.g. 
 replica synching stuff)
 We would like to aim at a solution where the original credentials are 
 forwarded when a request directly/synchronously triggers a subrequest, with a 
 fallback to configured internal credentials for the 
 asynchronous/non-rooted requests.
 In our solution we aim at only supporting basic http auth, but we would 
 like to make a framework around it, so that not too much refactoring is 
 needed if you later want to add support for other kinds of auth (e.g. digest).
 We will work on a solution but create this JIRA issue early in order to get 
 input/comments from the community as early as possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4862) Core admin action CREATE fails to persist some settings in solr.xml

2013-05-27 Thread JIRA
André Widhani created SOLR-4862:
---

 Summary: Core admin action CREATE fails to persist some settings 
in solr.xml
 Key: SOLR-4862
 URL: https://issues.apache.org/jira/browse/SOLR-4862
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.3
Reporter: André Widhani
Priority: Minor


When I create a core with Core admin handler using these request parameters:

action=CREATE
name=core-tex69bbum21ctk1kq6lmkir-index3
schema=/etc/opt/dcx/solr/conf/schema.xml
instanceDir=/etc/opt/dcx/solr/
config=/etc/opt/dcx/solr/conf/solrconfig.xml
dataDir=/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3

in Solr 4.1, solr.xml would have the following entry:

<core schema="/etc/opt/dcx/solr/conf/schema.xml" loadOnStartup="true" 
instanceDir="/etc/opt/dcx/solr/" transient="false" 
name="core-tex69bbum21ctk1kq6lmkir-index3" 
config="/etc/opt/dcx/solr/conf/solrconfig.xml" 
dataDir="/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3/" 
collection="core-tex69bbum21ctk1kq6lmkir-index3"/>

while in Solr 4.3, schema, config and dataDir will be missing:

<core loadOnStartup="true" instanceDir="/etc/opt/dcx/solr/" 
transient="false" name="core-tex69bbum21ctk1kq6lmkir-index3" 
collection="core-tex69bbum21ctk1kq6lmkir-index3"/>

The new core would use the settings specified during CREATE, but after a Solr 
restart they are lost (falling back to some defaults), as they are not persisted 
in solr.xml. I should add that solr.xml has persistent="true" in the root 
element.

http://lucene.472066.n3.nabble.com/Core-admin-action-quot-CREATE-quot-fails-to-persist-some-settings-in-solr-xml-with-Solr-4-3-td4065786.html
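
For reference, a minimal sketch of issuing the CREATE call described above over HTTP; 
the host/port and the use of plain HttpURLConnection are illustrative assumptions, not 
part of the report:

{code}
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Sketch of the CoreAdmin CREATE request with the parameters listed above.
public class CreateCoreSketch {
  public static void main(String[] args) throws Exception {
    String base = "http://localhost:8983/solr/admin/cores"; // assumed host/port
    String query = "action=CREATE"
        + "&name=" + enc("core-tex69bbum21ctk1kq6lmkir-index3")
        + "&schema=" + enc("/etc/opt/dcx/solr/conf/schema.xml")
        + "&instanceDir=" + enc("/etc/opt/dcx/solr/")
        + "&config=" + enc("/etc/opt/dcx/solr/conf/solrconfig.xml")
        + "&dataDir=" + enc("/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3");
    HttpURLConnection conn =
        (HttpURLConnection) new URL(base + "?" + query).openConnection();
    // A 200 response means the core was created; whether the settings end up
    // persisted in solr.xml afterwards is the issue reported here.
    System.out.println("HTTP " + conn.getResponseCode());
    conn.disconnect();
  }

  private static String enc(String s) throws Exception {
    return URLEncoder.encode(s, "UTF-8");
  }
}
{code}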

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Scoring prefix query matches by matching word position

2013-05-27 Thread Han Wang
I have found a mail sent to you long ago about scoring prefix query matches
by matching word position:
http://mail-archives.apache.org/mod_mbox/lucene-dev/200612.mbox/%3CF43898E1E0300149BEB43F3A66423D6C0C141AEE@ex8.hostedexchange.local%3E

May I know the official solution for this question?

-- 
Tom Wang
EECS of Peking University
Wang Han
Department of Computer Science, School of EECS, Peking University


[jira] [Created] (SOLR-4863) SolrDynamicMBean still uses sourceId in dynamic stats

2013-05-27 Thread Shalin Shekhar Mangar (JIRA)
Shalin Shekhar Mangar created SOLR-4863:
---

 Summary: SolrDynamicMBean still uses sourceId in dynamic stats
 Key: SOLR-4863
 URL: https://issues.apache.org/jira/browse/SOLR-4863
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 4.4


As noted in solr-user:

http://www.mail-archive.com/solr-user@lucene.apache.org/msg82650.html

SOLR-3329 removed the sourceId from SolrInfoMBean but it wasn't removed from 
the dynamic stats. This leads to exceptions on access.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-4862) Core admin action CREATE fails to persist some settings in solr.xml

2013-05-27 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-4862:
---

Assignee: Erick Erickson

 Core admin action CREATE fails to persist some settings in solr.xml
 -

 Key: SOLR-4862
 URL: https://issues.apache.org/jira/browse/SOLR-4862
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.3
Reporter: André Widhani
Assignee: Erick Erickson
Priority: Minor

 When I create a core with the Core admin handler using these request parameters:
 action=CREATE
 name=core-tex69bbum21ctk1kq6lmkir-index3
 schema=/etc/opt/dcx/solr/conf/schema.xml
 instanceDir=/etc/opt/dcx/solr/
 config=/etc/opt/dcx/solr/conf/solrconfig.xml
 dataDir=/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3
 in Solr 4.1, solr.xml would have the following entry:
 <core schema="/etc/opt/dcx/solr/conf/schema.xml" loadOnStartup="true" 
 instanceDir="/etc/opt/dcx/solr/" transient="false" 
 name="core-tex69bbum21ctk1kq6lmkir-index3" 
 config="/etc/opt/dcx/solr/conf/solrconfig.xml" 
 dataDir="/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3/" 
 collection="core-tex69bbum21ctk1kq6lmkir-index3"/>
 while in Solr 4.3, schema, config and dataDir will be missing:
 <core loadOnStartup="true" instanceDir="/etc/opt/dcx/solr/" 
 transient="false" name="core-tex69bbum21ctk1kq6lmkir-index3" 
 collection="core-tex69bbum21ctk1kq6lmkir-index3"/>
 The new core would use the settings specified during CREATE, but after a Solr 
 restart they are lost (falling back to some defaults), as they are not persisted 
 in solr.xml. I should add that solr.xml has persistent="true" in the root 
 element.
 http://lucene.472066.n3.nabble.com/Core-admin-action-quot-CREATE-quot-fails-to-persist-some-settings-in-solr-xml-with-Solr-4-3-td4065786.html
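
For reference, the same CREATE can be issued from SolrJ. A minimal sketch (assuming 
the 4.x CoreAdminRequest.Create setters shown below exist in your SolrJ version; the 
paths are the ones from the report):

{code}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateCoreSketch {
  public static void main(String[] args) throws Exception {
    // Core admin requests go against the base Solr URL, not a specific core.
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    CoreAdminRequest.Create create = new CoreAdminRequest.Create();
    create.setCoreName("core-tex69bbum21ctk1kq6lmkir-index3");
    create.setInstanceDir("/etc/opt/dcx/solr/");
    create.setSchemaName("/etc/opt/dcx/solr/conf/schema.xml");     // assumed setter name
    create.setConfigName("/etc/opt/dcx/solr/conf/solrconfig.xml"); // assumed setter name
    create.setDataDir("/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3"); // assumed setter name

    // With the bug described above, schema/config/dataDir are honored now,
    // but they are not written back to solr.xml and are lost on restart.
    create.process(server);
  }
}
{code}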

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5017) SpatialOpRecursivePrefixTreeTest is failing

2013-05-27 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667773#comment-13667773
 ] 

David Smiley commented on LUCENE-5017:
--

Thanks for bringing this to my attention, Mike.  I'll look into it.  I wish I 
could subscribe to test failures in the spatial module, and that failures which 
still reproduce for a given seed could be tracked somewhere, so that we could 
see the outstanding problems that haven't been fixed yet.

 SpatialOpRecursivePrefixTreeTest is failing
 ---

 Key: LUCENE-5017
 URL: https://issues.apache.org/jira/browse/LUCENE-5017
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Reporter: Michael McCandless
 Fix For: 5.0, 4.4


 This has been failing lately on trunk (e.g. on rev 1486339):
 {noformat}
 ant test  -Dtestcase=SpatialOpRecursivePrefixTreeTest 
 -Dtestmethod=testContains -Dtests.seed=456022665217DADF:2C2A2816BD2BA1C5 
 -Dtests.slow=true -Dtests.locale=nl_BE -Dtests.timezone=Poland 
 -Dtests.file.encoding=ISO-8859-1
 {noformat}
 Not sure what's up ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4858) updateLog + core reload + deleteByQuery = leaked directory

2013-05-27 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667817#comment-13667817
 ] 

Anshum Gupta commented on SOLR-4858:


Flipping the core reload and the delete (i.e. doing the delete followed by a 
core reload) also makes it pass.

 updateLog + core reload + deleteByQuery = leaked directory
 --

 Key: SOLR-4858
 URL: https://issues.apache.org/jira/browse/SOLR-4858
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2.1
Reporter: Hoss Man
 Attachments: SOLR-4858.patch


 I haven't been able to make sense of this yet, but trying to track down 
 another bug led me to discover that the following combination leads to 
 problems...
 * updateLog enabled
 * do a core reload
 * do a delete by query \*:\*
 ...leave out any one of the three, and everything works fine.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4655) The Overseer should assign node names by default.

2013-05-27 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667862#comment-13667862
 ] 

Anshum Gupta commented on SOLR-4655:


I'll just start working on this. The email for this completely escaped my notice.

 The Overseer should assign node names by default.
 -

 Key: SOLR-4655
 URL: https://issues.apache.org/jira/browse/SOLR-4655
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.4

 Attachments: SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch, 
 SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch


 Currently we make a unique node name by using the host address as part of the 
 name. This means that if you want a node with a new address to take over, the 
 node name is misleading. It's best if you set custom names for each node 
 before starting your cluster. This is cumbersome though, and cannot currently 
 be done with the collections API. Instead, the overseer could assign a more 
 generic name such as nodeN by default. Then you can easily swap in another 
 node with no pre planning and no confusion in the name.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5018) Never update offsets in CompoundWordTokenFilterBase

2013-05-27 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-5018:


 Summary: Never update offsets in CompoundWordTokenFilterBase
 Key: LUCENE-5018
 URL: https://issues.apache.org/jira/browse/LUCENE-5018
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand


CompoundWordTokenFilterBase and its children DictionaryCompoundWordTokenFilter 
and HyphenationCompoundWordTokenFilter update offsets. This can make 
OffsetAttributeImpl trip an exception when chained with other filters that 
group tokens together such as ShingleFilter, see 
http://www.gossamer-threads.com/lists/lucene/java-dev/196376?page=last.
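
A minimal, self-contained sketch (my own illustration, not from the issue) of the 
kind of chain described - a decompounding filter feeding a ShingleFilter; whether 
the offset check actually trips depends on the input and the dictionary:

{code}
import java.io.StringReader;
import java.util.Arrays;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public class CompoundShingleSketch {
  public static void main(String[] args) throws Exception {
    CharArraySet dict = new CharArraySet(Version.LUCENE_43,
        Arrays.asList("rind", "fleisch", "ueberwachung"), true);

    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_43,
        new StringReader("rindfleischueberwachung gesetz"));
    // The decompounder emits sub-tokens; before this fix it also rewrote their offsets.
    ts = new DictionaryCompoundWordTokenFilter(Version.LUCENE_43, ts, dict);
    // ShingleFilter groups adjacent tokens and combines their offsets,
    // which is where inconsistent offsets can trigger the exception.
    ts = new ShingleFilter(ts);

    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    OffsetAttribute offsets = ts.addAttribute(OffsetAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term.toString() + " [" + offsets.startOffset()
          + "," + offsets.endOffset() + "]");
    }
    ts.end();
    ts.close();
  }
}
{code}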

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b89) - Build # 5762 - Failure!

2013-05-27 Thread Adrien Grand
The culprit is HyphenationCompoundWordTokenFilter, I opened
https://issues.apache.org/jira/browse/LUCENE-5018.

--
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4228) SolrPing - add methods for enable/disable

2013-05-27 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-4228:
---

Attachment: SOLR-4228.patch

New patch with some cleanups and a CHANGES.txt entry that lists it as a new 
feature in version 4.4.  Like the previous patch versions, it doesn't change 
default behavior, it just adds new capability.  On trunk, precommit passes and 
the Solr tests are underway.  If that works out OK, I will commit soon.  Before 
committing to 4x, I will also give it a try in my dev environment.

 SolrPing - add methods for enable/disable
 -

 Key: SOLR-4228
 URL: https://issues.apache.org/jira/browse/SOLR-4228
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.4

 Attachments: SOLR-4228.patch, SOLR-4228.patch, SOLR-4228.patch, 
 SOLR-4228.patch, SOLR-4228.patch, SOLR-4228.patch, SOLR-4228.patch, 
 SOLR-4228.patch


 The new PingRequestHandler in Solr 4.0 takes over what actions.jsp used to do 
 in older versions.  Create methods in the SolrPing request object to access 
 this capability.
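
As a rough illustration of how the new capability might be used from client code - 
the action-setter name below is a hypothetical placeholder; the actual method names 
are defined by the attached patch:

{code}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.SolrPing;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class PingActionSketch {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrPing ping = new SolrPing();
    // Hypothetical method name: sets action=enable on the ping request so that
    // PingRequestHandler creates its health-check file (what actions.jsp used to do).
    ping.setActionEnable();
    SolrPingResponse rsp = ping.process(server);
    System.out.println("enable status: " + rsp.getStatus());
  }
}
{code}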

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4655) The Overseer should assign node names by default.

2013-05-27 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667889#comment-13667889
 ] 

Anshum Gupta commented on SOLR-4655:


I integrated the above patch and have the following tests failing on the trunk. 
[~markrmil...@gmail.com] Can you confirm that all of these tests fail for you 
as well?

[junit4:junit4]   - org.apache.solr.cloud.ShardSplitTest.testDistribSearch
[junit4:junit4]   - 
org.apache.solr.cloud.ChaosMonkeyShardSplitTest.testDistribSearch
[junit4:junit4]   - 
org.apache.solr.cloud.ClusterStateUpdateTest.testCoreRegistration
[junit4:junit4]   - 
org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch
[junit4:junit4]   - org.apache.solr.cloud.BasicDistributedZkTest (suite)

 The Overseer should assign node names by default.
 -

 Key: SOLR-4655
 URL: https://issues.apache.org/jira/browse/SOLR-4655
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.4

 Attachments: SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch, 
 SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch


 Currently we make a unique node name by using the host address as part of the 
 name. This means that if you want a node with a new address to take over, the 
 node name is misleading. It's best if you set custom names for each node 
 before starting your cluster. This is cumbersome though, and cannot currently 
 be done with the collections API. Instead, the overseer could assign a more 
 generic name such as nodeN by default. Then you can easily swap in another 
 node with no pre planning and no confusion in the name.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5018) Never update offsets in CompoundWordTokenFilterBase

2013-05-27 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5018:
-

Attachment: LUCENE-5018.patch

Here is a patch.

 Never update offsets in CompoundWordTokenFilterBase
 ---

 Key: LUCENE-5018
 URL: https://issues.apache.org/jira/browse/LUCENE-5018
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
 Attachments: LUCENE-5018.patch


 CompoundWordTokenFilterBase and its children 
 DictionaryCompoundWordTokenFilter and HyphenationCompoundWordTokenFilter 
 update offsets. This can make OffsetAttributeImpl trip an exception when 
 chained with other filters that group tokens together such as ShingleFilter, 
 see http://www.gossamer-threads.com/lists/lucene/java-dev/196376?page=last.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5014) ANTLR Lucene query parser

2013-05-27 Thread Roman Chyla (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667908#comment-13667908
 ] 

Roman Chyla commented on LUCENE-5014:
-

Hi David,
In practical terms ANTLR can do exactly the same things as PEG (i.e. lookahead, 
backtracking, memoization) - see 
http://stackoverflow.com/questions/8816759/ll-versus-peg-parsers-what-is-the-difference

But it is also capable of doing more than PEG (e.g. better error recovery - a 
PEG parser needs to parse the whole tree before it discovers an error, and then 
the error recovery is not the same thing).

PEGs can be easier *especially* because of the first-choice operator; in fact 
at times I wished that ANTLR just chose the first available option (well, it 
does, but it reports an error and I didn't want to have a grammar with errors). 
So, in the CFG/ANTLR world, ambiguity is solved using syntactic predicates 
(lookahead) -- so far this has been theoretical, here are a few more points:

Clarity
===

I looked at the presentation, and the parser does handle operator precedence, 
but there it is spread across several screens of Java code; I find the 
following much more readable:

{code}
mainQ : 
  clauseOr+ EOF
  ;
  
clauseOr
  : clauseAnd (or clauseAnd )*
  ;

clauseAnd
  : clauseNot  (and clauseNot)*
  ; 
{code}
  
It is essentially the same thing, but it is independent of the Java code and I 
can see it in a few lines - and extend it by adding a few more lines. The patch 
I wrote makes the handling of the separate grammar and the generated code 
seamless. So 2 of the 3 advantages of PEG over ANTLR disappear.


Syntax vs semantics (business logic)


The example from the presentation needs to be much more involved if it is to be 
used in real life. Consider this query:

{noformat}
dog NEAR cat
{noformat}

This is going to work only in the simplest case, where each term is a single 
TermQuery. But if there were a synonym expansion (where it would go inside the 
PEG parser is one question), the parser would need to *rewrite* the query into 
something like:
{noformat}
(dog|canin) NEAR cat --> (dog NEAR cat) OR (canin NEAR cat)
{noformat}

So, there you get the 'spaghetti problem' - in the example presented, the logic 
that rewrites the query must reside in the same place as the query parsing. 
That is not an improvement IMO; it is the same thing as the old Lucene parsers 
written in JavaCC, which are very difficult to extend or debug.

I think I'll add a new grammar with the proximity operators so that you can see 
how easy it is to solve the same situation with ANTLR (but you will need to 
read the patch this time ;)). By the way, the patch is big because I included 
the HTML with SVG charts of the generated parse trees and one Excel file (that 
one helps in writing unit tests for the grammar).

Developer vs user experience


I think PEG definitely looks simpler (in the presented example) and its main 
advantage is the first-choice operator. But since ANTLR can do the same, and it 
has a programming-language-independent grammar, it can do the same job. The 
difference may be in the maturity of the projects, the tools available (i.e. 
debuggers) - and of course the implementation (see the link above for details).

I can imagine that for PEG you can use your IDE of choice, while with ANTLR 
there is this 'pesky' level of abstraction - but there are tools that make life 
bearable, such as ANTLRWorks or the Eclipse ANTLR debugger (though I have not 
liked that one); there are grammar unit tests, and I added ways to debug/view 
the grammar. Again, I recommend trying it, e.g. 

{code}
ant -f aqp-build.xml gunit
# edit StandardLuceneGrammar and save as 'mytestgrammar'
ant -f aqp-build.xml try-view -Dquery="foo NEAR bar" -Dgrammar=mytestgrammar
{code}


There may of course be more things to consider, but I believe the 3 issues 
above present some interesting vantage points.

 ANTLR Lucene query parser
 -

 Key: LUCENE-5014
 URL: https://issues.apache.org/jira/browse/LUCENE-5014
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser, modules/queryparser
Affects Versions: 4.3
 Environment: all
Reporter: Roman Chyla
  Labels: antlr, query, queryparser
 Attachments: LUCENE-5014.txt, LUCENE-5014.txt


 I would like to propose a new way of building query parsers for Lucene.  
 Currently, most Lucene parsers are hard to extend because they are either 
 written in Java (ie. the SOLR query parser, or edismax) or the parsing logic 
 is 'married' with the query building logic (i.e. the standard lucene parser, 
 generated by JavaCC) - which makes any extension really hard.
 Few years back, Lucene got the contrib/modern query parser (later renamed to 
 'flexible'), yet that parser didn't become a star (it 

[jira] [Comment Edited] (LUCENE-5014) ANTLR Lucene query parser

2013-05-27 Thread Roman Chyla (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667908#comment-13667908
 ] 

Roman Chyla edited comment on LUCENE-5014 at 5/27/13 7:04 PM:
--

Hi David,
In practical terms ANTLR can do exactly the same things as PEG (i.e. lookahead, 
backtracking, memoization) - see 
http://stackoverflow.com/questions/8816759/ll-versus-peg-parsers-what-is-the-difference

But it is also capable of doing more than PEG (e.g. better error recovery - a 
PEG parser needs to parse the whole tree before it discovers an error, and then 
the error recovery is not the same thing).

PEGs can be easier *especially* because of the first-choice operator; in fact 
at times I wished that ANTLR just chose the first available option (well, it 
does, but it reports an error and I didn't want to have a grammar with errors). 
So, in the CFG/ANTLR world, ambiguity is solved using syntactic predicates 
(lookahead) -- so far this has been theoretical, here are a few more points:

Grammar vs code
===

I looked at the presentation, and the parser does handle operator precedence, 
but there it is spread across several screens of Java code; I find the 
following much more readable:

{code}
mainQ : 
  clauseOr+ EOF
  ;
  
clauseOr
  : clauseAnd (or clauseAnd )*
  ;

clauseAnd
  : clauseNot  (and clauseNot)*
  ; 
{code}
  
It is essentially the same thing, but it is independent of the Java code and I 
can see it in a few lines - and extend it by adding a few more lines. The patch 
I wrote makes the handling of the separate grammar and the generated code 
seamless. So 2 of the 3 advantages of PEG over ANTLR disappear.


Syntax vs semantics (business logic)


The example from the presentation needs to be much more involved if it is to be 
used in real life. Consider this query:

{noformat}
dog NEAR cat
{noformat}

This is going to work only in the simplest case, where each term is a single 
TermQuery. But if there were a synonym expansion (where it would go inside the 
PEG parser is one question), the parser would need to *rewrite* the query into 
something like:

{noformat}
(dog|canin) NEAR cat --> (dog NEAR cat) OR (canin NEAR cat)
{noformat}

So, there you get the 'spaghetti problem' - in the example presented, the logic 
that rewrites the query must reside in the same place as the query parsing. 
That is not an improvement IMO; it is the same thing as the old Lucene parsers 
written in JavaCC, which are very difficult to extend or debug.

I think I'll add a new grammar with the proximity operators so that you can see 
how easy it is to solve the same situation with ANTLR (but you will need to 
read the patch this time ;)). By the way, the patch is big because I included 
the HTML with SVG charts of the generated parse trees and one Excel file (that 
one helps in writing unit tests for the grammar).


Developer vs user experience


I think PEG definitely looks simpler to developers (in the presented example) 
and its main advantage is the first-choice operator. But since ANTLR can do the 
same, and it has a programming-language-independent grammar, it can do the same 
job. The difference may be in the maturity of the projects, the tools available 
(i.e. debuggers) - and of course the implementation (see the link above for 
details).

I can imagine that for PEG you can use your IDE of choice, while with ANTLR 
there is this 'pesky' level of abstraction - but there are tools that make life 
bearable, such as ANTLRWorks or the Eclipse ANTLR debugger (though I have not 
liked that one); there are grammar unit tests, and I added ways to debug/view 
the grammar. If you apply the patch, you can try:

{code}
ant -f aqp-build.xml gunit
# edit StandardLuceneGrammar and save as 'mytestgrammar'
ant -f aqp-build.xml try-view -Dquery="foo NEAR bar" -Dgrammar=mytestgrammar
{code}


There may of course be more things to consider, but I believe the 3 issues 
above present some interesting vantage points.

  was (Author: rchyla):
Hi David,
In practical terms ANTLR can do exactly the same thing as PEG (ie lookahead, 
backtracking,memoization) - see this 
http://stackoverflow.com/questions/8816759/ll-versus-peg-parsers-what-is-the-difference

But it is also capable of doing more things than PEG (ie. better error recovery 
- PEG parser needs to parse the whole tree before it discovers an error; then 
the error recovery is not the same thing)

PEG's can be easier *especially* because of the first-choice operator; in fact 
at times I wished that ANTLR just chose the first available option (well, it 
does, but it reports and error and I didn't want to have grammar with errors). 
So, in CFGANTLR world, ambiguity is solved using syntactic predicated 
(lookahead) -- so far, this has been a theoretical, here are few more points:

Clarity
===

I looked at the presentation and the parser contains 

[jira] [Commented] (SOLR-4862) Core admin action CREATE fails to persist some settings in solr.xml

2013-05-27 Thread Li Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667923#comment-13667923
 ] 

Li Xu commented on SOLR-4862:
-

In the example given above, you don't really need to specify the config and 
schema parameters. By default, Solr looks in instanceDir/conf for them. 
However, if you name your XML files differently from the defaults, then this 
bug will cause you problems.

 Core admin action CREATE fails to persist some settings in solr.xml
 -

 Key: SOLR-4862
 URL: https://issues.apache.org/jira/browse/SOLR-4862
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.3
Reporter: André Widhani
Assignee: Erick Erickson
Priority: Minor

 When I create a core with Core admin handler using these request parameters:
 action=CREATE
 name=core-tex69bbum21ctk1kq6lmkir-index3
 schema=/etc/opt/dcx/solr/conf/schema.xml
 instanceDir=/etc/opt/dcx/solr/
 config=/etc/opt/dcx/solr/conf/solrconfig.xml
 dataDir=/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3
 in Solr 4.1, solr.xml would have the following entry:
 <core schema="/etc/opt/dcx/solr/conf/schema.xml" loadOnStartup="true" 
 instanceDir="/etc/opt/dcx/solr/" transient="false" 
 name="core-tex69bbum21ctk1kq6lmkir-index3" 
 config="/etc/opt/dcx/solr/conf/solrconfig.xml" 
 dataDir="/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3/" 
 collection="core-tex69bbum21ctk1kq6lmkir-index3"/>
 while in Solr 4.3 schema, config and dataDir will be missing:
 <core loadOnStartup="true" instanceDir="/etc/opt/dcx/solr/" 
 transient="false" name="core-tex69bbum21ctk1kq6lmkir-index3" 
 collection="core-tex69bbum21ctk1kq6lmkir-index3"/>
 The new core would use the settings specified during CREATE, but after a Solr 
 restart they are lost (fall back to some defaults), as they are not persisted 
 in solr.xml. I should add that solr.xml has persistent="true" in the root 
 element.
 http://lucene.472066.n3.nabble.com/Core-admin-action-quot-CREATE-quot-fails-to-persist-some-settings-in-solr-xml-with-Solr-4-3-td4065786.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5019) SimpleFragmentScorer can create very long fragments

2013-05-27 Thread Alexandre Patry (JIRA)
Alexandre Patry created LUCENE-5019:
---

 Summary: SimpleFragmentScorer can create very long fragments
 Key: LUCENE-5019
 URL: https://issues.apache.org/jira/browse/LUCENE-5019
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.3
Reporter: Alexandre Patry
Priority: Minor


In SimpleFragmentScorer, when a query term is followed by a stop word, the 
fragment will run until the end of the document.

When a query term is encountered (line 80), SimpleFragmentScorer waits for the 
token following it before allowing the fragment to end (lines 68 to 72). When a 
stop word follows the query word (or any token with a position increment 
greater than 1), its position is skipped and the token SimpleFragmentScorer is 
waiting for never arrives.

The attached patch fixes that by waiting for the first token following the 
query word instead of the token at the position after the query term.
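
A small self-contained sketch (my own, assuming the 4.3 highlighter APIs and a 
made-up field name and text) that sets up the situation described - a query term 
immediately followed by a stop word that the analyzer removes:

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;
import org.apache.lucene.util.Version;

public class LongFragmentSketch {
  public static void main(String[] args) throws Exception {
    // StandardAnalyzer removes stop words such as "and"/"the", creating a
    // position gap right after the query term "cat".
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
    String text = "cat and the dog ran far away from the noisy old house by the river";

    QueryScorer scorer = new QueryScorer(new TermQuery(new Term("body", "cat")));
    Highlighter highlighter = new Highlighter(scorer);
    // Ask for very short fragments; with the bug, the fragment containing
    // "cat" can still run to the end of the text.
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 10));

    System.out.println(highlighter.getBestFragment(analyzer, "body", text));
  }
}
{code}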

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5019) SimpleFragmentScorer can create very long fragments

2013-05-27 Thread Alexandre Patry (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Patry updated LUCENE-5019:


Attachment: simple-span-fragmenter.patch

A patch to fix SimpleFragmentScorer.

 SimpleFragmentScorer can create very long fragments
 ---

 Key: LUCENE-5019
 URL: https://issues.apache.org/jira/browse/LUCENE-5019
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.3
Reporter: Alexandre Patry
Priority: Minor
 Attachments: simple-span-fragmenter.patch


 In SimpleFragmentScorer, when a query term is followed by a stop word, the 
 fragment will run until the end of the document.
 When a query term is encountered (line 80), SimpleFragmentScorer waits for 
 the token following it before allowing the fragment to end (lines 68 to 72). 
 When a stop word follows the query word (or any token with a position 
 increment greater than 1), its position is skipped and the token 
 SimpleFragmentScorer is waiting for never arrives.
 The attached patch fixes that by waiting for the first token following the 
 query word instead of the token at the position after the query term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5019) SimpleFragmentScorer can create very long fragments

2013-05-27 Thread Alexandre Patry (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667932#comment-13667932
 ] 

Alexandre Patry edited comment on LUCENE-5019 at 5/27/13 7:59 PM:
--

A patch to fix SimpleSpanFragmenter.

  was (Author: apatry):
A patch to fix SimpleFragmentScorer.
  
 SimpleFragmentScorer can create very long fragments
 ---

 Key: LUCENE-5019
 URL: https://issues.apache.org/jira/browse/LUCENE-5019
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.3
Reporter: Alexandre Patry
Priority: Minor
 Attachments: simple-span-fragmenter.patch


 In SimpleFragmentScorer, when a query term is followed by a stop word, the 
 fragment will run until the end of the document.
 When a query term is encountered (line 80), SimpleFragmentScorer waits for 
 the token following it before allowing the fragment to end (lines 68 to 72). 
 When a stop word follows the query word (or any token with a position 
 increment greater than 1), its position is skipped and the token 
 SimpleFragmentScorer is waiting for never arrives.
 The attached patch fixes that by waiting for the first token following the 
 query word instead of the token at the position after the query term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5019) SimpleSpanFragmenter can create very long fragments

2013-05-27 Thread Alexandre Patry (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Patry updated LUCENE-5019:


Description: 
In SimpleSpanFragmenter, when a query term is followed by a stop word, the 
fragment will run until the end of the document.

When a query term is encountered (line 80), SimpleSpanFragmenter waits for the 
token following it before allowing the fragment to end (lines 68 to 72). When a 
stop word follows the query word (or any token with a position increment 
greater than 1), its position is skipped and the token SimpleSpanFragmenter is 
waiting for never arrives.

The attached patch fixes that by waiting for the first token following the 
query word instead of the token at the position after the query term.

  was:
In SimpleFragmentScorer, when a query term is followed by a stop word, the 
fragment will run until the end of the document.

When a query term is encountered (line 80), SimpleFragmentScorer waits for the 
token following it before allowing the fragment to end (lines 68 to 72). When a 
stop word follows the query word (or any token with a position increment 
greater than 1), its position is skipped and the token SimpleFragmentScorer is 
waiting for never arrives.

The attached patch fixes that by waiting for the first token following the 
query word instead of the token at the position after the query term.

Summary: SimpleSpanFragmenter can create very long fragments  (was: 
SimpleFragmentScorer can create very long fragments)

 SimpleSpanFragmenter can create very long fragments
 ---

 Key: LUCENE-5019
 URL: https://issues.apache.org/jira/browse/LUCENE-5019
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.3
Reporter: Alexandre Patry
Priority: Minor
 Attachments: simple-span-fragmenter.patch


 In SimpleSpanFragmenter, when a query term is followed by a stop word, the 
 fragment will run until the end of the document.
 When a query term is encountered (line 80), SimpleSpanFragmenter waits for 
 the token following it before allowing the fragment to end (lines 68 to 72). 
 When a stop word follows the query word (or any token with a position 
 increment greater than 1), its position is skipped and the token 
 SimpleSpanFragmenter is waiting for never arrives.
 The attached patch fixes that by waiting for the first token following the 
 query word instead of the token at the position after the query term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3781) Admin UI does not work when wiring Solr into a larger web application

2013-05-27 Thread Michael Chabot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667943#comment-13667943
 ] 

Michael Chabot commented on SOLR-3781:
--

FWIW, I was able to resolve this by adding a variable in LoadAdminUiServlet 
that manually holds the value of whatever's configured as 'path-prefix' in 
web.xml. See attached:
# web.xml
# LoadAdminUiServlet.java 
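
A rough sketch of the kind of change described, with hypothetical parameter and 
class names (see the attached files for the actual changes):

{code}
// Sketch only - not the actual attached patch.
import java.io.IOException;
import java.io.InputStream;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class PrefixAwareAdminUiServlet extends HttpServlet {
  // Hypothetical name; holds whatever 'path-prefix' is configured in web.xml.
  private String pathPrefix = "";

  @Override
  public void init(ServletConfig config) throws ServletException {
    super.init(config);
    String prefix = config.getInitParameter("path-prefix");
    if (prefix != null) {
      pathPrefix = prefix;
    }
  }

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    // Instead of the hard-coded "/admin.html", resolve it under the prefix.
    InputStream in = getServletContext().getResourceAsStream(pathPrefix + "/admin.html");
    if (in == null) {
      resp.sendError(HttpServletResponse.SC_NOT_FOUND, "admin.html not found");
      return;
    }
    // ... stream the file to the response as the real servlet does ...
    in.close();
  }
}
{code}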

 Admin UI does not work when wiring Solr into a larger web application
 -

 Key: SOLR-3781
 URL: https://issues.apache.org/jira/browse/SOLR-3781
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.0-BETA
 Environment: win7 jetty-distribution-7.6.5.v20120716
 startup param:
 -Djetty.port=8084 -DzkRun -Dbootstrap_conf=true
Reporter: shenjc
Priority: Minor
  Labels: patch
 Fix For: 5.0, 4.4

 Attachments: LoadAdminUiServlet.patch, 
 LoadAdminUiServlet_take2.patch, web.xml

   Original Estimate: 24h
  Remaining Estimate: 24h

 If you are wiring Solr into a larger web application which controls the web 
 context root, you will probably want to mount Solr under a path prefix 
 (app.war with /app/solr mounted into it, for example).
  For example:
 RootApp.war /
 myApp.war---/myApp
 prefixPath---xxx
 jsdir--js
 js file---main.js
 admin file---admin.html
 org.apache.solr.servlet.LoadAdminUiServlet
 line 49:  InputStream in = 
 getServletContext().getResourceAsStream("/admin.html");
 It can't find admin.html because it's in the prefixPath directory.
 org.apache.solr.cloud.ZkController
 lines 149-150:
 this.nodeName = this.hostName + ':' + this.localHostPort + '_' + 
 this.localHostContext;
 this.baseURL = this.localHost + ":" + this.localHostPort + "/" + 
 this.localHostContext;
 It can't match this condition:
 baseURL needs to be http://xx:xx/myApp/myPrefixPath, 
 e.g. http://xx:xx/myApp/xxx

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3781) Admin UI does not work when wiring Solr into a larger web application

2013-05-27 Thread Michael Chabot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Chabot updated SOLR-3781:
-

Attachment: web.xml.chabot
LoadAdminUiServlet.java.chabot

 Admin UI does not work when wiring Solr into a larger web application
 -

 Key: SOLR-3781
 URL: https://issues.apache.org/jira/browse/SOLR-3781
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.0-BETA
 Environment: win7 jetty-distribution-7.6.5.v20120716
 startup param:
 -Djetty.port=8084 -DzkRun -Dbootstrap_conf=true
Reporter: shenjc
Priority: Minor
  Labels: patch
 Fix For: 5.0, 4.4

 Attachments: LoadAdminUiServlet.java.chabot, 
 LoadAdminUiServlet.patch, LoadAdminUiServlet_take2.patch, web.xml, 
 web.xml.chabot

   Original Estimate: 24h
  Remaining Estimate: 24h

 If you are wiring Solr into a larger web application which controls the web 
 context root, you will probably want to mount Solr under a path prefix 
 (app.war with /app/solr mounted into it, for example).
  For example:
 RootApp.war /
 myApp.war---/myApp
 prefixPath---xxx
 jsdir--js
 js file---main.js
 admin file---admin.html
 org.apache.solr.servlet.LoadAdminUiServlet
 line 49:  InputStream in = 
 getServletContext().getResourceAsStream("/admin.html");
 It can't find admin.html because it's in the prefixPath directory.
 org.apache.solr.cloud.ZkController
 lines 149-150:
 this.nodeName = this.hostName + ':' + this.localHostPort + '_' + 
 this.localHostContext;
 this.baseURL = this.localHost + ":" + this.localHostPort + "/" + 
 this.localHostContext;
 It can't match this condition:
 baseURL needs to be http://xx:xx/myApp/myPrefixPath, 
 e.g. http://xx:xx/myApp/xxx

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4655) The Overseer should assign node names by default.

2013-05-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667974#comment-13667974
 ] 

Mark Miller commented on SOLR-4655:
---

The last patch for me only had the shard split tests failing - ill try and 
update to trunk tomorrow. 

 The Overseer should assign node names by default.
 -

 Key: SOLR-4655
 URL: https://issues.apache.org/jira/browse/SOLR-4655
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.4

 Attachments: SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch, 
 SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch, SOLR-4655.patch


 Currently we make a unique node name by using the host address as part of the 
 name. This means that if you want a node with a new address to take over, the 
 node name is misleading. It's best if you set custom names for each node 
 before starting your cluster. This is cumbersome though, and cannot currently 
 be done with the collections API. Instead, the overseer could assign a more 
 generic name such as nodeN by default. Then you can easily swap in another 
 node with no pre planning and no confusion in the name.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #863: POMs out of sync

2013-05-27 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/863/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
shard1 is not consistent.  Got 305 from 
http://127.0.0.1:13855/ho_u/collection1lastClient and got 5 from 
http://127.0.0.1:52784/ho_u/collection1

Stack Trace:
java.lang.AssertionError: shard1 is not consistent.  Got 305 from 
http://127.0.0.1:13855/ho_u/collection1lastClient and got 5 from 
http://127.0.0.1:52784/ho_u/collection1
at 
__randomizedtesting.SeedInfo.seed([FFB2A02FFD7E0450:7E542E378A21646C]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:963)
at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:238)




Build Log:
[...truncated 24240 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 513 - Failure!

2013-05-27 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/513/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseSerialGC

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.TestBatchUpdate.testWithBinary

Error Message:
IOException occured when talking to server at: 
https://127.0.0.1:51424/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at: https://127.0.0.1:51424/solr/collection1
at 
__randomizedtesting.SeedInfo.seed([E3812F49C4F5D7AD:B1F76156431A0DBE]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
at 
org.apache.solr.client.solrj.TestBatchUpdate.doIt(TestBatchUpdate.java:130)
at 
org.apache.solr.client.solrj.TestBatchUpdate.testWithBinary(TestBatchUpdate.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 

[jira] [Created] (SOLR-4864) RegexReplaceProcessorFactory should support pattern capture group substitution in replacement string

2013-05-27 Thread Jack Krupansky (JIRA)
Jack Krupansky created SOLR-4864:


 Summary: RegexReplaceProcessorFactory should support pattern 
capture group substitution in replacement string
 Key: SOLR-4864
 URL: https://issues.apache.org/jira/browse/SOLR-4864
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.3
Reporter: Jack Krupansky


It is unfortunate that the replacement string for RegexReplaceProcessorFactory 
is a pure, quoted (escaped) literal and does not support pattern capture 
group substitution. This processor should be enhanced to support full, standard 
pattern capture group substitution.

The test case I used:

{code}
  <updateRequestProcessorChain name="regex-mark-special-words">
    <processor class="solr.RegexReplaceProcessorFactory">
      <str name="fieldRegex">.*</str>
      <str name="pattern">([^a-zA-Z]|^)(cat|dog|fox)([^a-zA-Z]|$)</str>
      <str name="replacement">$1&lt;&lt;$2&gt;&gt;$3</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
{code}

Indexing with this command against the standard Solr example with the above 
addition to solrconfig:

{code}
  curl "http://localhost:8983/solr/update?commit=true&update.chain=regex-mark-special-words" \
  -H 'Content-type:application/json' -d '
  [{"id": "doc-1",
    "title": "Hello World",
    "content": "The cat and the dog jumped over the fox.",
    "other_ss": ["cat", "cat bird", "lazy dog", "red fox den"]}]'
{code}

Alas, the resulting document consists of:

{code}
  id:doc-1,
  title:[Hello World],
  content:[The$1$2$3and the$1$2$3jumped over the$1$2$3],
  other_ss:[$1$2$3,
$1$2$3bird,
lazy$1$2$3,
red$1$2$3den],
{code}

The Javadoc for RegexReplaceProcessorFactory uses the exact same terminology of 
"replacement string" as does Java's Matcher.replaceAll, but clearly the 
semantics are distinct: replaceAll supports pattern capture group 
substitution in its replacement string, while RegexReplaceProcessorFactory 
interprets the replacement string as a literal. At a minimum, the 
RegexReplaceProcessorFactory Javadoc should explicitly state that the string is 
a literal that does not support pattern capture group substitution.

The relevant code in RegexReplaceProcessorFactory#init:

{code}
replacement = Matcher.quoteReplacement(replacementParam.toString());
{code}
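
The difference boils down to plain java.util.regex behavior; a minimal standalone 
illustration using the pattern from above:

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
  public static void main(String[] args) {
    Pattern p = Pattern.compile("([^a-zA-Z]|^)(cat|dog|fox)([^a-zA-Z]|$)");
    String input = "The cat and the dog jumped over the fox.";
    String replacement = "$1<<$2>>$3";

    // Capture groups substituted: "cat" becomes "<<cat>>", etc.
    System.out.println(p.matcher(input).replaceAll(replacement));

    // quoteReplacement() escapes the '$' signs, so the replacement is taken
    // literally - this is what RegexReplaceProcessorFactory currently does.
    System.out.println(p.matcher(input).replaceAll(Matcher.quoteReplacement(replacement)));
  }
}
{code}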

Possible options for the enhancement:

1. Simply skip the quoteReplacement and fully support pattern capture group 
substitution with no additional changes. This does have a minor back-compat 
issue.

2. Add an alternative to "replacement", say "nonQuotedReplacement", that is not 
quoted the way "replacement" is.

3. Add an option, say "quotedReplacement", that defaults to true for 
back-compat, but can be set to false to support full replaceAll pattern 
capture group substitution.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5020) Make DrillSidewaysResult ctor public

2013-05-27 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5020:
--

 Summary: Make DrillSidewaysResult ctor public
 Key: LUCENE-5020
 URL: https://issues.apache.org/jira/browse/LUCENE-5020
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor


DrillSidewaysResult has a package-private ctor which prevents an app from 
instantiating it. I found that it's sometimes useful, e.g. for doing some 
post-processing on the returned TopDocs or List<FacetResult>. Since you cannot 
return two values from a method, it would be convenient if a method could 
return a new 'processed' DSR.

I would also like to make the hits member final.
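
As a sketch of what this would enable in application code (the field names, 
constructor signature and package are assumptions on my part; adjust to the 
actual class):

{code}
// Sketch only: assumes DrillSidewaysResult exposes public 'facetResults' and 'hits'
// fields and gains a public (List<FacetResult>, TopDocs) constructor.
import java.util.List;
import org.apache.lucene.facet.search.DrillSideways.DrillSidewaysResult;
import org.apache.lucene.facet.search.FacetResult;
import org.apache.lucene.search.TopDocs;

public class PostProcessSketch {
  static DrillSidewaysResult rerank(DrillSidewaysResult original, TopDocs rerankedHits) {
    // Keep the facet counts, swap in post-processed hits, and hand back a new DSR.
    List<FacetResult> facets = original.facetResults;
    return new DrillSidewaysResult(facets, rerankedHits);
  }
}
{code}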

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5020) Make DrillSidewaysResult ctor public

2013-05-27 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5020:
---

Attachment: LUCENE-5020.patch

Trivial patch. I also clarified the jdocs of DSR and made hits final. I plan to 
commit this later today.

 Make DrillSidewaysResult ctor public
 

 Key: LUCENE-5020
 URL: https://issues.apache.org/jira/browse/LUCENE-5020
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Attachments: LUCENE-5020.patch


 DrillSidewaysResult has a package-private ctor which prevents an app from 
 instantiating it. I found that it's sometimes useful, e.g. for doing some 
 post-processing on the returned TopDocs or List<FacetResult>. Since you 
 cannot return two values from a method, it would be convenient if a method 
 could return a new 'processed' DSR.
 I would also like to make the hits member final.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests

2013-05-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668111#comment-13668111
 ] 

Per Steffensen commented on SOLR-4470:
--

bq. Regarding Mark Miller's concern with authorization creep, I to some 
extent agree. But since, as you say, this is test-code only, let's move the 
class RegExpAuthorizationFilter from runtime codebase and into the test 
framework. In that way, it is clear that it is only used for realistic test 
coverage. And if anyone wishes to setup a similar setup in their production 
they may borrow code from the test class, but it will be a manual step 
reinforcing that this is not a supported feature of the project as such.

RegExpAuthorizationFilter could be in test-code because it is only used for 
tests. As I described above (somewhere), it just does something I believe a 
lot of people might want to do - simply because Solr URLs are kinda 
upside-down. Therefore I put it in the non-test part of the code, so that 
people could use it if they found it useful, and described it a little on 
http://wiki.apache.org/solr/SolrSecurity#Security_in_Solr_on_per-operation_basis.

If you do not get the point that this is something you can use if you decide to 
use it, and that it is really not a Solr-thing, then I agree it could be 
considered creeping into dealing with security enforcement in Solr. You are 
welcome to move it to test-code, but then we should change the descriptions on 
http://wiki.apache.org/solr/SolrSecurity#Security_in_Solr_on_per-operation_basis.
 Either remove the descriptions of RegExpAuthorizationFilter, or include the 
code from it directly on the Wiki page for inspiration, or add a pointer to the 
test-code and note that you can steal it from there.

bq. I tried to move RegExpAuthorizationFilter to test scope, but there is a 
compile-time dependency in JettySolrRunner method lifeCycleStarted(). Can we 
refactor this piece of code into test-scope as well, e.g. by exposing some a 
Filter setter on JettySolrRunner?

Isn't JettySolrRunner only used for tests? Why is it not in test-scope itself? 
Maybe in the test-framework? We can make a funny refactoring and expose a 
filter-setter, but it seems like a strange thing to do to let JettySolrRunner, 
which is only used for tests, be able to use some test-stuff. What did I miss?

 Support for basic http auth in internal solr requests
 -

 Key: SOLR-4470
 URL: https://issues.apache.org/jira/browse/SOLR-4470
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, multicore, replication (java), SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Jan Høydahl
  Labels: authentication, https, solrclient, solrcloud, ssl
 Fix For: 4.4

 Attachments: SOLR-4470_branch_4x_r1452629.patch, 
 SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, 
 SOLR-4470.patch


 We want to protect any HTTP-resource (url). We want to require credentials no 
 matter what kind of HTTP-request you make to a Solr-node.
 It can fairly easily be achieved as described on 
  http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr nodes 
  also make internal requests to other Solr nodes, and for it to work, 
  credentials need to be provided there also.
 Ideally we would like to forward the credentials from a particular request to 
  all the internal sub-requests it triggers, e.g. for search and update 
  requests.
 But there are also internal requests
 * that are only indirectly/asynchronously triggered by outside requests (e.g. 
  shard creation/deletion/etc. based on calls to the Collection API)
 * that do not in any way relate to an outside super-request (e.g. 
  replica syncing stuff)
 We would like to aim at a solution where the original credentials are 
  forwarded when a request directly/synchronously triggers a sub-request, and 
  which falls back to configured internal credentials for the 
  asynchronous/non-rooted requests.
 In our solution we would aim at only supporting basic HTTP auth, but we would 
  like to make a framework around it, so that not too much refactoring is 
  needed if you later want to add support for other kinds of auth (e.g. digest).
 We will work on a solution, but we are creating this JIRA issue early in order 
  to get input/comments from the community as early as possible.
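
For the plain client-to-Solr half of this (not the internal node-to-node requests 
this issue is about), SolrJ can already be handed a pre-configured HttpClient 
carrying basic-auth credentials; a minimal sketch, assuming the HttpClientUtil 
basic-auth properties available in 4.x and made-up credentials:

{code}
import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class BasicAuthClientSketch {
  public static void main(String[] args) throws Exception {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_BASIC_AUTH_USER, "solr-user");  // made-up credentials
    params.set(HttpClientUtil.PROP_BASIC_AUTH_PASS, "solr-pass");

    // Build an HttpClient that sends basic auth, and hand it to the SolrJ client.
    HttpClient httpClient = HttpClientUtil.createClient(params);
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1", httpClient);

    System.out.println(server.ping().getStatus());
  }
}
{code}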

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org