from:"Greg Pendlebury \(JIRA\)"

[jira] [Commented] (SOLR-10856) ExtendedDismaxQParser (edismax) override OR when mm=100%

2017-06-24 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062195#comment-16062195
 ] 

Greg Pendlebury commented on SOLR-10856:


You are describing exactly what mm is supposed to do. The change made in 
SOLR-2649 was the root cause (deliberately... because of the bug caused by the 
inverse impact boolean operators had on mm), and SOLR-8812 was about choosing 
less disruptive default values when users are not specifying them.

In this case, however, you are explicitly requesting mm=100%... and getting 
answers that match that request. The short answer is: don't use mm=100% if you 
want boolean logic. The two are not compatible.

The longer answer is nasty and would require delving into how boolean operators 
are truly handled by Solr when translated into OCCURS flags. The mm parameter 
operates on the SHOULD OCCUR flags, which is (roughly) what your OR terms are 
translated into.
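To make the "mm operates on SHOULD clauses" point concrete, here is a toy model of a flat query built from MUST, SHOULD and MUST_NOT term clauses plus a minimum-should-match count. `ToyBooleanQuery` is hypothetical and is not Lucene's `BooleanQuery`. It assumes each document is just a set of `field:value` terms and that `type_s` is single-valued, as in the report quoted below. Under those assumptions `type_s:(A OR C)` becomes two SHOULD clauses, mm=100% turns that into `~2`, and no document can satisfy it.

```java
import java.util.List;
import java.util.Set;

/**
 * Toy model of a flat boolean query: every clause is a single term, and
 * minShouldMatch is how many SHOULD clauses a document must contain.
 * Hypothetical illustration only; not Lucene's BooleanQuery.
 */
final class ToyBooleanQuery {
    private final List<String> must;
    private final List<String> should;
    private final List<String> mustNot;
    private final int minShouldMatch;

    ToyBooleanQuery(List<String> must, List<String> should, List<String> mustNot, int minShouldMatch) {
        this.must = must;
        this.should = should;
        this.mustNot = mustNot;
        this.minShouldMatch = minShouldMatch;
    }

    /** A document is represented by the set of field:value terms it contains. */
    boolean matches(Set<String> docTerms) {
        for (String t : must) {
            if (!docTerms.contains(t)) return false; // every MUST clause is required
        }
        for (String t : mustNot) {
            if (docTerms.contains(t)) return false;  // any MUST_NOT clause excludes the document
        }
        long shouldHits = should.stream().filter(docTerms::contains).count();
        if (must.isEmpty() && minShouldMatch == 0) {
            return shouldHits > 0; // a query made only of SHOULD clauses still needs one of them
        }
        return shouldHits >= minShouldMatch; // mm counts SHOULD clauses only
    }
}
```

With `minShouldMatch` set to 2, a document containing only `type_s:A` is rejected; with 1 (or 0), the same document matches, which is the OR behaviour the reporter expected.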

> ExtendedDismaxQParser (edismax) override OR when mm=100%
> 
>
> Key: SOLR-10856
> URL: https://issues.apache.org/jira/browse/SOLR-10856
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 5.5, 6.0, 6.6
>Reporter: Sébastien LECACHEUR
>
> Since Solr 5.5.1, the edismax parser overrides OR (with AND behavior) in 
> queries when mm=100%. This behavior is new, and persists from Solr 5.5.1 
> through 6.6.0.
> Affected query:
> {code:none}
> curl -s 
> 'http://localhost:8983/solr/mycorename/select?q=type_s%3A(A+OR+C)=json=edismax=100%25=true=true'
> {code}
> 1) Solr 5.4.1 :
> {code:javascript}
> "rawquerystring":"type_s:(A OR C)",
> "querystring":"type_s:(A OR C)",
> "parsedquery":"(+(type_s:A type_s:C))/no_coord",
> "parsedquery_toString":"+(type_s:A type_s:C)",
> "explain":{...},
> "QParser":"ExtendedDismaxQParser",
> {code}
> Returns docs as expected.
> 2) Solr 5.5.1 :
> {code:javascript}
> "rawquerystring":"type_s:(A OR C)",
> "querystring":"type_s:(A OR C)",
> "parsedquery":"(+((type_s:A type_s:C)~2))/no_coord",
> "parsedquery_toString":"+((type_s:A type_s:C)~2)",
> "explain":{},
> "QParser":"ExtendedDismaxQParser",
> {code}
> Returns no results
> 3) Solr 6.6.0 :
> {code:javascript}
> "rawquerystring":"type_s:(A OR C)",
> "querystring":"type_s:(A OR C)",
> "parsedquery":"(+(type_s:A type_s:C)~2)/no_coord",
> "parsedquery_toString":"+((type_s:A type_s:C)~2)",
> "explain":{},
> "QParser":"ExtendedDismaxQParser",
> {code}
> Returns no results
> This bug looks like the SOLR-8812 issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4823) Split LBHttpSolrServer into two classes one for the solrj use case and one for the solr cloud use case

2016-11-06 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642463#comment-15642463
 ] 

Greg Pendlebury commented on SOLR-4823:
---

I know this is off-topic for this particular issue, but it seems that it might 
be a viable launching point for a customization our team has been considering 
in-house. We were thinking of trying out the addition of one or more nodes in 
the cluster that have no hash range allocated in clusterstate (we haven't yet 
looked at whether we would need to modify the code to achieve this).

Their purpose would be to act as search entry points for the cluster with more 
stable JVM performance (because they manage no Lucene segments), as well as 
internalizing cluster security at the OS level. Right now, in a 200-replica 
cluster, we need to let any/all SolrJ clients have access to the ZK ensemble as 
well as ports on every replica. It also makes threading (such as the default 
HTTP client thread pool) annoying to configure and to performance-test.

With [~phloy]'s patch we could still make use of SolrJ, but just provide a 
small whitelist of our 'search nodes' and keep client-side requirements for 
searching very simple in terms of security and thread management.
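A minimal sketch of the client side of that idea, assuming a fixed whitelist of search-node base URLs supplied at construction time. The `SearchNodeSelector` class and the URLs are hypothetical, not SolrJ API. Requests rotate round-robin across the whitelist and skip nodes the caller has marked dead, which is roughly the "known set of solr servers defined at construction time" case in the issue description below.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selector over a fixed whitelist of search-node base URLs. */
final class SearchNodeSelector {
    private final List<String> nodes;                               // fixed at construction time
    private final Set<String> dead = ConcurrentHashMap.newKeySet(); // nodes the caller saw fail
    private final AtomicInteger counter = new AtomicInteger();      // shared round-robin position

    SearchNodeSelector(List<String> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("whitelist must not be empty");
        this.nodes = Collections.unmodifiableList(new ArrayList<>(nodes));
    }

    /** Next live node in round-robin order; if every node is marked dead, just the next node. */
    String next() {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            String candidate = nodes.get(Math.floorMod(counter.getAndIncrement(), nodes.size()));
            if (!dead.contains(candidate)) return candidate;
        }
        return nodes.get(Math.floorMod(counter.getAndIncrement(), nodes.size()));
    }

    void markDead(String node)  { dead.add(node); }
    void markAlive(String node) { dead.remove(node); }
}
```

Only the handful of search-node URLs need to be reachable from clients, which is the security and thread-management simplification described above. A real load balancer would also re-test dead nodes periodically; here that is left to the caller via `markAlive`.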

> Split LBHttpSolrServer into two classes one for the solrj use case and one 
> for the solr cloud use case
> --
>
> Key: SOLR-4823
> URL: https://issues.apache.org/jira/browse/SOLR-4823
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: philip hoy
>Priority: Minor
> Attachments: SOLR-4823.patch, SOLR-4823.patch
>
>
> The LBHttpSolrServer has too many responsibilities. It could perhaps be 
> broken into two classes, one in solrj to be used in the place of an external 
> load balancer that balances across a known set of solr servers defined at 
> construction time and one in solr core to be used by the solr cloud 
> components that balances across servers dependent on the request.
> To avoid code duplication, if much arises, an abstract base class could be 
> introduced into solrj.




[jira] [Commented] (SOLR-8016) CloudSolrClient has extremely verbose error logging

2016-10-18 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587251#comment-15587251
 ] 

Greg Pendlebury commented on SOLR-8016:
---

Not that I am aware of. I can still see the problem in our newest server 
(5.5.3). I like [~markrmil...@gmail.com]'s suggestion of lowering the log level 
to info. It is simple, and we can filter it out via logging config. The deeper 
issues of whether the retry should even be attempted sound interesting to me, 
but I'd be happy to just not see the log entries.
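Lowering the level helps because a message logged below ERROR can be silenced for one logger by configuration alone, without touching the ERROR channel. The sketch below shows that mechanism with `java.util.logging` as a JDK-only stand-in for whichever SLF4J backend an application actually uses. The logger name is CloudSolrClient's class name; `logRetry` is a hypothetical INFO-level version of the retry message, not Solr code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Collects the records that pass the logger's level check. */
final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();
    @Override public void publish(LogRecord r) { if (isLoggable(r)) records.add(r); }
    @Override public void flush() {}
    @Override public void close() {}
}

final class RetryLogDemo {
    /** Hypothetical retry message, logged at INFO instead of ERROR. */
    static void logRetry(Logger log, String collection, int errorCode, int retryCount) {
        log.log(Level.INFO, "Request to collection {0} failed due to ({1}), retry? {2}",
                new Object[] {collection, errorCode, retryCount});
    }

    /** How many retry messages get through when this one logger is configured at `threshold`. */
    static int captured(Level threshold) {
        Logger log = Logger.getLogger("org.apache.solr.client.solrj.impl.CloudSolrClient");
        log.setUseParentHandlers(false); // keep the demo off the console
        log.setLevel(threshold);         // per-logger config; other loggers keep their levels
        CapturingHandler handler = new CapturingHandler();
        log.addHandler(handler);
        try {
            logRetry(log, "mycollection", 500, 1);
        } finally {
            log.removeHandler(handler);
        }
        return handler.records.size();
    }
}
```

At INFO the message comes through; raising just this logger to WARNING drops it while ERROR output from everything else is unaffected.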

> CloudSolrClient has extremely verbose error logging
> ---
>
> Key: SOLR-8016
> URL: https://issues.apache.org/jira/browse/SOLR-8016
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 5.2.1, 6.0
>Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: easyfix
>
> CloudSolrClient has this error logging line which is fairly annoying:
> {code}
>   log.error("Request to collection {} failed due to ("+errorCode+
>   ") {}, retry? "+retryCount, collection, rootCause.toString());
> {code}
> Given that this is a client library that gets embedded into other 
> applications, this line is very problematic to handle gracefully. In the 
> example I was looking at today, every failed search was logging over 100 
> lines, including the full HTML response from the responding node in the 
> cluster.
> The resulting SolrServerException that comes out to our application is 
> handled appropriately, but we can't stop this class complaining in logs 
> without suppressing the entire ERROR channel, which we don't want to do. This 
> is the only direct line writing to the log I could find in the client, so we 
> _could_ suppress errors, but that just feels dirty, and fragile for the 
> future.
> From looking at the code I am fairly certain it is not as simple as throwing 
> an exception instead of logging... it is right in the middle of the method. I 
> suspect the simplest answer is adding a marker 
> (http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call.
> Then solrj users can choose what to do with these log entries. I don't know 
> if there is a broader strategy for handling this that I am ignorant of; 
> apologies if that is the case.




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is not explicitly set

2016-06-10 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325638#comment-15325638
 ] 

Greg Pendlebury commented on SOLR-8812:
---

Sounds great. Add my thanks to those you've already received.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is 
> not explicitly set
> -
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Steve Rowe
> Fix For: 5.6, 6.1, 5.5.2, master (7.0), 6.0.2
>
> Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, 
> SOLR-8812.patch, SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior 
> is new to Solr 5.5.0 and an unexpected major change.
> Example:
>   "q": "id:12345 OR zz",
>   "defType": "edismax",
>   "q.op": "AND",
> where "12345" is a known document ID and "zz" is a string NOT present 
> in my data
> Version 5.5.0 produces zero results:
> "rawquerystring": "id:12345 OR zz",
> "querystring": "id:12345 OR zz",
> "parsedquery": "(+((id:12345 
> DisjunctionMaxQuery((text:zz)))~2))/no_coord",
> "parsedquery_toString": "+((id:12345 (text:zz))~2)",
> "explain": {},
> "QParser": "ExtendedDismaxQParser"
> Version 5.4.0 produces one result as expected
>   "rawquerystring": "id:12345 OR zz",
> "querystring": "id:12345 OR zz",
> "parsedquery": "(+(id:12345 DisjunctionMaxQuery((text:zz))))/no_coord",
> "parsedquery_toString": "+(id:12345 (text:zz))",
> "explain": {},
> "QParser": "ExtendedDismaxQParser"




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is not explicitly set

2016-06-10 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324063#comment-15324063
 ] 

Greg Pendlebury commented on SOLR-8812:
---

Sounds (tentatively) ok to me. I was quite concerned when you said it puts 
things back to pre-SOLR-2649 functionality, but from looking at what got 
committed it seems that q.op=OR is no longer hardcoded in setDefaultOperator() 
(which was fixed in SOLR-2649). I haven't executed anything, but this seems 
like a good step with regards to mm handling.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is 
> not explicitly set
> -
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Steve Rowe
> Fix For: 6.1, 5.5.2, 6.0.2
>
> Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, 
> SOLR-8812.patch, SOLR-8812.patch
>
>




[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2016-05-19 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292397#comment-15292397
 ] 

Greg Pendlebury commented on SOLR-2649:
---

[~rebeccatang], that sounds like expected behaviour. Your 'OR' operator is not 
being ignored; rather, Solr translates OR operators into SHOULD occur flags 
(i.e. optional search terms)... then, if 'mm' is set to 100%, this tells Solr 
that you require every optional search term to be present in each matching 
document.

If you are explicitly setting 'mm' you should use a different value if you want 
OR operators to function. Also see SOLR-8812, which discusses setting a better 
default value for 'mm', particularly one that changes depending on the 'q.op' 
parameter. Of course that only applies in the case where you are not explicitly 
setting 'mm'.
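For reference, here is a simplified sketch of how an mm value turns into a required count of optional (SHOULD) clauses. It follows the documented rules for plain integers and percentages: percentages round down, negative values say how many clauses may be missing, and the result is kept between 0 and the clause count. Conditional specs such as `3<90%` are deliberately not handled, and `MinShouldMatch.required` is a hypothetical helper, not Solr's API.

```java
/** Hypothetical, simplified mm evaluator: integers and integer percentages only. */
final class MinShouldMatch {
    /** Number of optional clauses a document must match for the given mm spec. */
    static int required(String spec, int optionalClauses) {
        String s = spec.trim();
        int result;
        if (s.endsWith("%")) {
            int pct = Integer.parseInt(s.substring(0, s.length() - 1));
            int calc = optionalClauses * Math.abs(pct) / 100; // integer division rounds down
            result = pct < 0 ? optionalClauses - calc : calc;  // negative: that share may be missing
        } else {
            int n = Integer.parseInt(s);
            result = n < 0 ? optionalClauses + n : n;          // negative: that many may be missing
        }
        return Math.max(0, Math.min(optionalClauses, result)); // clamp to [0, optionalClauses]
    }
}
```

With two OR-ed terms, `100%` requires both, which is why an explicit mm=100% defeats the OR; `0%` or `1` leaves only one required (with `0%`, a query made purely of SHOULD clauses still needs at least one to match).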

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Magnus Bergmark
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, 
> SOLR-2649.diff, SOLR-2649.patch, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.




[jira] [Commented] (SOLR-8955) ReplicationHandler should throttle across all requests instead of for each client

2016-04-08 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233113#comment-15233113
 ] 

Greg Pendlebury commented on SOLR-8955:
---

I like the idea, but maybe it should be configurable? If the master has 
multiple NICs, then hard-coding an arbitrary limit because two unrelated slaves 
on different network interfaces are both online would actually be more of a 
hindrance than an improvement.

> ReplicationHandler should throttle across all requests instead of for each 
> client
> -
>
> Key: SOLR-8955
> URL: https://issues.apache.org/jira/browse/SOLR-8955
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java), SolrCloud
>Reporter: Shalin Shekhar Mangar
>  Labels: difficulty-easy, impact-medium, newdev
> Fix For: master, 6.1
>
>
> SOLR-6485 added the ability to throttle the speed of replication, but the 
> implementation rate-limits each request. So if, e.g., maxWriteMBPerSec is 1 
> and 5 slaves request full replication, then the effective transfer rate from 
> the master is 5 MB/second, which is often not what is desired.
> I propose to make the rate limit global (across all replication requests) 
> instead.
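The difference between per-request and global throttling is easiest to see with the arithmetic from the description above. This sketch uses a hypothetical limiter with a virtual clock (in seconds) instead of real sleeping, treats 1 MB as 2^20 bytes, and exposes the scope as a `global` flag in the spirit of making it configurable. It is not Solr's ReplicationHandler code.

```java
/** Hypothetical byte-rate limiter; every stream handed the same instance shares one budget. */
final class SharedRateLimiter {
    private final double bytesPerSecond;
    private double busyUntil = 0.0; // virtual time (seconds) when the budget is next free

    SharedRateLimiter(double maxWriteMBPerSec) {
        this.bytesPerSecond = maxWriteMBPerSec * 1024 * 1024;
    }

    /** Reserves time to send `bytes`, starting no earlier than `now`; returns the completion time. */
    synchronized double reserve(long bytes, double now) {
        double start = Math.max(now, busyUntil);
        busyUntil = start + bytes / bytesPerSecond;
        return busyUntil;
    }
}

final class ThrottleDemo {
    static final long ONE_MB = 1024L * 1024L;

    /** Effective MB/s leaving the master when `slaves` each pull `mbEach` MB starting at time 0. */
    static double effectiveRate(int slaves, long mbEach, double limitMBps, boolean global) {
        SharedRateLimiter shared = new SharedRateLimiter(limitMBps);
        double finish = 0.0;
        for (int i = 0; i < slaves; i++) {
            // Per-request throttling gives each slave its own limiter; global throttling reuses one.
            SharedRateLimiter limiter = global ? shared : new SharedRateLimiter(limitMBps);
            finish = Math.max(finish, limiter.reserve(mbEach * ONE_MB, 0.0));
        }
        return slaves * mbEach / finish;
    }
}
```

With five slaves pulling 1 MB each at a 1 MB/s limit, per-request limiters all finish at t=1 s (5 MB/s out of the master), while one shared limiter finishes at t=5 s (1 MB/s). Keeping the choice behind a flag leaves room for the multi-NIC case, where per-request limits may be what you want.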




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-04-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223581#comment-15223581
 ] 

Greg Pendlebury commented on SOLR-8812:
---

[~erickerickson], personally, I am ambivalent with regards to timing and 
versions. I am still not convinced there is actually an issue here, but I don't 
want to be a dick and dismiss it out-of-hand.

The patches provided are simply about choosing default parameter values that 
disrupt the fewest users who did not have mm set to an appropriate value. Any 
user (risky, broad generalisation incoming) who puts a boolean OR operator into 
an edismax query string would not want mm=100%, but that is what is happening 
here.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> 
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Erick Erickson
>Priority: Blocker
> Fix For: 6.0, 5.5.1
>
> Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, SOLR-8812.patch
>
>




[jira] [Updated] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-04-03 Thread Greg Pendlebury (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Pendlebury updated SOLR-8812:
--
Attachment: SOLR-8812-barbie.patch

Adding a 'hair ties -barbie' example to unit tests. Not sure it demonstrates 
anything new, but it does work as I would expect.

I can't get git to generate a combined patch the way I would have in svn... my 
git-fu is weak.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> 
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Erick Erickson
>Priority: Blocker
> Fix For: 6.0, 5.5.1
>
> Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, SOLR-8812.patch
>
>




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-31 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221079#comment-15221079
 ] 

Greg Pendlebury commented on SOLR-8812:
---

I also confirmed (for my own sanity) that q.op does indeed influence the 
default value of mm, as per [~janhoy]. Personally I don't like that, and 
perhaps it isn't relevant anymore since SOLR-2649... but I left it alone.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> 
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Erick Erickson
>Priority: Blocker
> Fix For: 6.0, 5.5.1
>
> Attachments: SOLR-8812.patch, SOLR-8812.patch
>
>




[jira] [Updated] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-31 Thread Greg Pendlebury (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Pendlebury updated SOLR-8812:
--
Attachment: SOLR-8812.patch

Attaching possible 'fix' that defaults mm to 0% if the user has declared no 
explicit mm, but has boolean operators in their query.

First time I have generated a patch using git, so hopefully it is ok.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> 
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Erick Erickson
>Priority: Blocker
> Fix For: 6.0, 5.5.1
>
> Attachments: SOLR-8812.patch, SOLR-8812.patch
>
>




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-31 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220925#comment-15220925
 ] 

Greg Pendlebury commented on SOLR-8812:
---

Ok, I will try to find some time over the next week or so. I freely confess 
the timing doesn't look great: it is Friday afternoon, and school holidays 
begin here after next week. It might end up a rough contribution that someone 
else can carry over the line.

With regard to mixed cases of q.op and mm where users are explicitly setting 
them, I think they are already covered by the unit test 
testDefaultOperatorWithMm(). The problem here seems to be the use case where 
people do not explicitly set mm and fall back to the default. This is where 
the change treads on behaviour existing users expected.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> 
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Erick Erickson
>Priority: Blocker
> Fix For: 6.0, 5.5.1
>
> Attachments: SOLR-8812.patch
>
>




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-31 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220854#comment-15220854
 ] 

Greg Pendlebury commented on SOLR-8812:
---

I don't know that what we are talking about here is a 'workaround' at all. Solr 
is doing exactly what it is being asked to do. I know it is disrupting an 
existing user base, so it warrants discussion and maybe even a 'fix'... but the 
existing user base were leaving a non-configured parameter at its default value 
(which probably didn't match their use case) and it only worked because the 
parameter was being ignored by edismax. The fact that the parameter was 
ignored was the cause of the real bugs reported in SOLR-2649.

I think there has always been confusion over how this works under the hood, and 
that still continues. q.op and mm apply to two different parts of the query, 
and each of them has other factors that come into play.
 * q.op is a boolean operator, which happens pre-parse (or in the very earliest 
stages of parsing)
 * mm applies to (top level) clauses which have the SHOULD occur flag *after* 
Solr translates all the boolean operators
 * if mm is not explicitly set, the default value is determined by q.op (? I 
haven't verified this, but that is Jan's input above). The old doco says it 
always defaults to 100%... but I personally have always set it explicitly, so I 
have no experience with the default.
 * Solr translates boolean operators into occurs flags differently depending on 
the value of q.op. In particular q.op=AND causes non-intuitive generation of 
occurs flags if looked at from a purely boolean perspective.
 * mm does not make much sense at all if you think about search as a purely 
boolean query (ie. the result either matches or doesn't) instead of occurs 
flags (ie. the score of the result is either higher or lower)
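The "non-intuitive generation of occurs flags" under q.op=AND can be reproduced with a simplified sketch modelled on the clause-flag rules in Lucene's classic `QueryParserBase.addClause()`. It handles flat queries only (no parentheses, no NOT keyword, no fields or analysis), and `OccurFlags.parse` is a hypothetical helper. edismax's own parsing may differ in details.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch modelled on Lucene's classic parser clause rules, for flat
 * queries only. Output uses Lucene's toString style: +term, term, -term.
 */
final class OccurFlags {
    enum Occur { MUST, SHOULD, MUST_NOT }
    enum Conj { NONE, AND, OR }

    static String parse(String query, String defaultOp) {
        boolean andDefault = "AND".equalsIgnoreCase(defaultOp);
        List<String> terms = new ArrayList<>();
        List<Occur> occurs = new ArrayList<>();
        Conj conj = Conj.NONE;
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND")) { conj = Conj.AND; continue; }
            if (tok.equals("OR"))  { conj = Conj.OR;  continue; }
            char mod = tok.length() > 1 && (tok.charAt(0) == '+' || tok.charAt(0) == '-') ? tok.charAt(0) : ' ';
            int last = occurs.size() - 1;
            // "AND" makes the preceding clause required, unless it is prohibited.
            if (last >= 0 && conj == Conj.AND && occurs.get(last) != Occur.MUST_NOT) {
                occurs.set(last, Occur.MUST);
            }
            // With q.op=AND, "OR" demotes the preceding clause to optional, unless it is prohibited.
            if (last >= 0 && andDefault && conj == Conj.OR && occurs.get(last) != Occur.MUST_NOT) {
                occurs.set(last, Occur.SHOULD);
            }
            boolean prohibited = mod == '-';
            boolean required = andDefault
                    ? !prohibited && conj != Conj.OR                 // AND default: required unless introduced by OR
                    : mod == '+' || (conj == Conj.AND && !prohibited); // OR default: only + or AND make it required
            occurs.add(prohibited ? Occur.MUST_NOT : required ? Occur.MUST : Occur.SHOULD);
            terms.add(mod == ' ' ? tok : tok.substring(1));
            conj = Conj.NONE;
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            String prefix = occurs.get(i) == Occur.MUST ? "+" : occurs.get(i) == Occur.MUST_NOT ? "-" : "";
            out.add(prefix + terms.get(i));
        }
        return String.join(" ", out);
    }
}
```

Under q.op=AND, `a b OR c` comes out as `+a b c`: `a` is required and `b` and `c` are both optional. That matches neither `a AND (b OR c)` nor `(a AND b) OR c` in purely boolean terms. Likewise `id:12345 OR zz` becomes two SHOULD clauses, which an mm of 100% then turns into the `~2` seen in the SOLR-8812 report.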

So now that SOLR-2649 has come along, it slightly muddies the water because:
 * q.op is no longer hard coded to OR. Pre-patch the user could say q.op=AND, 
but it didn't do anything to the query
 * The presence of an operator no longer turns off the mm feature

*My take on the issue is that users who want to use boolean operators in 
edismax should pay attention to the mm parameter, and make sure their choice 
matches their use case*. Previously they didn't have to... but the presence of 
the boolean operators when using edismax was buggy (? debatable... it has been 
argued that it simply wasn't the use case edismax was first written for).

Having said that, IF anything was to change, I would simply play subtly with 
choosing the default value of mm. Maybe something like this:

IF (the query contains a boolean operator) AND (mm has not been explicitly set) 
THEN (mm = 0%)

It is a tweak on the work Jan did in SOLR-2649, so that instead of turning off 
mm in response to a boolean operator being present, we instead influence the 
default value. We still let users ultimately set up their parameters however 
they want though. If the user has a use case that includes both boolean 
parameters and mm logic... have fun.
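The proposed rule is small enough to sketch directly. The version below combines it with the q.op-derived default from Jan's input, and detects operators the same way the 3.3 `doMinMatched` line quoted in SOLR-2649 counts them (OR, NOT, `+term`, `-term`, but not AND). It is a rough whitespace-token check that ignores quoted phrases and nested parentheses. `DefaultMm` is hypothetical and is not Solr's implementation.

```java
/** Hypothetical sketch of the proposed default-mm rule; not Solr's implementation. */
final class DefaultMm {
    /** Rough operator check: OR, NOT, +term, -term (AND is deliberately excluded). */
    static boolean hasBooleanOperator(String q) {
        for (String tok : q.trim().split("\\s+")) {
            if (tok.equals("OR") || tok.equals("NOT")) return true;
            if (tok.length() > 1 && (tok.charAt(0) == '+' || tok.charAt(0) == '-')) return true;
        }
        return false;
    }

    /** explicitMm is null when the user did not set mm. */
    static String effectiveMm(String explicitMm, String qOp, String q) {
        if (explicitMm != null) return explicitMm;          // the user's explicit choice always wins
        if (hasBooleanOperator(q)) return "0%";             // proposed: operators present, so mm=0%
        return "AND".equalsIgnoreCase(qOp) ? "100%" : "0%"; // q.op-derived default
    }
}
```

For the SOLR-8812 example (`id:12345 OR zz`, q.op=AND, no mm), this yields 0% instead of 100%. An explicit mm=100%, as in the later `type_s:(A OR C)` report, is still honoured exactly as given.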

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> 
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Erick Erickson
>Priority: Blocker
> Fix For: 6.0, 5.5.1
>
> Attachments: SOLR-8812.patch
>
>




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-30 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219133#comment-15219133
 ] 

Greg Pendlebury commented on SOLR-8812:
---

Thanks. Hopefully that is ok. I just installed git and started cloning trunk... 
now to upgrade to Java 8.

I think it is all working as intended, it is just that there is a confusing 
legacy of not having to worry about what mm was set to for some use cases. 
SOLR-2649 will force people to check what the parameters are, but all queries 
are now supported.

It would be nice if it was less disruptive, but given that pre-patch there was 
no way to get edismax to do certain queries, no matter what parameters you set, 
I think it is still an improvement.


[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-30 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219086#comment-15219086
 ] 

Greg Pendlebury commented on SOLR-8812:
---

I am happy to take a look at any issues, since I was involved in SOLR-2649. I 
need to get a new copy of the code first, but in the interim, can someone 
confirm that explicitly setting mm to 0 does not fix this? I believe mm 
defaults to 100%, so that may be the real culprit, as opposed to q.op=AND. 
Before SOLR-2649 was resolved, setting an OR operator would have caused mm to 
be ignored. Now it will use the default value unless you set it explicitly.
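If mm does default to 100%, it helps to see how a percentage becomes a clause count. That is how 'id:12345 OR zz' ends up needing both clauses. The following is a minimal sketch, not Solr's own code. 'MinShouldMatchSketch' and 'compute' are hypothetical names. Only plain integers and percentages are handled; Solr's conditional forms such as '3<90%' are ignored. Fractions are truncated, which we assume matches Solr's rounding down of percentages.

{code:java}
/**
 * Hypothetical helper (not Solr's API): turns a simple mm spec such as
 * "2", "-1", "75%" or "-25%" into the minimum number of optional
 * (SHOULD) clauses that must match. Conditional specs like "3<90%"
 * are deliberately not supported in this sketch.
 */
final class MinShouldMatchSketch {

    private MinShouldMatchSketch() {}

    static int compute(String spec, int optionalClauses) {
        String s = spec.trim();
        int result;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1));
            // Integer division truncates toward zero, so fractional clause
            // counts are dropped (e.g. 50% of 3 clauses -> 1).
            int calc = optionalClauses * percent / 100;
            // A negative percentage says how many clauses may be missing.
            result = percent < 0 ? optionalClauses + calc : calc;
        } else {
            int n = Integer.parseInt(s);
            // A negative integer also says how many clauses may be missing.
            result = n < 0 ? optionalClauses + n : n;
        }
        // Clamp to the range [0, optionalClauses].
        return Math.max(0, Math.min(result, optionalClauses));
    }
}
{code}

For example, compute("100%", 2) gives 2, which corresponds to the '~2' in the 5.5.0 debug output. By contrast, compute("0%", 2) gives 0.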

Our production servers are using 5.1 with SOLR-2649 applied, and we have 
q.op=AND, with perfectly functional OR operators and mm=0%. All of the obvious 
queries work, including the cases referenced above.

From memory there are a lot of subtle cliffs to fall off here, such as making 
sure we are talking about top level clauses and ultimately remembering that 
Solr does not use boolean logic... and there are some edge cases where it 
simply doesn't work the same way as the occurs flags. SHOULD vs OR is the main 
culprit.
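A small model of top-level clauses makes the SHOULD vs OR distinction concrete. This is a simplified sketch under stated assumptions, not Lucene's matcher. The names are hypothetical, and scoring and DisjunctionMaxQuery are ignored. It assumes Lucene's usual rule that when there are no MUST clauses, at least one SHOULD clause has to match.

{code:java}
import java.util.List;
import java.util.Set;

/**
 * Simplified model (not Lucene's implementation) of how a flat boolean
 * query with occur flags decides whether a document matches.
 */
final class OccurSketch {

    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String term;

        Clause(Occur occur, String term) {
            this.occur = occur;
            this.term = term;
        }
    }

    private OccurSketch() {}

    /** minShouldMatch is mm already resolved to a clause count. */
    static boolean matches(Set<String> docTerms, List<Clause> clauses, int minShouldMatch) {
        int mustCount = 0;
        int shouldHits = 0;
        for (Clause c : clauses) {
            boolean present = docTerms.contains(c.term);
            switch (c.occur) {
                case MUST:
                    if (!present) return false;
                    mustCount++;
                    break;
                case MUST_NOT:
                    if (present) return false;
                    break;
                case SHOULD:
                    if (present) shouldHits++;
                    break;
            }
        }
        // With no MUST clauses, at least one SHOULD clause has to match,
        // even when mm resolves to 0.
        int required = mustCount == 0 ? Math.max(1, minShouldMatch) : minShouldMatch;
        return shouldHits >= required;
    }
}
{code}

Take two SHOULD clauses for 'id:12345' and 'text:zz' and a document containing only 'id:12345'. Then matches(doc, clauses, 2) is false, like the 5.5.0 output, and matches(doc, clauses, 0) is true, like 5.4.0.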


[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-13 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055202#comment-15055202
 ] 

Greg Pendlebury commented on SOLR-2649:
---

[~erickerickson] thanks for this!

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Magnus Bergmark
>Assignee: Erick Erickson
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, 
> SOLR-2649.diff, SOLR-2649.patch, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.
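The quoted condition is easy to try in isolation. This is a rough sketch that assumes a naive whitespace tokenizer in place of edismax's real clause splitting. 'LegacyMinMatchCheck' is a hypothetical name; only the final boolean expression comes from the quoted source.

{code:java}
/**
 * Hypothetical re-creation of the pre-SOLR-2649 check quoted above:
 * mm processing was skipped whenever the query contained OR, NOT,
 * '+' or '-' operators (AND was not counted). The whitespace split
 * is a simplification of how edismax really finds clauses.
 */
final class LegacyMinMatchCheck {

    private LegacyMinMatchCheck() {}

    static boolean doMinMatched(String query) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("OR")) {
                numOR++;
            } else if (token.equals("NOT")) {
                numNOT++;
            } else if (token.length() > 1 && token.startsWith("+")) {
                numPluses++;
            } else if (token.length() > 1 && token.startsWith("-")) {
                numMinuses++;
            }
        }
        // Same condition as the quoted line: any counted operator disables mm.
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }
}
{code}

doMinMatched("stocks oil gold") returns true, but adding '-stockings' flips it to false. That is how the 50% mm in the scenario gets silently dropped.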




[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039730#comment-15039730
 ] 

Greg Pendlebury commented on SOLR-2649:
---

I just ran it against our test system (patched Solr 5.1.0): (A OR B OR C) "D E"

1) Using mm=100%, q.op=AND and searching just the fulltext field. RAW debug:
{code}
(+(+(DisjunctionMaxQuery((fulltext:a)) DisjunctionMaxQuery((fulltext:b)) 
DisjunctionMaxQuery((fulltext:c))) +DisjunctionMaxQuery((fulltext:\"d e\"
{code}
I read that as:
{code}
+(a b c) +("d e")
{code}
which looks correct

2) switching to q.op=OR. RAW debug:
{code}
(+(((DisjunctionMaxQuery((fulltext:a)) DisjunctionMaxQuery((fulltext:b)) 
DisjunctionMaxQuery((fulltext:c))) DisjunctionMaxQuery((fulltext:\"d e\")))~2))
{code}
I read that as:
{code}
((a b c) "d e")~2
{code}
Which again looks correct... but we don't generally use OR, so I could be wrong

3) Finally, lowered mm to 50%, again with q.op=OR. RAW debug:
{code}
(+(((DisjunctionMaxQuery((fulltext:a)) DisjunctionMaxQuery((fulltext:b)) 
DisjunctionMaxQuery((fulltext:c))) DisjunctionMaxQuery((fulltext:\"d e\")))~1))
{code}
I read that as:
{code}
((a b c) "d e")~1
{code}
Still looks good.
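Readings 2) and 3) differ only in the '~N' applied to the two top-level clauses; the grouped (a b c) counts as a single clause. A toy evaluator makes that visible. It is a hypothetical sketch, not Lucene. Phrases such as "d e" are treated as one opaque token and scoring is ignored. A group with no mm still needs one matching child.

{code:java}
import java.util.List;
import java.util.Set;

/**
 * Toy evaluator (not Lucene) for nested queries such as ((a b c) "d e")~2.
 * Each group applies its own minimum-should-match to its direct children
 * only, so the inner (a b c) group counts as ONE clause at the top level.
 */
final class NestedMmSketch {

    interface Node {
        boolean matches(Set<String> doc);
    }

    private NestedMmSketch() {}

    /** A single term; a phrase like "d e" is treated as one opaque token here. */
    static Node term(String t) {
        return doc -> doc.contains(t);
    }

    /** A group of optional (SHOULD) children with its own mm count. */
    static Node should(int minShouldMatch, Node... children) {
        List<Node> kids = List.of(children);
        return doc -> {
            long hits = kids.stream().filter(k -> k.matches(doc)).count();
            // With no MUST children, at least one child must match.
            return hits >= Math.max(1, minShouldMatch);
        };
    }
}
{code}

Consider the query built as should(2, should(0, a, b, c), "d e"). A document containing only 'a' fails it. With should(1, ...), the same document matches.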



[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038626#comment-15038626
 ] 

Greg Pendlebury commented on SOLR-2649:
---

I tried Jan's patch, and (whilst it is technically correct) it did not improve 
the usefulness of edismax without also addressing how q.op is handled. We 
continued to see absurd search results that failed UAT.

The combined patch with both has been on our prod servers since May 2014 
without any problems, but I have not heard any feedback from others that might 
have tried it. The corpus is nearly 200 million fulltext newspapers: 
http://trove.nla.gov.au/newspaper/result?q=


[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038688#comment-15038688
 ] 

Greg Pendlebury commented on SOLR-2649:
---

Mine shows as 18th Feb, but I assume that is just timezones. Assuming we are 
talking about the same patch, then no, that is my patch (both of the 
'with-Qop' patches are from me). Jan submitted the earlier 2014 patch, which I 
used as a baseline to add the q.op change as well.


[jira] [Comment Edited] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038626#comment-15038626
 ] 

Greg Pendlebury edited comment on SOLR-2649 at 12/3/15 9:31 PM:


I tried Jan's patch, and (whilst it is technically correct) it did not improve 
the usefulness of edismax without also addressing how q.op is handled. We 
continued to see absurd search results that failed UAT.

The combined patch with both has been on our prod servers since May 2014 
without any problems, but I have not heard any feedback from others that might 
have tried it. The corpus is nearly 200 million fulltext newspaper articles: 
http://trove.nla.gov.au/newspaper/result?q=


was (Author: gpendleb):
I tried Jan's patch, and (whilst it is technically correct) it did not improve 
the usefulness of edismax without also addressing how q.op is handled. We 
continued to see absurd search results that failed UAT.

The combined patch with both has been on our prod servers since May 2014 
without any problems, but I have not heard any feedback from others that might 
have tried it. The corpus is nearly 200 million fulltext newspapers: 
http://trove.nla.gov.au/newspaper/result?q=


[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2015-10-28 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979672#comment-14979672
 ] 

Greg Pendlebury commented on SOLR-3274:
---

FWIW we ran into this issue today as well, and nothing worked until ZK was 
restarted. I would love to think that Solr could detect this issue, but it 
smells like a ZK bug to me.

> ZooKeeper related SolrCloud problems
> 
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0-ALPHA
> Environment: Any
>Reporter: Per Steffensen
>
> Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 
> Solr servers, running 28 slices of the same collection (collA) - all slices 
> have one replica (two shards all in all - leader + replica) - 56 cores all in 
> all (8 shards on each solr instance). But anyways...
> Besides the problem reported in SOLR-3273, the system seems to run fine under 
> high load for several hours, but eventually errors like the ones shown below 
> start to occur. I might be wrong, but they all seem to indicate some kind of 
> instability in the collaboration between Solr and ZooKeeper. I have to say 
> that I haven't been there to check ZooKeeper "at the moment where those 
> exceptions occur", but basically I don't believe the exceptions occur because 
> ZooKeeper is not running stably - at least when I go and check ZooKeeper 
> through other "channels" (e.g. my eclipse ZK plugin) it is always accepting 
> my connection and generally seems to be doing fine.
> Exception 1) Often the first error we see in solr.log is something like this
> {code}
> Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
> at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
> at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
> at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
> I believe this error basically occurs because SolrZkClient.isConnected 
> reports false, which means that its internal "keeper.getState" does not 
> return ZooKeeper.States.CONNECTED. I'm pretty sure that it has been CONNECTED 
> for a long time, since this error starts occurring after several hours of 
> processing without this problem showing. But why is it suddenly not connected 
> anymore?!
> Exception 2) We also see errors like the following, and if I'm not mistaken, 
> they start occurring shortly after "Exception 1)" (above) shows for the first 
> time
> {code}
> Mar

[jira] [Commented] (SOLR-8016) CloudSolrClient has extremely verbose error logging

2015-09-08 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735752#comment-14735752
 ] 

Greg Pendlebury commented on SOLR-8016:
---

Lowering the level to INFO would be good in our case, although if, as you say, 
it will eventually fail with an error after all the retries, that would just 
delay the event... unless the error is thrown instead of logged. The Solr nodes 
were in a bad way and needed intervention from sysadmins because of locked 
index segments from a graceless shutdown.
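One way to read the suggestion is to log each failed attempt quietly and throw only once retries are exhausted. The sketch below is hypothetical. It is not CloudSolrClient's retry loop, and 'QuietRetrySketch' and 'infoLog' are invented names. A plain list stands in for an INFO-level logger.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not CloudSolrClient's code): each failed attempt is
 * recorded as a short INFO line, and only the final failure is surfaced,
 * as an exception, so the caller decides how loudly to report it.
 */
final class QuietRetrySketch {

    /** Stand-in for an INFO-level logger. */
    final List<String> infoLog = new ArrayList<>();

    <T> T callWithRetries(Supplier<T> request, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                last = e;
                // One short line per attempt; no dump of the full root cause.
                infoLog.add("INFO attempt " + (attempt + 1) + " failed: " + e.getMessage());
            }
        }
        // Every attempt failed: throw instead of logging at ERROR.
        throw last;
    }
}
{code}

Suppose a request fails twice and then succeeds. It leaves two INFO entries and returns normally. A request that never succeeds throws its last exception to the caller, which can handle it however it likes.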

Under this scenario, the UI clients were logging enormous amounts of useless 
content ('rootCause.toString()'), which made finding other lines in the log very 
difficult. Because the client also throws Exceptions, we had already gracefully 
handled the outage by degrading functionality.

With regard to Markers, I have never used them personally, but before I 
suggested them I checked that both log4j and logback support them 
via slf4j. This covers both the Solr default (log4j) and the binding we use in 
production (logback), so I am selfishly happy with the possibility... and I 
think it is the simplest change. I didn't want to propose a rethink of the 
logging, or of that method's flow, but I am happy if this prompts that as well.
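Here is a JDK-only model of the marker idea that does not depend on slf4j. In slf4j the library would call 'log.error(marker, format, args)' with a marker from 'MarkerFactory.getMarker(...)'. The decision to drop tagged records would then live in the logging backend's configuration. The class, method and marker names below are hypothetical, and the predicate stands in for that configuration.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * JDK-only model of the marker idea (this is NOT slf4j): the library tags a
 * noisy log call, and the embedding application decides, via a predicate,
 * whether records carrying that tag are written.
 */
final class MarkerFilterSketch {

    /** Hypothetical marker name for the client's per-retry error line. */
    static final String CLIENT_RETRY = "SOLRJ_CLIENT_RETRY";

    private final Predicate<String> dropMarker; // chosen by the application
    final List<String> written = new ArrayList<>();

    MarkerFilterSketch(Predicate<String> dropMarker) {
        this.dropMarker = dropMarker;
    }

    /** Logs an ERROR message, optionally tagged with a marker (null = untagged). */
    void error(String marker, String message) {
        if (marker != null && dropMarker.test(marker)) {
            return; // the application opted out of this tagged record
        }
        written.add("ERROR " + message);
    }
}
{code}

An application that wants the current behaviour passes a predicate that drops nothing. One that finds the retry messages noisy drops only that marker and keeps every other ERROR line.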

> CloudSolrClient has extremely verbose error logging
> ---
>
> Key: SOLR-8016
> URL: https://issues.apache.org/jira/browse/SOLR-8016
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 5.2.1, Trunk
>Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: easyfix
>
> CloudSolrClient has this error logging line which is fairly annoying:
> {code}
>   log.error("Request to collection {} failed due to ("+errorCode+
>   ") {}, retry? "+retryCount, collection, rootCause.toString());
> {code}
> Given that this is a client library and then gets embedded into other 
> applications this line is very problematic to handle gracefully. In today's 
> example I was looking at, every failed search was logging over 100 lines, 
> including the full HTML response from the responding node in the cluster.
> The resulting SolrServerException that comes out to our application is 
> handled appropriately but we can't stop this class complaining in logs 
> without suppressing the entire ERROR channel, which we don't want to do. This 
> is the only direct line writing to the log I could find in the client, so we 
> _could_ suppress errors, but that just feels dirty, and fragile for the 
> future.
> From looking at the code I am fairly certain it is not as simple as throwing 
> an exception instead of logging... it is right in the middle of the method. I 
> suspect the simplest answer is adding a marker 
> (http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call.
> Then solrj users can choose what to do with these log entries. I don't know 
> if there is a broader strategy for handling this that I am ignorant of; 
> apologies if that is the case.




[jira] [Commented] (SOLR-8016) CloudSolrClient has extremely verbose error logging

2015-09-08 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735813#comment-14735813
 ] 

Greg Pendlebury commented on SOLR-8016:
---

I haven't looked at the innards of the method enough to say for sure. I know in 
our particular use case it is fruitless to keep trying. The nodes are online, 
but cannot answer in the way expected:

{code}
ERROR o.a.s.c.s.i.CloudSolrClient - Request to collection trove failed due to 
(500) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at /solr/trove: Expected mime type 
application/octet-stream but got text/html. 


Error 500 {msg=SolrCore 'trove' is not available due to init failure: 
Index locked for write for core 
trove,trace=org.apache.solr.common.SolrException: SolrCore 'trove' is not 
available due to init failure: Index locked for write for core trove
{code}

And then lots and lots more html output.

The Exception that bubbles up to our code is more than enough for us to know where 
to start looking:
{code}
ERROR a.g.n.n.c.r.SolrService - Solr search failed: No live SolrServers 
available to handle this request:[]
{code}


[jira] [Created] (SOLR-8016) CloudSolrClient has extremely verbose error logging

2015-09-07 Thread Greg Pendlebury (JIRA)

Greg Pendlebury created SOLR-8016:
-

 Summary: CloudSolrClient has extremely verbose error logging
 Key: SOLR-8016
 URL: https://issues.apache.org/jira/browse/SOLR-8016
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 5.2.1, Trunk
Reporter: Greg Pendlebury
Priority: Minor


CloudSolrClient has this error logging line which is fairly annoying:

{code}
  log.error("Request to collection {} failed due to ("+errorCode+
  ") {}, retry? "+retryCount, collection, rootCause.toString());
{code}

Given that this is a client library and then gets embedded into other 
applications this line is very problematic to handle gracefully. In today's 
example I was looking at, every failed search was logging over 100 lines, 
including the full HTML response from the responding node in the cluster.

The resulting SolrServerException that comes out to our application is handled 
appropriately but we can't stop this class complaining in logs without 
suppressing the entire ERROR channel, which we don't want to do. This is the 
only direct line writing to the log I could find in the client, so we _could_ 
suppress errors, but that just feels dirty, and fragile for the future.

From looking at the code I am fairly certain it is not as simple as throwing 
an exception instead of logging... it is right in the middle of the method. I 
suspect the simplest answer is adding a marker 
(http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call.

Then solrj users can choose what to do with these log entries. I don't know if 
there is a broader strategy for handling this that I am ignorant of; apologies 
if that is the case.




[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators

2015-02-17 Thread Greg Pendlebury (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Pendlebury updated SOLR-2649:
--
Attachment: SOLR-2649-with-Qop.patch

Replacement patch for 'SOLR-2649-with-Qop.patch' against current trunk.

 MM ignored in edismax queries with operators
 

 Key: SOLR-2649
 URL: https://issues.apache.org/jira/browse/SOLR-2649
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Magnus Bergmark
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.9, Trunk

 Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, 
 SOLR-2649.diff, SOLR-2649.patch


 Hypothetical scenario:
   1. User searches for "stocks oil gold" with MM set to "50%"
   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
   3. User gets no hits since MM was ignored and all terms were AND-ed 
 together
 The behavior seems to be intentional, although the reason why is never 
 explained:
   // For correct lucene queries, turn off mm processing if there
   // were explicit operators (except for AND).
   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
 (lines 232-234 taken from 
 tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
 This makes edismax unsuitable as a replacement for dismax; mm is one of the 
 primary features of dismax.




[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-02-15 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322219#comment-14322219
]

Greg Pendlebury commented on SOLR-2649:
---

Thanks Erick,

I can recreate the SOLR-2649-with-Qop.patch this week (today looks pretty busy,
sorry). Just updating trunk now. Jan's SOLR-2649 patch is technically correct
from everything I have looked at, but it actually makes the eDismax parser very
confusing for novice end users. Our investigation seemed to indicate that the
problems stem from the steps taken by Lucene/Solr to convert boolean OR
operators to the SHOULD occur flags (but I am running off memory here). This is
made very obvious by the fact that eDismax is hard-coded to use OR as the default
operator. We were simply tea leaf gazing, but our assumption is that this
confusion may have been the original cause for disabling 'mm' when operators
were present.

So the patch we submitted simply does the same as Jan's, but also makes eDismax
read the default operator from the 'q.op' parameter. With access to both
parameters we have always been able to respond meaningfully to the queries our
users are submitting.
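The interaction between q.op and explicit operators can be sketched on flat queries. This is a simplified model loosely based on the clause handling in Lucene's classic query parser, not edismax's actual code. It assumes whitespace tokenization with no parentheses and no NOT keyword, and the names are hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch, loosely modelled on the classic Lucene query parser's
 * clause handling (not edismax's actual code): how the default operator
 * (q.op) and explicit AND/OR/+/- decide each term's occur flag.
 * Only whitespace-separated terms are supported; no parentheses or NOT.
 */
final class DefaultOperatorSketch {

    enum Occur { MUST, SHOULD, MUST_NOT }

    private DefaultOperatorSketch() {}

    static List<Occur> parse(String query, boolean defaultIsAnd) {
        List<Occur> clauses = new ArrayList<>();
        String conj = null; // "AND", "OR" or null for an implicit join
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) {
                conj = tok;
                continue;
            }
            int last = clauses.size() - 1;
            if (last >= 0 && clauses.get(last) != Occur.MUST_NOT) {
                // An explicit AND makes the previous term required...
                if ("AND".equals(conj)) clauses.set(last, Occur.MUST);
                // ...and, under q.op=AND, an explicit OR makes it optional.
                if (defaultIsAnd && "OR".equals(conj)) clauses.set(last, Occur.SHOULD);
            }
            boolean prohibited = tok.length() > 1 && tok.startsWith("-");
            boolean required;
            if (defaultIsAnd) {
                required = !prohibited && !"OR".equals(conj);
            } else {
                required = !prohibited && (tok.startsWith("+") || "AND".equals(conj));
            }
            clauses.add(prohibited ? Occur.MUST_NOT : required ? Occur.MUST : Occur.SHOULD);
            conj = null;
        }
        return clauses;
    }
}
{code}

Under q.op=AND, 'id:12345 OR zz' still yields two SHOULD clauses, so an mm of 100% makes both mandatory. That is the SOLR-8812 report in miniature.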

MM ignored in edismax queries with operators

Key: SOLR-2649
URL: https://issues.apache.org/jira/browse/SOLR-2649
Project: Solr
Issue Type: Bug
Components: query parsers
Reporter: Magnus Bergmark
Assignee: Erick Erickson
Priority: Minor
Fix For: 4.9, Trunk

Attachments: SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch

Hypothetical scenario:
1. User searches for stocks oil gold with MM set to 50%
2. User adds -stockings to the query: stocks oil gold -stockings
3. User gets no hits since MM was ignored and all terms were AND-ed
together
The behavior seems to be intentional, although the reason why is never
explained:
// For correct lucene queries, turn off mm processing if there
// were explicit operators (except for AND).
boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
(lines 232-234 taken from
tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
This makes edismax unsuitable as a replacement for dismax; mm is one of the
primary features of dismax.
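The quoted rule can be reproduced on its own to show why step 2 changes the outcome. The sketch below is minimal. It assumes a plain whitespace tokenizer, although the real parser's clause splitting is more involved, and it uses a hypothetical class name. As in the quoted line, AND is deliberately not counted.

```java
// Hypothetical sketch of the quoted Solr 3.3 eDismax gate: count explicit operators
// (AND is deliberately excluded) and only apply mm when there are none.
final class MinMatchGate {
    static boolean doMinMatched(String userQuery) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        for (String tok : userQuery.trim().split("\\s+")) {
            if (tok.equals("OR")) numOR++;
            else if (tok.equals("NOT")) numNOT++;
            else if (tok.length() > 1 && tok.startsWith("+")) numPluses++;
            else if (tok.length() > 1 && tok.startsWith("-")) numMinuses++;
        }
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        System.out.println(doMinMatched("stocks oil gold"));            // true: mm=50% applies
        System.out.println(doMinMatched("stocks oil gold -stockings")); // false: mm is ignored
    }
}
```

Adding '-stockings' flips the result from true to false, so the mm=50% setting is no longer applied to the three remaining terms.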

[jira] [Comment Edited] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2015-02-12 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903781#comment-13903781
]

Greg Pendlebury edited comment on SOLR-5722 at 2/13/15 3:23 AM:

The link to the doco is working for me today, so I took a quick look. I think
the other reason that the HyphenatedWordsFilter is not suitable is that it
removes the hyphen from the material, assuming that it can only have one
meaning. The specific circumstance I am considering is when the hyphen is part
of a legitimately hyphenated word that just happens to break across a line wrap,
e.g. 'up-' at the end of one line followed by 'to-date' on the next.

The HyphenatedWordsFilter would turn this into 'upto-date' and cause user
searches for 'up to date' not to match, since no filter later in the chain can
really pull 'upto' apart again. The 'catenateShingles' option, by contrast, is
intended to preserve the word delimiter and provide all the permutations a user
might type to find that term: up to date, upto date, up todate, uptodate.
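The four forms listed above cover every combination of keeping or closing up each of the two hyphen gaps, which gives 2^(n-1) variants for a word with n parts. The sketch below enumerates them at the string level. It is hypothetical and says nothing about how the filter actually emits tokens or positions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: every way a user might type a hyphenated word,
// choosing for each hyphen gap whether to join the parts or separate them.
final class HyphenVariants {
    static List<String> variants(String word) {
        String[] parts = word.split("-");
        int gaps = parts.length - 1;
        List<String> out = new ArrayList<>();
        for (int mask = 0; mask < (1 << gaps); mask++) {
            StringBuilder sb = new StringBuilder(parts[0]);
            for (int i = 1; i < parts.length; i++) {
                // bit (i - 1) set: join with the previous part; clear: separate with a space
                sb.append((mask & (1 << (i - 1))) != 0 ? "" : " ").append(parts[i]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(variants("up-to-date")); // [up to date, upto date, up todate, uptodate]
    }
}
```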

was (Author: gpendleb):
The link to the doco is working for me today so I took a quick look. I think
the other reason that the HyphenatedWordsFilter is not suitable is that it
removes the hyphen from the material assuming that it can only have one
meaning. The specific circumstances I am considering is when the hyphen is part
of a legitimately hyphenated word that just happen to break across a line wrap.
eg. 'up-\{\n\}to-date'

The HyphenatedWordsFilter would turn this into 'upto-date', and cause user
searches of 'up to date' to not match, since no filters later in the change can
really pull 'upto' apart again. Whereas the 'catenateShingles' option is
intended to preserve the word delimiter and provide all the permutations a user
might type to find that term: up to date, upto date, up todate, uptodate

Add catenateShingles option to WordDelimiterFilter
--

Key: SOLR-5722
URL: https://issues.apache.org/jira/browse/SOLR-5722
Project: Solr
Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor
Labels: filter, newbie, patch
Attachments: WDFconcatShingles.patch

Apologies if I put this in the wrong spot. I'm attaching a patch (against
current trunk) that adds support for a 'catenateShingles' option to the
WordDelimiterFilter.
We (National Library of Australia - NLA) are currently maintaining this as an
internal modification to the Filter, but I believe it is generic enough to
contribute upstream.
Description:
=
{code}
/**
* NLA Modification to the standard word delimiter to support various
* hyphenation use cases. Primarily driven by requirements for
* newspapers where words are often broken across line endings.
*
* eg. hyphenated-surname is printed across a line ending and
* turns out like hyphen-ated-surname or hyphenated-sur-name.
*
* In this scenario the stock filter, with 'catenateAll' turned on, will
* generate individual tokens plus one combined token, but not
* sub-tokens like hyphenated surname and hyphenatedsur name.
*
* So we add a new 'catenateShingles' option to achieve this.
*/
{code}
Includes unit tests. As noted in one of them, CATENATE_WORDS and
CATENATE_SHINGLES are logically mutually exclusive for sensible usage
and can cause duplicate tokens (although the duplicates should have the same
positions etc.).
I'm happy to work on it more if anyone finds problems with it.
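One reading of the Javadoc above is that the extra tokens are the concatenations of every contiguous run of two or more parts. Under that assumption, the run covering all parts is the same single combined token that catenateWords/catenateAll already emit, which would account for the duplicate tokens mentioned. The sketch below shows that enumeration. It is hypothetical and is not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration based on one reading of the Javadoc: concatenate every
// contiguous run of two or more hyphen-separated parts.
final class CatenatedShingles {
    static List<String> catenatedShingles(String word) {
        String[] parts = word.split("-");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < parts.length; start++) {
            StringBuilder run = new StringBuilder(parts[start]);
            for (int end = start + 1; end < parts.length; end++) {
                run.append(parts[end]);      // extend the run by one more part
                out.add(run.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(catenatedShingles("hyphen-ated-surname"));
        // [hyphenated, hyphenatedsurname, atedsurname]
    }
}
```

For 'hyphen-ated-surname', the individual part 'surname' and the shingle 'hyphenated' together can then satisfy a query for 'hyphenated surname'.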

[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2014-05-01 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986408#comment-13986408
]

Greg Pendlebury commented on SOLR-2649:
---

I applied this patch to 4.7.2 yesterday and tried it out on our dev servers. At
first I thought it was pretty bad and failed completely... but then I had a
good think and re-read everything on this ticket and this[1] article and
realised my understanding of the problem was flawed. Using just this patch in
isolation it converted all of the OR operators to AND operators with mm=100%.
Very confusing behaviour for our business area, but I realise now that it is
correct.

Perhaps the confusion stems from the way the q.op and mm parameters interact.
If they were instead separated more clearly, we could change our config
entirely. At the moment our mm is 100% because we effectively want q.op=AND,
but if q.op were instead applied 1) always, 2) first and 3) independently of mm
(i.e. insert AND wherever an operator is missing), we could set mm=1 and achieve
what we want while respecting the OR operators provided by the user.
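To make the proposed separation concrete, the sketch below hand-writes the clauses that 'a b OR c' would produce under each default operator, assuming the classic QueryParser clause rules. It then applies a simplified reading of mm to the SHOULD clauses only. All names are hypothetical. Only the simple mm forms (N, -N, N%, -N%) are handled, with percentages rounded down as the Solr documentation describes. Conditional specs such as '3<90%' are not supported.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: apply a simple mm spec to the SHOULD clauses of a flat boolean
// query and check one document's terms against it. Conditional specs are not handled.
final class MinShouldMatchSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }
    record Clause(String term, Occur occur) {}

    // 'a b OR c' written out by hand for each default operator (assumed clause flags).
    static final List<Clause> DEFAULT_OR = List.of(
            new Clause("a", Occur.SHOULD), new Clause("b", Occur.SHOULD), new Clause("c", Occur.SHOULD));
    static final List<Clause> DEFAULT_AND = List.of(
            new Clause("a", Occur.MUST), new Clause("b", Occur.SHOULD), new Clause("c", Occur.SHOULD));

    // N, -N, N% and -N%; percentages round down, the result is clamped to [0, optional].
    static int minShouldMatch(String spec, int optional) {
        String s = spec.trim();
        int result;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1));
            int n = optional * Math.abs(percent) / 100;   // integer division rounds down
            result = percent < 0 ? optional - n : n;       // -N%: that share may be missing
        } else {
            int n = Integer.parseInt(s);
            result = n < 0 ? optional + n : n;             // -N: that many may be missing
        }
        return Math.max(0, Math.min(optional, result));
    }

    static boolean matches(List<Clause> clauses, String mm, Set<String> docTerms) {
        int optional = 0;
        int optionalHits = 0;
        boolean hasMust = false;
        for (Clause c : clauses) {
            boolean hit = docTerms.contains(c.term());
            switch (c.occur()) {
                case MUST -> { hasMust = true; if (!hit) return false; }
                case MUST_NOT -> { if (hit) return false; }
                case SHOULD -> { optional++; if (hit) optionalHits++; }
            }
        }
        int needed = minShouldMatch(mm, optional);
        // A query without MUST clauses still needs at least one optional clause to match.
        if (!hasMust) needed = Math.max(needed, 1);
        return optionalHits >= needed;
    }

    public static void main(String[] args) {
        Set<String> doc = Set.of("a", "c");
        System.out.println(matches(DEFAULT_OR, "100%", doc)); // false: a, b and c all required
        System.out.println(matches(DEFAULT_AND, "1", doc));   // true: a plus one of b or c
    }
}
```

Consider a document that contains only 'a' and 'c'. The q.op=OR / mm=100% pairing rejects it, while q.op=AND / mm=1 accepts it, which is the behaviour the comment is after.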

I've added this on top of the patch already here and deployed again to our dev
servers using 'q.op=AND mm=1' and now everything appears to function as it
should. I'll upload the patch in a minute, and it includes several unit tests
with different mm and q.op values. From my perspective I think the two
parameters are interacting appropriately, but perhaps someone with more
convoluted mm settings could give it a try?

The change is simply in the constructor of the ExtendedSolrQueryParser class,
where the default operator was hardcoded to OR (presumably so that mm would
take care of things). I've made it read the q.op parameter provided with the
query instead (copying the code from the Simple QParser and adjusting it to fit).

I have slightly tweaked the unit test from the first patch that was marked TODO.
I think not finding a result in that case is entirely appropriate now that the
user can tweak q.op. Opinions may vary of course.

[1] http://searchhub.org/2011/12/28/why-not-and-or-and-not/

MM ignored in edismax queries with operators

Key: SOLR-2649
URL: https://issues.apache.org/jira/browse/SOLR-2649
Project: Solr
Issue Type: Bug
Components: query parsers
Reporter: Magnus Bergmark
Priority: Minor
Fix For: 4.9, 5.0

Attachments: SOLR-2649.diff, SOLR-2649.patch

[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators

2014-05-01 Thread Greg Pendlebury (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Pendlebury updated SOLR-2649:
--

Attachment: SOLR-2649-with-Qop.patch

 MM ignored in edismax queries with operators
 

 Key: SOLR-2649
 URL: https://issues.apache.org/jira/browse/SOLR-2649
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Magnus Bergmark
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch


 Hypothetical scenario:
   1. User searches for stocks oil gold with MM set to 50%
   2. User adds -stockings to the query: stocks oil gold -stockings
   3. User gets no hits since MM was ignored and all terms were AND-ed 
 together
 The behavior seems to be intentional, although the reason why is never 
 explained:
   // For correct lucene queries, turn off mm processing if there
   // were explicit operators (except for AND).
   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
 (lines 232-234 taken from 
 tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
 This makes edismax unsuitable as a replacement for dismax; mm is one of the 
 primary features of dismax.



[jira] [Comment Edited] (SOLR-2649) MM ignored in edismax queries with operators

2014-05-01 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986408#comment-13986408
]

Greg Pendlebury edited comment on SOLR-2649 at 5/1/14 6:54 AM:
---

I applied this patch to 4.7.2 yesterday and tried it out on our dev servers. At
first I thought it was pretty bad and failed completely... but then I had a
good think and re-read everything on this ticket and this[1] article and
realised my understanding of the problem was flawed. Using just this patch in
isolation it converted all of the OR operators to AND operators with mm=100%.
Very confusing behaviour for our business area, but I realise now that it is
correct.

[1] http://searchhub.org/2011/12/28/why-not-and-or-and-not/

was (Author: gpendleb):
I applied this patch to 4.7.2 Yesterday and tried it out on or dev servers. At
first I thought it was pretty bad and failed completely... but then I had a
good think and re-read everything on this ticket and this[1] article and
realised my understanding of the problem was flawed. Using just this patch in
isolation it converted all of the OR operators to AND operators with mm=100%.
Very confusing behaviour for our business area, but I realise now that it is
correct.

[1] http://searchhub.org/2011/12/28/why-not-and-or-and-not/

MM ignored in edismax queries with operators

Key: SOLR-2649
URL: https://issues.apache.org/jira/browse/SOLR-2649
Project: Solr
Issue Type: Bug
Components: query parsers
Reporter: Magnus Bergmark
Priority: Minor
Fix For: 4.9, 5.0

Attachments: SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch

[jira] [Commented] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-17 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903781#comment-13903781
]

Greg Pendlebury commented on SOLR-5722:
---

Add catenateShingles option to WordDelimiterFilter
--

[jira] [Comment Edited] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-17 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903781#comment-13903781
]

Greg Pendlebury edited comment on SOLR-5722 at 2/18/14 4:55 AM:

Add catenateShingles option to WordDelimiterFilter
--

[jira] [Commented] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-16 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902824#comment-13902824
]

Greg Pendlebury commented on SOLR-5722:
---

I don't think it does. It has been a while since we looked into it, and that
link is currently returning 503 for me, but my understanding was that the
HyphenatedWordsFilter put two tokens back together when a hyphen was found at
the end of the first token. The catenateShingles option we are using addresses
the scenario where multiple hyphens are found internal to a single token.
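The HyphenatedWordsFilter behaviour described here can be modelled simply: a token ending in '-' loses the hyphen and is glued onto the next token. The sketch below is hypothetical and works at the string level, not from Lucene's implementation. It assumes whitespace tokenization has already split the wrapped text into 'up-' and 'to-date'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical string-level model of rejoining words broken across a line:
// a token ending in '-' loses the hyphen and is prefixed to the next token.
final class HyphenJoinSketch {
    static List<String> joinHyphenated(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String carry = "";
        for (String token : tokens) {
            String joined = carry + token;
            if (joined.endsWith("-")) {
                carry = joined.substring(0, joined.length() - 1); // wait for the next token
            } else {
                out.add(joined);
                carry = "";
            }
        }
        if (!carry.isEmpty()) out.add(carry + "-"); // nothing followed the trailing hyphen
        return out;
    }

    public static void main(String[] args) {
        System.out.println(joinHyphenated(List.of("up-", "to-date"))); // [upto-date]
    }
}
```

The output 'upto-date' is the case discussed in this thread. Once 'up' and 'to' are fused, splitting on the remaining hyphen can only produce 'upto' and 'date'.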

Add catenateShingles option to WordDelimiterFilter
--

[jira] [Created] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-12 Thread Greg Pendlebury (JIRA)

Greg Pendlebury created SOLR-5722:
-

 Summary: Add catenateShingles option to WordDelimiterFilter
 Key: SOLR-5722
 URL: https://issues.apache.org/jira/browse/SOLR-5722
 Project: Solr
  Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor


Apologies if I put this in the wrong spot. I'm attaching a patch (against 
current trunk) that adds support for a 'catenateShingles' option to the 
WordDelimiterFilter. 

We (National Library of Australia - NLA) are currently maintaining this as an 
internal modification to the Filter, but I believe it is generic enough to 
contribute upstream.

Description:
=
{code}
/**
 * NLA Modification to the standard word delimiter to support various
 * hyphenation use cases. Primarily driven by requirements for
 * newspapers where words are often broken across line endings.
 *
 * eg. hyphenated-surname is printed across a line ending and
 * turns out like hyphen-ated-surname or hyphenated-sur-name.
 *
 * In this scenario the stock filter, with 'catenateAll' turned on, will
 * generate individual tokens plus one combined token, but not
 * sub-tokens like hyphenated surname and hyphenatedsur name.
 *
 * So we add a new 'catenateShingles' option to achieve this.
 */
{code}

Includes unit tests. As noted in one of them, CATENATE_WORDS and 
CATENATE_SHINGLES are logically mutually exclusive for sensible 
usage and can cause duplicate tokens (although the duplicates should have the same 
positions etc.).

I'm happy to work on it more if anyone finds problems with it.



[jira] [Updated] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-12 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Greg Pendlebury updated SOLR-5722:
--

Attachment: WDFconcatShingles.patch

Patch against trunk : http://svn.apache.org/repos/asf/lucene/dev/trunk
(r1567824)

Add catenateShingles option to WordDelimiterFilter
--

[jira] [Commented] (SOLR-4956) make maxBufferedAddsPerServer configurable

2013-08-13 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739058#comment-13739058
]

Greg Pendlebury commented on SOLR-4956:
---

So it seems we have three options here:
1. Make it configurable, with a warning that changing it may lead to Bad
Stuff.

I'd support this solely from the perspective of testing its impact. Rebuilding
code to change a hardcoded integer is a tad annoying if you are just diagnosing
what impact things could have. We batch ingest several thousand documents at a
time into a 96-JVM cluster (32 shards * 3 replicas). I'd love to see if we
could lower CPU load by altering this setting... even if it is only a
diagnostic step that is at odds with long-term goals related to batching at all.
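The effect of the setting shows up most simply as request counts. The model below is hypothetical: it uses invented class names and ignores how Solr actually schedules and flushes distributed updates. In this model a replica receives one update request per full buffer plus a final partial flush, which comes to ceil(N / maxBufferedAdds) requests for N forwarded documents. The figure of 5000 documents stands in for 'several thousand'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of per-replica buffering (not Solr's actual distributor code):
// documents are buffered and one request is sent whenever the buffer fills.
final class BufferedForwarder {
    private final int maxBufferedAdds;
    private final List<String> buffer = new ArrayList<>();
    private int requestsSent = 0;

    BufferedForwarder(int maxBufferedAdds) {
        if (maxBufferedAdds < 1) throw new IllegalArgumentException("maxBufferedAdds must be >= 1");
        this.maxBufferedAdds = maxBufferedAdds;
    }

    void add(String docId) {
        buffer.add(docId);
        if (buffer.size() >= maxBufferedAdds) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        requestsSent++;          // stands in for one update request to the replica
        buffer.clear();
    }

    int requestsSent() { return requestsSent; }

    // Forward 'docs' documents and return how many requests the replica would see.
    static int requestsFor(int docs, int maxBufferedAdds) {
        BufferedForwarder forwarder = new BufferedForwarder(maxBufferedAdds);
        for (int i = 0; i < docs; i++) forwarder.add("doc-" + i);
        forwarder.flush();       // send whatever is left in a final, partial batch
        return forwarder.requestsSent();
    }

    public static void main(String[] args) {
        System.out.println(requestsFor(5000, 10));   // 500
        System.out.println(requestsFor(5000, 1000)); // 5
    }
}
```

Raising the setting from the default of 10 to the 1000 mentioned in the issue cuts the modelled request count from 500 to 5 for the same 5000 documents. Whether that translates into lower CPU load in a real cluster is exactly what a configurable setting would let you measure.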

make maxBufferedAddsPerServer configurable
--

Key: SOLR-4956
URL: https://issues.apache.org/jira/browse/SOLR-4956
Project: Solr
Issue Type: Improvement
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson

Anecdotal evidence from the user's list indicates that in high-throughput situations,
the default of 10 docs/batch for inter-shard batching can generate
significant CPU load. See the thread titled "Sharding and Replication" on
June 19th, but the gist is below.
I haven't poked around, but it's a little surprising on the surface that Asif
is seeing this kind of difference. So I'm wondering if this change indicates
some other underlying issue. Regardless, this seems like it would be good to
investigate.
Here's the gist of Asif's experience from the thread:
It's a completely practical problem - we are exploring Solr to build a real-time
analytics/data solution for a system handling about 1000 qps. We have
various metrics that are stored as different collections on the cloud,
which means a very high volume of writes. The cloud also needs to support
about 300-400 qps.
We initially tested with a single Solr node on a 16-core / 24 GB box for a
single metric. We saw that writes were not an issue at all - Solr was
handling it extremely well. We were also able to achieve about 200 qps from
a single node.
When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU
usage on the replicas. Up to 10 cores were getting used for writes on the
replicas. Hence my concern with respect to batch updates for the replicas.
BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU usage is
very similar to a single-node installation.

[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2012-05-02 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267045#comment-13267045
 ] 

Greg Pendlebury commented on SOLR-2487:
---

@Neil, that's way better than the way I do things now. Thanks.

Maven continues to surprise me.

 Do not include slf4j-jdk14 jar in WAR
 -

 Key: SOLR-2487
 URL: https://issues.apache.org/jira/browse/SOLR-2487
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
Assignee: Jan Høydahl
  Labels: logging, slf4j
 Fix For: 3.6, 4.0

 Attachments: SOLR-2487.patch, SOLR-2487.patch, SOLR-2487.patch


 I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
 newbies get up and running. But I find myself re-packaging the war for every 
 customer when adapting to their choice of logger framework, which is 
 counter-productive.
 It would be sufficient to have the jdk-logging binding in example/lib to let 
 the example and tutorial still work OOTB but as soon as you deploy solr.war 
 to production you're forced to explicitly decide what logging to use.

[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-08-14 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084928#comment-13084928
 ] 

Greg Pendlebury commented on SOLR-2487:
---

It would be great to have a skinny WAR available as a Maven artifact. At the 
moment there is no way in Maven to have it exclude the jdk14 JAR short of 
rebuilding and rehosting the WAR elsewhere, e.g.: 
http://www.jarvana.com/jarvana/browse/org/dspace/dependencies/solr/dspace-solr-webapp/1.4.1.0/

And to my knowledge, there is currently nothing like this available for 
v3.3.0.

With a skinny WAR in Maven listing all the currently bundled dependencies, the 
end result for most users would be identical, since Maven will fetch them all 
for you anyway. Then people who don't want jdk14 can add this to their own 
project and they will get everything but that single dependency:
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-jdk14</artifactId>
  <version>1.6.1</version>
  <scope>provided</scope>
</dependency>


 Do not include slf4j-jdk14 jar in WAR
 -

 Key: SOLR-2487
 URL: https://issues.apache.org/jira/browse/SOLR-2487
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
  Labels: logging, slf4j

 I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
 newbies get up and running. But I find myself re-packaging the war for every 
 customer when adapting to their choice of logger framework, which is 
 counter-productive.
 It would be sufficient to have the jdk-logging binding in example/lib to let 
 the example and tutorial still work OOTB but as soon as you deploy solr.war 
 to production you're forced to explicitly decide what logging to use.

[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-08-14 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084967#comment-13084967
 ] 

Greg Pendlebury commented on SOLR-2487:
---

At the moment there is no way in Maven to have it exclude the jdk14 JAR... 
Hmm, I shouldn't have stated an absolute like that. I eventually got a script 
building today that dropped the WAR as a dependency, unpacked it to a '/solr' 
context folder, then nuked the jdk14 JAR only, leaving the rest in place.
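The unpack-and-delete workaround can also be done as a straight copy that skips the binding JAR. That swaps Maven's unpack step for plain java.util.zip. The sketch is hypothetical. The 'WEB-INF/lib/slf4j-jdk14' prefix assumes the binding sits in the standard WAR library folder, so check the real entry name in your solr.war first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical build helper: copy a WAR entry by entry, skipping the slf4j-jdk14
// binding and leaving everything else in place.
final class WarSlimmer {
    // Assumption: the binding lives in the standard WEB-INF/lib folder of the WAR.
    static boolean isJdk14Binding(String entryName) {
        return entryName.startsWith("WEB-INF/lib/slf4j-jdk14") && entryName.endsWith(".jar");
    }

    // Returns the number of entries that were left out of the target WAR.
    static int copyWithout(Path sourceWar, Path targetWar) throws IOException {
        int dropped = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(sourceWar));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(targetWar))) {
            for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) {
                if (isJdk14Binding(entry.getName())) {
                    dropped++;
                    continue;
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                in.transferTo(out);   // copies the bytes of the current entry only
                out.closeEntry();
            }
        }
        return dropped;
    }

    static List<String> entryNames(Path war) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(war))) {
            for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) names.add(entry.getName());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: WarSlimmer <source.war> <target.war>");
            return;
        }
        int dropped = copyWithout(Path.of(args[0]), Path.of(args[1]));
        System.out.println("Dropped " + dropped + " slf4j-jdk14 JAR(s)");
    }
}
```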

I'd still prefer a skinny WAR, since it would be a much cleaner build script, 
and allow me to eliminate duplicate JARs on the classpath with greater ease. It 
would also be more in line with the spirit of how Maven is intended to work... 
but I have a workaround, and don't expect the world to conform to my wishes :)

 Do not include slf4j-jdk14 jar in WAR
 -

 Key: SOLR-2487
 URL: https://issues.apache.org/jira/browse/SOLR-2487
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
  Labels: logging, slf4j

 I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
 newbies get up and running. But I find myself re-packaging the war for every 
 customer when adapting to their choice of logger framework, which is 
 counter-productive.
 It would be sufficient to have the jdk-logging binding in example/lib to let 
 the example and tutorial still work OOTB but as soon as you deploy solr.war 
 to production you're forced to explicitly decide what logging to use.

[jira] [Issue Comment Edited] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-08-14 Thread Greg Pendlebury (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084967#comment-13084967
]

Greg Pendlebury edited comment on SOLR-2487 at 8/15/11 5:22 AM:

At the moment there is no way in Maven to have it exclude the jdk14 JAR...
Hmm, I shouldn't have stated an absolute like that. I eventually got a script
building today that dropped the WAR as a dependency, unpacked it to a '/solr'
context folder, then nuked the jdk14 JAR only, leaving the rest in place.

I'd still prefer a skinny WAR, since it would be a much cleaner build script,
and allow me to eliminate duplicate/conflicting JARs on the classpath with
greater ease. It would also be more in line with the spirit of how Maven is
intended to work... but I have a workaround, and don't expect the world to
conform to my wishes :)

was (Author: greg.pendlebury):
At the moment there is no way in Maven to have it exclude the jdk14
JAR... Hmm, I shouldn't have stated an absolute like that. I eventually got a
script building today that dropped the WAR as a dependency, unpacked it to a
'/solr' context folder, then nuked the jdk14 JAR only, leaving the rest in
place.

I'd still prefer a skinny WAR, since it would be a much cleaner build script,
and allow me to eliminate duplicate JARs on the classpath with greater ease. It
would also be more in line with the spirit of how Maven is intended to work...
but I have a workaround, and don't expect to world to conform to my wishes :)

Do not include slf4j-jdk14 jar in WAR
-

Key: SOLR-2487
URL: https://issues.apache.org/jira/browse/SOLR-2487
Project: Solr
Issue Type: Improvement
Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
Labels: logging, slf4j

I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help
newbies get up and running. But I find myself re-packaging the war for every
customer when adapting to their choice of logger framework, which is
counter-productive.
It would be sufficient to have the jdk-logging binding in example/lib to let
the example and tutorial still work OOTB but as soon as you deploy solr.war
to production you're forced to explicitly decide what logging to use.
