[jira] [Commented] (SOLR-10856) ExtendedDismaxQParser (edismax) override OR when mm=100%
[ https://issues.apache.org/jira/browse/SOLR-10856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062195#comment-16062195 ]

Greg Pendlebury commented on SOLR-10856:

You are describing exactly what mm is supposed to do. The change made in SOLR-2649 was the root cause (deliberately... because of the bug caused by the inverse impact boolean operators had on mm), and SOLR-8812 was about choosing less disruptive default values when users are not specifying them. In this case, however, you are explicitly requesting mm=100%... and getting answers that match.

The short answer is: don't use mm=100% if you want boolean logic. It is not feature compatible.

The longer answer is nasty and would require delving into how boolean operators are truly handled by Solr when translated into OCCURS flags. The mm parameter operates on the SHOULD OCCUR flags, which is (roughly) what your OR terms are translated into.

> ExtendedDismaxQParser (edismax) override OR when mm=100%
>
>                 Key: SOLR-10856
>                 URL: https://issues.apache.org/jira/browse/SOLR-10856
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: query parsers
>    Affects Versions: 5.5, 6.0, 6.6
>            Reporter: Sébastien LECACHEUR
>
> Since Solr 5.5.1, edismax parser override OR (with AND behavior) in queries
> when mm=100%. This behavior is new from Solr 5.5.1 to 6.6.0.
> Concerned query :
> {code:none}
> curl -s
> 'http://localhost:8983/solr/mycorename/select?q=type_s%3A(A+OR+C)=json=edismax=100%25=true=true'
> {code}
> 1) Solr 5.4.1 :
> {code:javascript}
> "rawquerystring":"type_s:(A OR C)",
> "querystring":"type_s:(A OR C)",
> "parsedquery":"(+(type_s:A type_s:C))/no_coord",
> "parsedquery_toString":"+(type_s:A type_s:C)",
> "explain":{...},
> "QParser":"ExtendedDismaxQParser",
> {code}
> Returns docs as expected.
> 2) Solr 5.5.1 :
> {code:javascript}
> "rawquerystring":"type_s:(A OR C)",
> "querystring":"type_s:(A OR C)",
> "parsedquery":"(+((type_s:A type_s:C)~2))/no_coord",
> "parsedquery_toString":"+((type_s:A type_s:C)~2)",
> "explain":{},
> "QParser":"ExtendedDismaxQParser",
> {code}
> Returns no results
> 3) Solr 6.6.0 :
> {code:javascript}
> "rawquerystring":"type_s:(A OR C)",
> "querystring":"type_s:(A OR C)",
> "parsedquery":"(+(type_s:A type_s:C)~2)/no_coord",
> "parsedquery_toString":"+((type_s:A type_s:C)~2)",
> "explain":{},
> "QParser":"ExtendedDismaxQParser",
> {code}
> Returns no results
> This bug looks like SOLR-8812 issue.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
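The interaction described in the comment above (OR terms become SHOULD clauses, and mm then counts how many of them must match) can be sketched roughly as follows. This is a simplified model, not Solr's implementation: the function names are hypothetical, only positive percentage values of mm are handled, and the percentage is assumed to round down.

```python
def required_should_matches(num_should_clauses: int, mm_percent: int) -> int:
    """Number of SHOULD clauses a document must match for a positive percentage mm.

    Assumption: the percentage is applied to the clause count and rounded down.
    """
    return (num_should_clauses * mm_percent) // 100


def document_matches(doc_values: set, should_terms: list, mm_percent: int) -> bool:
    """Check one document against a group of optional (SHOULD) terms."""
    matched = sum(1 for term in should_terms if term in doc_values)
    # Assumption: a group made only of optional clauses still needs one hit.
    needed = max(1, required_should_matches(len(should_terms), mm_percent))
    return matched >= needed


# type_s:(A OR C) becomes two SHOULD clauses; mm=100% then demands both,
# which is the "~2" visible in the parsedquery output.
print(required_should_matches(2, 100))           # 2
print(document_matches({"A"}, ["A", "C"], 100))  # False
print(document_matches({"A"}, ["A", "C"], 0))    # True
```

A document whose type_s is only A satisfies one of the two optional clauses, so it is rejected at mm=100% and accepted at mm=0%, which lines up with the "Returns no results" reports for 5.5.1 and 6.6.0.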
[jira] [Commented] (SOLR-4823) Split LBHttpSolrServer into two classes one for the solrj use case and one for the solr cloud use case
[ https://issues.apache.org/jira/browse/SOLR-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642463#comment-15642463 ]

Greg Pendlebury commented on SOLR-4823:
---------------------------------------

I know that my question will be off-topic for this particular issue, but it seems that it might be a viable launching point for a customization our team has been considering in-house.

We were thinking of trying out the addition of one or more nodes in the cluster that had no allocated range hash in clusterstate (whether or not we needed to modify the code to achieve this we haven't looked yet). Their purpose would be to act as search entry points for the cluster with more stable JVM performance (because they manage no Lucene segments) as well as internalizing cluster security at the OS level.

Right now, in a 200 replica cluster we need to let any/all SolrJ clients have access to the ZK ensemble as well as ports on every replica. It also makes managing threading (such as in the default http client thread pool) annoying to configure and test for performance.

With [~phloy]'s patch we could still make use of SolrJ, but just provide a small whitelist of our 'search nodes' and keep client-side requirements for searching very simple in terms of security and thread management.

> Split LBHttpSolrServer into two classes one for the solrj use case and one
> for the solr cloud use case
> ---------------------------------------
>
>                 Key: SOLR-4823
>                 URL: https://issues.apache.org/jira/browse/SOLR-4823
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: philip hoy
>            Priority: Minor
>         Attachments: SOLR-4823.patch, SOLR-4823.patch
>
>
> The LBHttpSolrServer has too many responsibilities. It could perhaps be
> broken into two classes, one in solrj to be used in the place of an external
> load balancer that balances across a known set of solr servers defined at
> construction time and one in solr core to be used by the solr cloud
> components that balances across servers dependent on the request.
> To save code duplication, if much arises, an abstract base class could be
> introduced into solrj.
[jira] [Commented] (SOLR-8016) CloudSolrClient has extremely verbose error logging
[ https://issues.apache.org/jira/browse/SOLR-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587251#comment-15587251 ]

Greg Pendlebury commented on SOLR-8016:
---------------------------------------

Not that I am aware of. I can see the problem still in our newest server (5.5.3).

I like [~markrmil...@gmail.com]'s suggestion of lowering the log level to info. It is simple and we can filter it out via logging config. The deeper issues of whether the retry should even be attempted sound interesting to me, but I'd be happy to just not see the log entries.

> CloudSolrClient has extremely verbose error logging
> ---------------------------------------
>
>                 Key: SOLR-8016
>                 URL: https://issues.apache.org/jira/browse/SOLR-8016
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - java
>    Affects Versions: 5.2.1, 6.0
>            Reporter: Greg Pendlebury
>            Priority: Minor
>              Labels: easyfix
>
> CloudSolrClient has this error logging line which is fairly annoying:
> {code}
> log.error("Request to collection {} failed due to ("+errorCode+
> ") {}, retry? "+retryCount, collection, rootCause.toString());
> {code}
> Given that this is a client library and then gets embedded into other
> applications this line is very problematic to handle gracefully. In today's
> example I was looking at, every failed search was logging over 100 lines,
> including the full HTML response from the responding node in the cluster.
> The resulting SolrServerException that comes out to our application is
> handled appropriately but we can't stop this class complaining in logs
> without suppressing the entire ERROR channel, which we don't want to do. This
> is the only direct line writing to the log I could find in the client, so we
> _could_ suppress errors, but that just feels dirty, and fragile for the
> future.
> From looking at the code I am fairly certain it is not as simple as throwing
> an exception instead of logging... it is right in the middle of the method. I
> suspect the simplest answer is adding a marker
> (http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call.
> Then solrj users can choose what to do with these log entries. I don't know
> if there is a broader strategy for handling this that I am ignorant of;
> apologies if that is the case.
[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is not explicitly set
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325638#comment-15325638 ]

Greg Pendlebury commented on SOLR-8812:
---------------------------------------

Sounds great. Add my thanks to those you've already received.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is
> not explicitly set
> ---------------------------------------
>
>                 Key: SOLR-8812
>                 URL: https://issues.apache.org/jira/browse/SOLR-8812
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 5.5
>            Reporter: Ryan Steinberg
>            Assignee: Steve Rowe
>             Fix For: 5.6, 6.1, 5.5.2, master (7.0), 6.0.2
>
>         Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch,
> SOLR-8812.patch, SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior
> is new to Solr 5.5.0 and an unexpected major change.
> Example:
>   "q": "id:12345 OR zz",
>   "defType": "edismax",
>   "q.op": "AND",
> where "12345" is a known document ID and "zz" is a string NOT present
> in my data
> Version 5.5.0 produces zero results:
>   "rawquerystring": "id:12345 OR zz",
>   "querystring": "id:12345 OR zz",
>   "parsedquery": "(+((id:12345 DisjunctionMaxQuery((text:zz)))~2))/no_coord",
>   "parsedquery_toString": "+((id:12345 (text:zz))~2)",
>   "explain": {},
>   "QParser": "ExtendedDismaxQParser"
> Version 5.4.0 produces one result as expected
>   "rawquerystring": "id:12345 OR zz",
>   "querystring": "id:12345 OR zz",
>   "parsedquery": "(+(id:12345 DisjunctionMaxQuery((text:zz))))/no_coord",
>   "parsedquery_toString": "+(id:12345 (text:zz))",
>   "explain": {},
>   "QParser": "ExtendedDismaxQParser"
[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is not explicitly set
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324063#comment-15324063 ]

Greg Pendlebury commented on SOLR-8812:
---------------------------------------

Sounds (tentatively) ok to me. I was quite concerned when you said it puts things back to pre-SOLR-2649 functionality, but from looking at what got committed it seems that q.op=OR is no longer hardcoded in setDefaultOperator() (which was fixed in SOLR-2649).

I haven't executed anything, but this seems like a good step with regards to mm handling.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is
> not explicitly set
> ---------------------------------------
>
>                 Key: SOLR-8812
>                 URL: https://issues.apache.org/jira/browse/SOLR-8812
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 5.5
>            Reporter: Ryan Steinberg
>            Assignee: Steve Rowe
>             Fix For: 6.1, 5.5.2, 6.0.2
>
>         Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch,
> SOLR-8812.patch, SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior
> is new to Solr 5.5.0 and an unexpected major change.
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292397#comment-15292397 ]

Greg Pendlebury commented on SOLR-2649:
---------------------------------------

[~rebeccatang], that sounds like expected behaviour. Your 'OR' operator is not being ignored; rather, Solr translates OR operators into SHOULD occur flags (i.e. optional search terms)... then, if 'mm' is set to 100%, this tells Solr that you require every optional search term to be present in the result set. If you are explicitly setting 'mm' you should use a different value if you want OR operators to function.

Also see SOLR-8812, which discusses setting a better default value for 'mm', particularly one that changes depending on the 'q.op' parameter. Of course that only applies in the case where you are not explicitly setting 'mm'.

> MM ignored in edismax queries with operators
> ---------------------------------------
>
>                 Key: SOLR-2649
>                 URL: https://issues.apache.org/jira/browse/SOLR-2649
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Magnus Bergmark
>            Assignee: Erick Erickson
>             Fix For: 5.5, 6.0
>
>         Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch,
> SOLR-2649.diff, SOLR-2649.patch, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed
> together
> The behavior seems to be intentional, although the reason why is never
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
> (lines 232-234 taken from
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement to dismax; mm is one of the
> primary features of dismax.
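As a rough illustration of the pre-SOLR-2649 check quoted in the issue description, the sketch below mirrors the `doMinMatched` expression. The whitespace tokenisation and the function name are assumptions made for this example; the real parser counts operators while parsing, not by splitting on spaces.

```python
def legacy_do_min_matched(query: str) -> bool:
    """Return True when the legacy check would still apply mm.

    Mirrors: doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0
    Assumption: operators are detected by naive whitespace tokenisation.
    """
    tokens = query.split()
    num_or = sum(1 for t in tokens if t == "OR")
    num_not = sum(1 for t in tokens if t == "NOT")
    num_pluses = sum(1 for t in tokens if t.startswith("+"))
    num_minuses = sum(1 for t in tokens if t.startswith("-"))
    # AND is deliberately not counted, matching the quoted comment.
    return (num_or + num_not + num_pluses + num_minuses) == 0


print(legacy_do_min_matched("stocks oil gold"))             # True: mm=50% applies
print(legacy_do_min_matched("stocks oil gold -stockings"))  # False: mm is switched off
```

The second call corresponds to step 2 of the hypothetical scenario: adding a single exclusion is enough to switch mm processing off for the whole query.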
[jira] [Commented] (SOLR-8955) ReplicationHandler should throttle across all requests instead of for each client
[ https://issues.apache.org/jira/browse/SOLR-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233113#comment-15233113 ]

Greg Pendlebury commented on SOLR-8955:
---------------------------------------

I like the idea, but maybe it should be configurable? If the master has multiple NICs, then hard coding an arbitrary limit because two unrelated slaves from different network interfaces are both online would actually be more of a hindrance than an improvement.

> ReplicationHandler should throttle across all requests instead of for each
> client
> ---------------------------------------
>
>                 Key: SOLR-8955
>                 URL: https://issues.apache.org/jira/browse/SOLR-8955
>             Project: Solr
>          Issue Type: Improvement
>          Components: replication (java), SolrCloud
>            Reporter: Shalin Shekhar Mangar
>              Labels: difficulty-easy, impact-medium, newdev
>             Fix For: master, 6.1
>
>
> SOLR-6485 added the ability to throttle the speed of replication but the
> implementation rate limits each request. So e.g. if maxWriteMBPerSec is 1
> and 5 slaves request full replication, then the effective transfer rate from
> the master is 5 MB/second, which is not what is often desired.
> I propose to make the rate limit global (across all replication requests)
> instead.
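The arithmetic behind the issue, and the configurable switch suggested in the comment, might look something like this. It is only a sketch: `aggregate_transfer_rate` and its `global_limit` flag are hypothetical names for illustration, and only `maxWriteMBPerSec` comes from the issue text.

```python
def aggregate_transfer_rate(max_write_mb_per_sec: float,
                            active_requests: int,
                            global_limit: bool) -> float:
    """Total MB/second leaving the master under the two throttling schemes.

    global_limit=False models the per-request limit described in the issue;
    global_limit=True models the proposed limit shared by all requests.
    """
    if active_requests <= 0:
        return 0.0
    if global_limit:
        # One budget shared across every replication request.
        return float(max_write_mb_per_sec)
    # Each request gets its own budget, so the totals add up.
    return float(max_write_mb_per_sec) * active_requests


print(aggregate_transfer_rate(1, 5, global_limit=False))  # 5.0
print(aggregate_transfer_rate(1, 5, global_limit=True))   # 1.0
```

Keeping the choice as a flag reflects the multi-NIC concern: a master serving slaves over separate interfaces may prefer the per-request behaviour.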
[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223581#comment-15223581 ]

Greg Pendlebury commented on SOLR-8812:
---------------------------------------

[~erickerickson], personally, I am ambivalent with regards to timing and versions. I am still not convinced there is actually an issue here, but I don't want to be a dick and dismiss it out-of-hand.

The patches provided are simply about choosing default parameter values that disrupt the least number of users who did not have mm set to an appropriate value. Any user (risky, broad generalisation incoming) who puts a boolean OR operator into an edismax query string would not want mm=100%, but that is what is happening here.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> ---------------------------------------
>
>                 Key: SOLR-8812
>                 URL: https://issues.apache.org/jira/browse/SOLR-8812
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 5.5
>            Reporter: Ryan Steinberg
>            Assignee: Erick Erickson
>            Priority: Blocker
>             Fix For: 6.0, 5.5.1
>
>         Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior
> is new to Solr 5.5.0 and an unexpected major change.
[jira] [Updated] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Pendlebury updated SOLR-8812:
----------------------------------
    Attachment: SOLR-8812-barbie.patch

Adding a 'hair ties -barbie' example to unit tests. Not sure it demonstrates anything new, but it does work as I would expect.

I can't get git to generate a combined patch the way I would have in svn... my git-fu is weak.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> ---------------------------------------
>
>                 Key: SOLR-8812
>                 URL: https://issues.apache.org/jira/browse/SOLR-8812
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 5.5
>            Reporter: Ryan Steinberg
>            Assignee: Erick Erickson
>            Priority: Blocker
>             Fix For: 6.0, 5.5.1
>
>         Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior
> is new to Solr 5.5.0 and an unexpected major change.
[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221079#comment-15221079 ]

Greg Pendlebury commented on SOLR-8812:
---------------------------------------

I also confirmed (for my own sanity) that q.op does indeed influence the default value of mm, as per [~janhoy]. Personally I don't like that, and perhaps it isn't relevant anymore since SOLR-2649... but I left it alone.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> ---------------------------------------
>
>                 Key: SOLR-8812
>                 URL: https://issues.apache.org/jira/browse/SOLR-8812
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 5.5
>            Reporter: Ryan Steinberg
>            Assignee: Erick Erickson
>            Priority: Blocker
>             Fix For: 6.0, 5.5.1
>
>         Attachments: SOLR-8812.patch, SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior
> is new to Solr 5.5.0 and an unexpected major change.
[jira] [Updated] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Pendlebury updated SOLR-8812:
----------------------------------
    Attachment: SOLR-8812.patch

Attaching a possible 'fix' that defaults mm to 0% if the user has declared no explicit mm, but has boolean operators in their query.

First time I have generated a patch using git, so hopefully it is ok.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> ---------------------------------------
>
>                 Key: SOLR-8812
>                 URL: https://issues.apache.org/jira/browse/SOLR-8812
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 5.5
>            Reporter: Ryan Steinberg
>            Assignee: Erick Erickson
>            Priority: Blocker
>             Fix For: 6.0, 5.5.1
>
>         Attachments: SOLR-8812.patch, SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior
> is new to Solr 5.5.0 and an unexpected major change.
[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220925#comment-15220925 ]

Greg Pendlebury commented on SOLR-8812:
---------------------------------------

Ok, I will try to find some time over the next week or so. I freely confess it doesn't look great on a Friday afternoon and school holidays begin here after next week. It might be a rough contribution someone else can carry over the line.

With regards to mixed cases of q.op and mm where users are explicitly setting them, I think they are already covered if you look in the unit test testDefaultOperatorWithMm(). The problem here seems to be the use case where people do not explicitly set mm and fall back to the default. This is treading on some expected behaviour from existing users.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> ---------------------------------------
>
>                 Key: SOLR-8812
>                 URL: https://issues.apache.org/jira/browse/SOLR-8812
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 5.5
>            Reporter: Ryan Steinberg
>            Assignee: Erick Erickson
>            Priority: Blocker
>             Fix For: 6.0, 5.5.1
>
>         Attachments: SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior
> is new to Solr 5.5.0 and an unexpected major change.
[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220854#comment-15220854 ]

Greg Pendlebury commented on SOLR-8812:
---------------------------------------

I don't know that what we are talking about here is a 'workaround' at all. Solr is doing exactly what it is being asked to do. I know it is disrupting an existing user base, so it warrants discussion and maybe even a 'fix'... but the existing user base were leaving a non-configured parameter at its default value (which probably didn't match their use case) and it only worked because the parameter was being ignored by edismax. The fact that parameter was ignored introduced the real bugs in SOLR-2649.

I think there has always been confusion over how this works under the hood, and that still continues. q.op and mm apply to two different parts of the query, and each of them has other factors that come into play.
* q.op is a boolean operator, which happens pre-parse (or in the very earliest stages of parsing)
* mm applies to (top level) clauses which have the SHOULD occur flag *after* Solr translates all the boolean operators
* if mm is not explicitly set, the default value is determined by q.op (? I haven't verified this, but that is Jan's input above). The old doco says it is always 100% default... but I personally have always set it explicitly... no experience.
* Solr translates boolean operators into occurs flags differently depending on the value of q.op. In particular q.op=AND causes non-intuitive generation of occurs flags if looked at from a purely boolean perspective.
* mm does not make much sense at all if you think about search as a purely boolean query (ie. the result either matches or doesn't) instead of occurs flags (ie. the score of the result is either higher or lower)

So now that SOLR-2649 has come along, it slightly muddies the water because:
* q.op is no longer hard coded to OR. Pre-patch the user could say q.op=AND, but it didn't do anything to the query
* The presence of an operator no longer turns off the mm feature

*My take on the issue is that users who want to use boolean operators in edismax should pay attention to the mm parameter, and make sure their choice matches their use case*. Previously they didn't have to... but the presence of the boolean operators when using edismax was buggy (? debatable... it has been argued that it simply wasn't the use case edismax was first written for).

Having said that, IF anything was to change, I would simply play subtly with choosing the default value of mm. Maybe something like this:

IF (the query contains a boolean operator) AND (mm has not been explicitly set) THEN (mm = 0%)

It is a tweak on the work Jan did in SOLR-2649, so that instead of turning off mm in response to a boolean operator being present, we instead influence the default value. We still let users ultimately set up their parameters however they want though. If the user has a use case that includes both boolean parameters and mm logic... have fun.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> ---------------------------------------
>
>                 Key: SOLR-8812
>                 URL: https://issues.apache.org/jira/browse/SOLR-8812
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 5.5
>            Reporter: Ryan Steinberg
>            Assignee: Erick Erickson
>            Priority: Blocker
>             Fix For: 6.0, 5.5.1
>
>         Attachments: SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior
> is new to Solr 5.5.0 and an unexpected major change.
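The IF/AND/THEN rule proposed in the comment above could be sketched along these lines. This is not the committed patch: the function names are hypothetical, operator detection is a naive whitespace split, treating AND as a boolean operator is an assumption, and the `fallback_default` of 100% simply follows the "old doco" remark in the comment (the thread notes the real default may depend on q.op).

```python
from typing import Optional


def has_boolean_operator(query: str) -> bool:
    """Naive check for explicit operators (an assumption, not Solr's parser)."""
    for token in query.split():
        if token in ("AND", "OR", "NOT"):
            return True
        if token.startswith("+") or token.startswith("-"):
            return True
    return False


def effective_mm(query: str,
                 explicit_mm: Optional[str],
                 fallback_default: str = "100%") -> str:
    """Pick the mm value under the proposed rule."""
    if explicit_mm is not None:
        # The user always wins when they set mm themselves.
        return explicit_mm
    if has_boolean_operator(query):
        # Proposed tweak: operators present and no explicit mm, so relax to 0%.
        return "0%"
    return fallback_default


print(effective_mm("id:12345 OR zz", None))    # 0%
print(effective_mm("id:12345 OR zz", "100%"))  # 100%
print(effective_mm("hair ties", None))         # 100%
```

The point of the design is that an operator no longer switches mm off; it only changes which default is chosen when the user has expressed no preference.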
[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219133#comment-15219133 ]

Greg Pendlebury commented on SOLR-8812:
---------------------------------------

Thanks. Hopefully that is ok. I just installed git and started cloning trunk... now to upgrade to Java 8.

I think it is all working as intended, it is just that there is a confusing legacy of not having to worry about what mm was set to for some use cases. SOLR-2649 will force people to check what the parameters are, but all queries are now supported. It would be nice if it was less disruptive, but given that pre-patch there was no way to get edismax to do certain queries, no matter what parameters you set, I think it is still an improvement.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> ---------------------------------------
>
>                 Key: SOLR-8812
>                 URL: https://issues.apache.org/jira/browse/SOLR-8812
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 5.5
>            Reporter: Ryan Steinberg
>            Assignee: Erick Erickson
>            Priority: Blocker
>             Fix For: 6.0, 5.5.1
>
>         Attachments: SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior
> is new to Solr 5.5.0 and an unexpected major change.
[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219086#comment-15219086 ]

Greg Pendlebury commented on SOLR-8812:
---------------------------------------

I am happy to take a look at any issues, since I was involved in SOLR-2649. I need to get a new copy of the code first, but in the interim, can someone confirm that explicitly setting mm to 0 does not fix this?

I believe mm defaults to 100%, so that may be the real culprit, as opposed to q.op=AND. Before SOLR-2649 was resolved, setting an OR operator would have caused mm to be ignored. Now it will use the default value unless you set it explicitly.

Our production servers are using 5.1 with SOLR-2649 applied, and we have q.op=AND, with perfectly functional OR operators and mm=0%. All of the obvious queries work, including the cases referenced above.

From memory there are a lot of subtle cliffs to fall off here, such as making sure we are talking about top level clauses and ultimately remembering that Solr does not use boolean logic... and there are some edge cases where it simply doesn't work the same way as the occurs flags. SHOULD vs OR is the main culprit.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
> Issue Type: Bug
> Components: query parsers
> Affects Versions: 5.5
> Reporter: Ryan Steinberg
> Assignee: Erick Erickson
> Priority: Blocker
> Fix For: 6.0, 5.5.1
> Attachments: SOLR-8812.patch
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior is new to Solr 5.5.0 and an unexpected major change.
> Example: > "q": "id:12345 OR zz", > "defType": "edismax", > "q.op": "AND", > where "12345" is a known document ID and "zz" is a string NOT present > in my data > Version 5.5.0 produces zero results: > "rawquerystring": "id:12345 OR zz", > "querystring": "id:12345 OR zz", > "parsedquery": "(+((id:12345 > DisjunctionMaxQuery((text:zz)))~2))/no_coord", > "parsedquery_toString": "+((id:12345 (text:zz))~2)", > "explain": {}, > "QParser": "ExtendedDismaxQParser" > Version 5.4.0 produces one result as expected > "rawquerystring": "id:12345 OR zz", > "querystring": "id:12345 OR zz", > "parsedquery": "(+(id:12345 > DisjunctionMaxQuery((text:zz/no_coord", > "parsedquery_toString": "+(id:12345 (text:zz))" > "explain": {}, > "QParser": "ExtendedDismaxQParser" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
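For anyone wanting to try the suggestion in the comment above (setting mm explicitly rather than relying on its default), the request parameters could be assembled roughly as follows. This is only a sketch: the class and method names are made up for illustration, the parameter values are the ones from the report plus an explicit mm=0%, and it only builds the query string; it does not contact a Solr server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: EdismaxParamsSketch is a hypothetical name, not part of Solr or SolrJ.
final class EdismaxParamsSketch {

    // Parameters from the report, plus an explicit mm so the default is not used.
    static Map<String, String> params() {
        Map<String, String> p = new LinkedHashMap<>(); // keeps insertion order
        p.put("q", "id:12345 OR zz");
        p.put("defType", "edismax");
        p.put("q.op", "AND");
        p.put("mm", "0%");
        return p;
    }

    // URL-encodes each key and value and joins the pairs with '&'.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String enc(String s) {
        // URLEncoder.encode(String, Charset) requires Java 10 or later.
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(params()));
    }
}
```

The resulting string can be appended to a select URL for the core under test. Adding debugQuery=true as well should show the parsed query, which is how the outputs quoted in this thread were obtained.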
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055202#comment-15055202 ] Greg Pendlebury commented on SOLR-2649: --- [~erickerickson] thanks for this! > MM ignored in edismax queries with operators > > > Key: SOLR-2649 > URL: https://issues.apache.org/jira/browse/SOLR-2649 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Magnus Bergmark >Assignee: Erick Erickson > Fix For: 4.9, Trunk > > Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, > SOLR-2649.diff, SOLR-2649.patch, SOLR-2649.patch > > > Hypothetical scenario: > 1. User searches for "stocks oil gold" with MM set to "50%" > 2. User adds "-stockings" to the query: "stocks oil gold -stockings" > 3. User gets no hits since MM was ignored and all terms where AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). > boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as an replacement to dismax; mm is one of the > primary features of dismax. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
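The effect of the check quoted in the issue description can be illustrated with a small stand-alone sketch. Only the final boolean expression is taken from the quoted Solr 3.3 source. The whitespace tokenizer and the class and method names are assumptions made for this example and are far simpler than the real parser.

```java
// Illustrative only: a toy operator counter, not the edismax clause parser.
final class MmOperatorCheckSketch {

    // Returns true when mm processing would stay enabled under the quoted rule.
    static boolean doMinMatched(String query) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("OR")) {
                numOR++;
            } else if (tok.equals("NOT")) {
                numNOT++;
            } else if (tok.startsWith("+")) {
                numPluses++;
            } else if (tok.startsWith("-")) {
                numMinuses++;
            }
            // AND is deliberately not counted, matching the quoted comment.
        }
        // Expression as quoted from ExtendedDismaxQParserPlugin.java (Solr 3.3).
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        System.out.println(doMinMatched("stocks oil gold"));
        System.out.println(doMinMatched("stocks oil gold -stockings"));
    }
}
```

Under this rule the first query keeps mm processing and the second does not, which corresponds to the scenario described in the report: adding "-stockings" switches mm off for the whole query.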
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039730#comment-15039730 ]

Greg Pendlebury commented on SOLR-2649:
---------------------------------------

I just ran it against our test system (patched Solr 5.1.0):

(A OR B OR C) "D E"

1) Using mm=100%, q.op=AND and searching just the fulltext field. RAW debug:
{code}
(+(+(DisjunctionMaxQuery((fulltext:a)) DisjunctionMaxQuery((fulltext:b)) DisjunctionMaxQuery((fulltext:c))) +DisjunctionMaxQuery((fulltext:\"d e\"
{code}
I read that as:
{code}
+(a b c) +("d e")
{code}
which looks correct.

2) Switching to q.op=OR. RAW debug:
{code}
(+(((DisjunctionMaxQuery((fulltext:a)) DisjunctionMaxQuery((fulltext:b)) DisjunctionMaxQuery((fulltext:c))) DisjunctionMaxQuery((fulltext:\"d e\")))~2))
{code}
I read that as:
{code}
((a b c) "d e")~2
{code}
Which again looks correct... but we don't generally use OR, so I could be wrong.

3) Finally, lowered mm to 50%, again with q.op=OR. RAW debug:
{code}
(+(((DisjunctionMaxQuery((fulltext:a)) DisjunctionMaxQuery((fulltext:b)) DisjunctionMaxQuery((fulltext:c))) DisjunctionMaxQuery((fulltext:\"d e\")))~1))
{code}
I read that as:
{code}
((a b c) "d e")~1
{code}
Still looks good.

> MM ignored in edismax queries with operators
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
> Issue Type: Improvement
> Components: query parsers
> Reporter: Magnus Bergmark
> Assignee: Erick Erickson
> Fix For: 4.9, Trunk
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch
>
> Hypothetical scenario:
> 1. User searches for "stocks oil gold" with MM set to "50%"
> 2. User adds "-stockings" to the query: "stocks oil gold -stockings"
> 3.
User gets no hits since MM was ignored and all terms where AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). > boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as an replacement to dismax; mm is one of the > primary features of dismax. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
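The ~2 and ~1 suffixes in the debug output above are consistent with the percentage being applied to the two top-level optional clauses, (a b c) and "d e", and rounded down. A minimal sketch of that arithmetic follows, assuming only the plain positive-percentage form of mm. The class and method names are invented for illustration, and the real Solr calculation supports more forms (absolute counts, negative values, conditional expressions) that are not modelled here.

```java
// Illustrative only: covers the simple "N%" form of mm with N between 0 and 100.
final class MinShouldMatchSketch {

    // Number of optional clauses that must match, rounding down.
    static int minShouldMatch(int optionalClauses, int percent) {
        if (optionalClauses < 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("expected clauses >= 0 and 0 <= percent <= 100");
        }
        return (optionalClauses * percent) / 100; // integer division rounds down
    }

    public static void main(String[] args) {
        System.out.println(minShouldMatch(2, 100)); // case 2 above: mm=100%, q.op=OR
        System.out.println(minShouldMatch(2, 50));  // case 3 above: mm=50%, q.op=OR
    }
}
```

Case 1 above (q.op=AND) shows no ~N suffix because both top-level clauses are already marked as required, so there are no optional clauses left for mm to act on.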
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038626#comment-15038626 ] Greg Pendlebury commented on SOLR-2649: --- I tried Jan's patch, and (whilst it is technically correct) it did not improve the usefulness of edismax without also addressing how q.op is handled. We continued to see absurd search results that failed UAT. The combined patch with both has been on our prod servers since May 2014 without any problems, but I have not heard any feedback from others that might have tried it. The corpus is nearly 200 million fulltext newspapers: http://trove.nla.gov.au/newspaper/result?q= > MM ignored in edismax queries with operators > > > Key: SOLR-2649 > URL: https://issues.apache.org/jira/browse/SOLR-2649 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Magnus Bergmark >Assignee: Erick Erickson > Fix For: 4.9, Trunk > > Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, > SOLR-2649.diff, SOLR-2649.patch > > > Hypothetical scenario: > 1. User searches for "stocks oil gold" with MM set to "50%" > 2. User adds "-stockings" to the query: "stocks oil gold -stockings" > 3. User gets no hits since MM was ignored and all terms where AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). > boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as an replacement to dismax; mm is one of the > primary features of dismax. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038688#comment-15038688 ]

Greg Pendlebury commented on SOLR-2649:
---------------------------------------

Mine shows as 18th Feb, but I assume that is just timezones. Assuming we are talking about the same patch, then, no, that is my patch (both of the 'with-Qop' patches are from me). Jan submitted the earlier 2014 patch, which I used as a baseline to add the q.op change as well.

> MM ignored in edismax queries with operators
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
> Issue Type: Improvement
> Components: query parsers
> Reporter: Magnus Bergmark
> Assignee: Erick Erickson
> Fix For: 4.9, Trunk
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch
>
> Hypothetical scenario:
> 1. User searches for "stocks oil gold" with MM set to "50%"
> 2. User adds "-stockings" to the query: "stocks oil gold -stockings"
> 3. User gets no hits since MM was ignored and all terms were AND-ed together
>
> The behavior seems to be intentional, although the reason why is never explained:
> // For correct lucene queries, turn off mm processing if there
> // were explicit operators (except for AND).
> boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
> (lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
>
> This makes edismax unsuitable as a replacement for dismax; mm is one of the primary features of dismax.
[jira] [Comment Edited] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038626#comment-15038626 ] Greg Pendlebury edited comment on SOLR-2649 at 12/3/15 9:31 PM: I tried Jan's patch, and (whilst it is technically correct) it did not improve the usefulness of edismax without also addressing how q.op is handled. We continued to see absurd search results that failed UAT. The combined patch with both has been on our prod servers since May 2014 without any problems, but I have not heard any feedback from others that might have tried it. The corpus is nearly 200 million fulltext newspaper articles: http://trove.nla.gov.au/newspaper/result?q= was (Author: gpendleb): I tried Jan's patch, and (whilst it is technically correct) it did not improve the usefulness of edismax without also addressing how q.op is handled. We continued to see absurd search results that failed UAT. The combined patch with both has been on our prod servers since May 2014 without any problems, but I have not heard any feedback from others that might have tried it. The corpus is nearly 200 million fulltext newspapers: http://trove.nla.gov.au/newspaper/result?q= > MM ignored in edismax queries with operators > > > Key: SOLR-2649 > URL: https://issues.apache.org/jira/browse/SOLR-2649 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Magnus Bergmark >Assignee: Erick Erickson > Fix For: 4.9, Trunk > > Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, > SOLR-2649.diff, SOLR-2649.patch > > > Hypothetical scenario: > 1. User searches for "stocks oil gold" with MM set to "50%" > 2. User adds "-stockings" to the query: "stocks oil gold -stockings" > 3. 
User gets no hits since MM was ignored and all terms where AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). > boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as an replacement to dismax; mm is one of the > primary features of dismax. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979672#comment-14979672 ] Greg Pendlebury commented on SOLR-3274: --- FWIW we ran into this issue today as well, and nothing worked until ZK was restarted. I would love to think that Solr could detect this issue, but it smells like a ZK bug to me. > ZooKeeper related SolrCloud problems > > > Key: SOLR-3274 > URL: https://issues.apache.org/jira/browse/SOLR-3274 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0-ALPHA > Environment: Any >Reporter: Per Steffensen > > Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 > Solr servers, running 28 slices of the same collection (collA) - all slices > have one replica (two shards all in all - leader + replica) - 56 cores all in > all (8 shards on each solr instance). But anyways... > Besides the problem reported in SOLR-3273, the system seems to run fine under > high load for several hours, but eventually errors like the ones shown below > start to occur. I might be wrong, but they all seem to indicate some kind of > unstability in the collaboration between Solr and ZooKeeper. I have to say > that I havnt been there to check ZooKeeper "at the moment where those > exception occur", but basically I dont believe the exceptions occur because > ZooKeeper is not running stable - at least when I go and check ZooKeeper > through other "channels" (e.g. my eclipse ZK plugin) it is always accepting > my connection and generally seems to be doing fine. > Exception 1) Often the first error we see in solr.log is something like this > {code} > Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log > SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - > Updates are disabled. 
> at > org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250) > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140) > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > {code} > I believe this error basically occurs because SolrZkClient.isConnected > reports false, which means that its internal "keeper.getState" does not > return ZooKeeper.States.CONNECTED. Im pretty sure that it has been CONNECTED > for a long time, since this error starts occuring after several hours of > processing without this problem showing. But why is it suddenly not connected > anymore?! > Exception 2) We also see errors like the following, and if Im not mistaken, > they start occuring shortly after "Exception 1)" (above) shows for the fist > time > {code} > Mar
[jira] [Commented] (SOLR-8016) CloudSolrClient has extremely verbose error logging
[ https://issues.apache.org/jira/browse/SOLR-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735752#comment-14735752 ] Greg Pendlebury commented on SOLR-8016: --- Lowering the level to INFO would be good in our case, although when you say that after all the retries it will eventually error would just delay the event... unless the error is thrown instead of logged. The Solr nodes were in a bad way and needed intervention from sysadmins because of locked index segments from a graceless shutdown. Under this scenario, the UI clients were logging enormous amounts of useless content ('rootCause.toString()') and making finding other lines in the log very difficult. Because the client also throws Exceptions we had already gracefully handled the outage by degrading functionality. With regards to Markers I have never used them personally, but before I suggested them I looked at the fact that both log4j and logback support them via slf4j. This covers both the solr default (log4j) and the binding we use in production (logback) so I am selfishly happy with the possibility... and I think it is the simplest change. I didn't want to propose a rethink of the logging, or that method's flow, but I am happy if this prompts that as well. > CloudSolrClient has extremely verbose error logging > --- > > Key: SOLR-8016 > URL: https://issues.apache.org/jira/browse/SOLR-8016 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 5.2.1, Trunk >Reporter: Greg Pendlebury >Priority: Minor > Labels: easyfix > > CloudSolrClient has this error logging line which is fairly annoying: > {code} > log.error("Request to collection {} failed due to ("+errorCode+ > ") {}, retry? "+retryCount, collection, rootCause.toString()); > {code} > Given that this is a client library and then gets embedded into other > applications this line is very problematic to handle gracefully. 
In today's > example I was looking at, every failed search was logging over 100 lines, > including the full HTML response from the responding node in the cluster. > The resulting SolrServerException that comes out to our application is > handled appropriately but we can't stop this class complaining in logs > without suppressing the entire ERROR channel, which we don't want to do. This > is the only direct line writing to the log I could find in the client, so we > _could_ suppress errors, but that just feels dirty, and fragile for the > future. > From looking at the code I am fairly certain it is not as simple as throwing > an exception instead of logging... it is right in the middle of the method. I > suspect the simplest answer is adding a marker > (http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call. > Then solrj users can choose what to do with these log entries. I don't know > if there is a broader strategy for handling this that I am ignorant of; > apologies if that is the case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8016) CloudSolrClient has extremely verbose error logging
[ https://issues.apache.org/jira/browse/SOLR-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735813#comment-14735813 ] Greg Pendlebury commented on SOLR-8016: --- I haven't looked at the innards of the method enough to say for sure. I know in our particular use case it is fruitless to keep trying. The nodes are online, but cannot answer in the way expected: {code} ERROR o.a.s.c.s.i.CloudSolrClient - Request to collection trove failed due to (500) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at /solr/trove: Expected mime type application/octet-stream but got text/html. Error 500 {msg=SolrCore 'trove' is not available due to init failure: Index locked for write for core trove,trace=org.apache.solr.common.SolrException: SolrCore 'trove' is not available due to init failure: Index locked for write for core trove {code} And then lots and lots more html output. The Exception that bubbles up to our code is more than enough for us know where to start looking: {code} ERROR a.g.n.n.c.r.SolrService - Solr search failed: No live SolrServers available to handle this request:[] {code} > CloudSolrClient has extremely verbose error logging > --- > > Key: SOLR-8016 > URL: https://issues.apache.org/jira/browse/SOLR-8016 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 5.2.1, Trunk >Reporter: Greg Pendlebury >Priority: Minor > Labels: easyfix > > CloudSolrClient has this error logging line which is fairly annoying: > {code} > log.error("Request to collection {} failed due to ("+errorCode+ > ") {}, retry? "+retryCount, collection, rootCause.toString()); > {code} > Given that this is a client library and then gets embedded into other > applications this line is very problematic to handle gracefully. In today's > example I was looking at, every failed search was logging over 100 lines, > including the full HTML response from the responding node in the cluster. 
> The resulting SolrServerException that comes out to our application is > handled appropriately but we can't stop this class complaining in logs > without suppressing the entire ERROR channel, which we don't want to do. This > is the only direct line writing to the log I could find in the client, so we > _could_ suppress errors, but that just feels dirty, and fragile for the > future. > From looking at the code I am fairly certain it is not as simple as throwing > an exception instead of logging... it is right in the middle of the method. I > suspect the simplest answer is adding a marker > (http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call. > Then solrj users can choose what to do with these log entries. I don't know > if there is a broader strategy for handling this that I am ignorant of; > apologies if that is the case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8016) CloudSolrClient has extremely verbose error logging
Greg Pendlebury created SOLR-8016:
-------------------------------------

Summary: CloudSolrClient has extremely verbose error logging
Key: SOLR-8016
URL: https://issues.apache.org/jira/browse/SOLR-8016
Project: Solr
Issue Type: Improvement
Components: clients - java
Affects Versions: 5.2.1, Trunk
Reporter: Greg Pendlebury
Priority: Minor

CloudSolrClient has this error logging line which is fairly annoying:
{code}
log.error("Request to collection {} failed due to ("+errorCode+
    ") {}, retry? "+retryCount, collection, rootCause.toString());
{code}

Given that this is a client library and then gets embedded into other applications this line is very problematic to handle gracefully. In today's example I was looking at, every failed search was logging over 100 lines, including the full HTML response from the responding node in the cluster.

The resulting SolrServerException that comes out to our application is handled appropriately but we can't stop this class complaining in logs without suppressing the entire ERROR channel, which we don't want to do. This is the only direct line writing to the log I could find in the client, so we _could_ suppress errors, but that just feels dirty, and fragile for the future.

From looking at the code I am fairly certain it is not as simple as throwing an exception instead of logging... it is right in the middle of the method. I suspect the simplest answer is adding a marker (http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call.

Then solrj users can choose what to do with these log entries. I don't know if there is a broader strategy for handling this that I am ignorant of; apologies if that is the case.
[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Pendlebury updated SOLR-2649: -- Attachment: SOLR-2649-with-Qop.patch Replacement patch for 'SOLR-2649-with-Qop.patch' against current trunk. MM ignored in edismax queries with operators Key: SOLR-2649 URL: https://issues.apache.org/jira/browse/SOLR-2649 Project: Solr Issue Type: Bug Components: query parsers Reporter: Magnus Bergmark Assignee: Erick Erickson Priority: Minor Fix For: 4.9, Trunk Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch Hypothetical scenario: 1. User searches for stocks oil gold with MM set to 50% 2. User adds -stockings to the query: stocks oil gold -stockings 3. User gets no hits since MM was ignored and all terms where AND-ed together The behavior seems to be intentional, although the reason why is never explained: // For correct lucene queries, turn off mm processing if there // were explicit operators (except for AND). boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; (lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) This makes edismax unsuitable as an replacement to dismax; mm is one of the primary features of dismax. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322219#comment-14322219 ] Greg Pendlebury commented on SOLR-2649: --- Thanks Erick, I can recreate the SOLR-2649-with-Qop.patch this week (today looks pretty busy sorry). Just updating trunk now. Jan's SOLR-2649 patch is technically correct from everything I have looked at, but it actually makes the eDismax parser very confusing for novice end users. Our investigation seemed to indicate that the problems stem from the steps taken by Lucene/Solr to convert boolean OR operators to the SHOULD occur flags (but running off memory here). This is made very obvious by the fact that eDismax is hard coded to use OR as the default operator. We were simply tea leaf gazing, but our assumption is that this confusion may have been the original cause for disabling 'mm' when operators were present. So the patch we submitted simply does the same as Jan's, but also makes eDismax read the default operator from the 'q.op' parameter. With access to both parameters we have always been able to respond meaningfully to the queries our users are submitting. MM ignored in edismax queries with operators Key: SOLR-2649 URL: https://issues.apache.org/jira/browse/SOLR-2649 Project: Solr Issue Type: Bug Components: query parsers Reporter: Magnus Bergmark Assignee: Erick Erickson Priority: Minor Fix For: 4.9, Trunk Attachments: SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch Hypothetical scenario: 1. User searches for stocks oil gold with MM set to 50% 2. User adds -stockings to the query: stocks oil gold -stockings 3. User gets no hits since MM was ignored and all terms where AND-ed together The behavior seems to be intentional, although the reason why is never explained: // For correct lucene queries, turn off mm processing if there // were explicit operators (except for AND). 
boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; (lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) This makes edismax unsuitable as an replacement to dismax; mm is one of the primary features of dismax. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903781#comment-13903781 ] Greg Pendlebury edited comment on SOLR-5722 at 2/13/15 3:23 AM: The link to the doco is working for me today so I took a quick look. I think the other reason that the HyphenatedWordsFilter is not suitable is that it removes the hyphen from the material assuming that it can only have one meaning. The specific circumstances I am considering is when the hyphen is part of a legitimately hyphenated word that just happen to break across a line wrap. eg. 'up-\{\n\}to-date' The HyphenatedWordsFilter would turn this into 'upto-date', and cause user searches of 'up to date' to not match, since no filters later in the chain can really pull 'upto' apart again. Whereas the 'catenateShingles' option is intended to preserve the word delimiter and provide all the permutations a user might type to find that term: up to date, upto date, up todate, uptodate was (Author: gpendleb): The link to the doco is working for me today so I took a quick look. I think the other reason that the HyphenatedWordsFilter is not suitable is that it removes the hyphen from the material assuming that it can only have one meaning. The specific circumstances I am considering is when the hyphen is part of a legitimately hyphenated word that just happen to break across a line wrap. eg. 'up-\{\n\}to-date' The HyphenatedWordsFilter would turn this into 'upto-date', and cause user searches of 'up to date' to not match, since no filters later in the change can really pull 'upto' apart again. 
Whereas the 'catenateShingles' option is intended to preserve the word delimiter and provide all the permutations a user might type to find that term: up to date, upto date, up todate, uptodate

Add catenateShingles option to WordDelimiterFilter
--------------------------------------------------

Key: SOLR-5722
URL: https://issues.apache.org/jira/browse/SOLR-5722
Project: Solr
Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor
Labels: filter, newbie, patch
Attachments: WDFconcatShingles.patch

Apologies if I put this in the wrong spot. I'm attaching a patch (against current trunk) that adds support for a 'catenateShingles' option to the WordDelimiterFilter. We (National Library of Australia - NLA) are currently maintaining this as an internal modification to the Filter, but I believe it is generic enough to contribute upstream.

Description:
{code}
/**
 * NLA Modification to the standard word delimiter to support various
 * hyphenation use cases. Primarily driven by requirements for
 * newspapers where words are often broken across line endings.
 *
 * eg. hyphenated-surname is printed across a line ending and
 * turns out like hyphen-ated-surname or hyphenated-sur-name.
 *
 * In this scenario the stock filter, with 'catenateAll' turned on, will
 * generate individual tokens plus one combined token, but not
 * sub-tokens like hyphenated surname and hyphenatedsur name.
 *
 * So we add a new 'catenateShingles' to achieve this.
 */
{code}

Includes unit tests, and as is noted in one of them CATENATE_WORDS and CATENATE_SHINGLES are logically considered mutually exclusive for sensible usage and can cause duplicate tokens (although they should have the same positions etc).

I'm happy to work on it more if anyone finds problems with it.
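The permutations listed for 'up-to-date' can be enumerated with a short stand-alone sketch: every gap between adjacent word parts is either kept as a break or closed up. This is not the code from WDFconcatShingles.patch, and the class and method names are invented for illustration. The real filter emits tokens with positions in an analysis chain, whereas this sketch only produces the strings a user might type.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: enumerates the ways adjacent word parts can be joined.
final class CatenateShinglesSketch {

    // For n parts there are 2^(n-1) variants: each gap is either a space or nothing.
    static List<String> variants(List<String> parts) {
        List<String> out = new ArrayList<>();
        if (parts.isEmpty()) {
            return out;
        }
        int gaps = parts.size() - 1;
        for (int mask = 0; mask < (1 << gaps); mask++) {
            StringBuilder sb = new StringBuilder(parts.get(0));
            for (int i = 0; i < gaps; i++) {
                // A set bit means "close this gap"; a clear bit keeps the break.
                if ((mask & (1 << i)) == 0) {
                    sb.append(' ');
                }
                sb.append(parts.get(i + 1));
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // List.of requires Java 9 or later.
        variants(List.of("up", "to", "date")).forEach(System.out::println);
    }
}
```

For the three parts up, to and date this yields the same four variants named in the comment above. The stock 'catenateAll' behaviour described in the patch's Javadoc corresponds to only the first and last of them.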
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986408#comment-13986408 ]

Greg Pendlebury commented on SOLR-2649:
---

I applied this patch to 4.7.2 yesterday and tried it out on our dev servers. At first I thought it was pretty bad and failed completely... but then I had a good think and re-read everything on this ticket and this[1] article and realised my understanding of the problem was flawed. Using just this patch in isolation, it converted all of the OR operators to AND operators with mm=100%. Very confusing behaviour for our business area, but I realise now that it is correct.

Perhaps the confusion stems from the way the q.op and mm parameters interact. If the behaviour was to instead separate them more clearly then we could change the config entirely. At the moment our mm is 100% because we effectively want q.op=AND, but if q.op was instead applied 1) always, 2) first and 3) independently from mm (ie. insert AND wherever an operator is missing) we could set mm=1 and achieve what we want by respecting the OR operators provided by the user.

I've added this on top of the patch already here and deployed again to our dev servers using 'q.op=AND mm=1' and now everything appears to function as it should. I'll upload the patch in a minute, and it includes several unit tests with different mm and q.op values. From my perspective I think the two parameters are interacting appropriately, but perhaps someone with more convoluted mm settings could give it a try?

The change is simply in the constructor of the ExtendedSolrQueryParser class, where it was hardcoded to force the default operator to OR (presumably so that mm would take care of things). I've made it look at the parameter provided with the query (copied the code from the Simple QParser and adjusted to fit). The unit test from the first patch that was marked TODO I have tweaked slightly. I think not finding a result in that case is entirely appropriate if the user can now tweak q.op. Opinions may vary of course.

[1] http://searchhub.org/2011/12/28/why-not-and-or-and-not/

MM ignored in edismax queries with operators

Key: SOLR-2649
URL: https://issues.apache.org/jira/browse/SOLR-2649
Project: Solr
Issue Type: Bug
Components: query parsers
Reporter: Magnus Bergmark
Priority: Minor
Fix For: 4.9, 5.0
Attachments: SOLR-2649.diff, SOLR-2649.patch

Hypothetical scenario:
1. User searches for stocks oil gold with MM set to 50%
2. User adds -stockings to the query: stocks oil gold -stockings
3. User gets no hits since MM was ignored and all terms were AND-ed together

The behavior seems to be intentional, although the reason why is never explained:

// For correct lucene queries, turn off mm processing if there
// were explicit operators (except for AND).
boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;

(lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)

This makes edismax unsuitable as a replacement for dismax; mm is one of the primary features of dismax.

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
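The interaction described in this comment (mm=100% making OR behave like AND, and mm=1 restoring it) can be sketched in a few lines. This is a simplified model and not Solr's code: the class and method names are hypothetical, only a positive integer or a positive percentage is accepted for mm, the percentage is assumed to round down, and each OR-ed term is treated as one optional (SHOULD) clause.

```java
import java.util.List;

/** Hypothetical illustration of mm applied to optional clauses; not Solr's implementation. */
public class MinShouldMatchSketch {

    /** Simplified mm: a positive integer ("1") or a positive percentage ("100%"). */
    public static int minShouldMatch(int optionalClauses, String mm) {
        if (mm.endsWith("%")) {
            int percent = Integer.parseInt(mm.substring(0, mm.length() - 1));
            return (percent * optionalClauses) / 100; // assumption: round down
        }
        return Math.min(Integer.parseInt(mm), optionalClauses);
    }

    /** A document matches when enough of the optional clauses hit. */
    public static boolean matches(List<Boolean> clauseHits, String mm) {
        int required = minShouldMatch(clauseHits.size(), mm);
        long hits = clauseHits.stream().filter(Boolean::booleanValue).count();
        return hits >= required;
    }

    public static void main(String[] args) {
        // Query "A OR C" against a document that contains A but not C.
        List<Boolean> hits = List.of(true, false);
        System.out.println(matches(hits, "100%")); // false: both clauses required, so OR acts like AND
        System.out.println(matches(hits, "1"));    // true: one clause is enough, OR is respected
    }
}
```

In this model the user's explicit OR only survives when mm is low enough, which is why the comment proposes expressing "AND by default" through q.op and leaving mm at 1.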
[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Pendlebury updated SOLR-2649:
--
Attachment: SOLR-2649-with-Qop.patch

MM ignored in edismax queries with operators

Key: SOLR-2649
URL: https://issues.apache.org/jira/browse/SOLR-2649
Project: Solr
Issue Type: Bug
Components: query parsers
Reporter: Magnus Bergmark
Priority: Minor
Fix For: 4.9, 5.0
Attachments: SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch

Hypothetical scenario:
1. User searches for stocks oil gold with MM set to 50%
2. User adds -stockings to the query: stocks oil gold -stockings
3. User gets no hits since MM was ignored and all terms were AND-ed together

The behavior seems to be intentional, although the reason why is never explained:

// For correct lucene queries, turn off mm processing if there
// were explicit operators (except for AND).
boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;

(lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)

This makes edismax unsuitable as a replacement for dismax; mm is one of the primary features of dismax.

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986408#comment-13986408 ]

Greg Pendlebury edited comment on SOLR-2649 at 5/1/14 6:54 AM:
---

I applied this patch to 4.7.2 yesterday and tried it out on our dev servers. At first I thought it was pretty bad and failed completely... but then I had a good think and re-read everything on this ticket and this[1] article and realised my understanding of the problem was flawed. Using just this patch in isolation, it converted all of the OR operators to AND operators with mm=100%. Very confusing behaviour for our business area, but I realise now that it is correct.

Perhaps the confusion stems from the way the q.op and mm parameters interact. If the behaviour was to instead separate them more clearly then we could change the config entirely. At the moment our mm is 100% because we effectively want q.op=AND, but if q.op was instead applied 1) always, 2) first and 3) independently from mm (ie. insert AND wherever an operator is missing) we could set mm=1 and achieve what we want by respecting the OR operators provided by the user.

I've added this on top of the patch already here and deployed again to our dev servers using 'q.op=AND mm=1' and now everything appears to function as it should. I'll upload the patch in a minute, and it includes several unit tests with different mm and q.op values. From my perspective I think the two parameters are interacting appropriately, but perhaps someone with more convoluted mm settings could give it a try?

The change is simply in the constructor of the ExtendedSolrQueryParser class, where it was hardcoded to force the default operator to OR (presumably so that mm would take care of things). I've made it look at the parameter provided with the query (copied the code from the Simple QParser and adjusted to fit). The unit test from the first patch that was marked TODO I have tweaked slightly. I think not finding a result in that case is entirely appropriate if the user can now tweak q.op. Opinions may vary of course.

[1] http://searchhub.org/2011/12/28/why-not-and-or-and-not/

was (Author: gpendleb):
I applied this patch to 4.7.2 Yesterday and tried it out on or dev servers. At first I thought it was pretty bad and failed completely... but then I had a good think and re-read everything on this ticket and this[1] article and realised my understanding of the problem was flawed. Using just this patch in isolation it converted all of the OR operators to AND operators with mm=100%. Very confusing behaviour for our business area, but I realise now that it is correct. Perhaps the confusion stems from the way the q.op and mm parameters interact. If the behaviour was to instead separate them more clearly then we could change the config entirely. At the moment our mm is 100% because we effectively want q.op=AND, but if q.op was instead applied 1) always, 2) first and 3) independently from mm (ie. insert AND wherever an operator is missing) we could set mm=1 and achieve what we want by respecting the OR parameters provided by the user. I've added this on top of the patch already here and deployed again to our dev servers using 'q.op=AND mm=1' and now everything appears to function as it should. I'll upload the patch in a minute, and it includes several unit tests with different mm and q.op values. From my perspective I think the two parameters are interacting appropriately, but perhaps someone with more convoluted mm settings could give it a try? The change is simply in the constructor of the ExtendedSolrQueryParser class where it was hardcoded to force the default operator to OR (presumably so that mm would take care of things) I've made it look at the parameter provided with the query (copied the code from the Simple QParser and adjusted to fit). The unit test from the first patch that was marked TODO I have tweaked slightly. I think not finding a result in that case is entirely appropriate if the user can now tweak q.op. Opinions may vary of course. [1] http://searchhub.org/2011/12/28/why-not-and-or-and-not/

MM ignored in edismax queries with operators

Key: SOLR-2649
URL: https://issues.apache.org/jira/browse/SOLR-2649
Project: Solr
Issue Type: Bug
Components: query parsers
Reporter: Magnus Bergmark
Priority: Minor
Fix For: 4.9, 5.0
Attachments: SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch

Hypothetical scenario:
1. User searches for stocks oil gold with MM set to 50%
2. User adds -stockings to the query: stocks oil gold -stockings
3. User gets no hits since MM was ignored and all terms were AND-ed together

The behavior seems to be intentional,
[jira] [Commented] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903781#comment-13903781 ]

Greg Pendlebury commented on SOLR-5722:
---

The link to the doco is working for me today so I took a quick look. I think the other reason that the HyphenatedWordsFilter is not suitable is that it removes the hyphen from the material, assuming that it can only have one meaning. The specific circumstance I am considering is when the hyphen is part of a legitimately hyphenated word that just happens to break across a line wrap, eg. 'up-{\n}to-date'. The HyphenatedWordsFilter would turn this into 'upto-date' and cause user searches for 'up to date' to not match, since no filters later in the chain can really pull 'upto' apart again. Whereas the 'catenateShingles' option is intended to preserve the word delimiter and provide all the permutations a user might type to find that term: up to date, upto date, up todate, uptodate

Add catenateShingles option to WordDelimiterFilter
--
Key: SOLR-5722
URL: https://issues.apache.org/jira/browse/SOLR-5722
Project: Solr
Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor
Labels: filter, newbie, patch
Attachments: WDFconcatShingles.patch

Apologies if I put this in the wrong spot. I'm attaching a patch (against current trunk) that adds support for a 'catenateShingles' option to the WordDelimiterFilter. We (National Library of Australia - NLA) are currently maintaining this as an internal modification to the Filter, but I believe it is generic enough to contribute upstream.

Description:
=
{code}
/**
 * NLA Modification to the standard word delimiter to support various
 * hyphenation use cases. Primarily driven by requirements for
 * newspapers where words are often broken across line endings.
 *
 * eg. hyphenated-surname is printed across a line ending and
 * turns out like hyphen-ated-surname or hyphenated-sur-name.
 *
 * In this scenario the stock filter, with 'catenateAll' turned on, will
 * generate individual tokens plus one combined token, but not
 * sub-tokens like hyphenated surname and hyphenatedsur name.
 *
 * So we add a new 'catenateShingles' to achieve this.
 */
{code}

Includes unit tests, and as is noted in one of them CATENATE_WORDS and CATENATE_SHINGLES are logically considered mutually exclusive for sensible usage and can cause duplicate tokens (although they should have the same positions etc). I'm happy to work on it more if anyone finds problems with it.

-- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903781#comment-13903781 ]

Greg Pendlebury edited comment on SOLR-5722 at 2/18/14 4:55 AM:

The link to the doco is working for me today so I took a quick look. I think the other reason that the HyphenatedWordsFilter is not suitable is that it removes the hyphen from the material, assuming that it can only have one meaning. The specific circumstance I am considering is when the hyphen is part of a legitimately hyphenated word that just happens to break across a line wrap, eg. 'up-\{\n\}to-date'. The HyphenatedWordsFilter would turn this into 'upto-date' and cause user searches for 'up to date' to not match, since no filters later in the chain can really pull 'upto' apart again. Whereas the 'catenateShingles' option is intended to preserve the word delimiter and provide all the permutations a user might type to find that term: up to date, upto date, up todate, uptodate

was (Author: gpendleb):
The link to the doco is working for me today so I took a quick look. I think the other reason that the HyphenatedWordsFilter is not suitable is that it removes the hyphen from the material assuming that it can only have one meaning. The specific circumstances I am considering is when the hyphen is part of a legitimately hyphenated word that just happen to break across a line wrap. eg. 'up-{\n}to-date' The HyphenatedWordsFilter would turn this into 'upto-date', and cause user searches of 'up to date' to not match, since no filters later in the change can really pull 'upto' apart again. Whereas the 'catenateShingles' option is intended to preserve the word delimiter and provide all the permutations a user might type to find that term: up to date, upto date, up todate, uptodate

Add catenateShingles option to WordDelimiterFilter
--
Key: SOLR-5722
URL: https://issues.apache.org/jira/browse/SOLR-5722
Project: Solr
Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor
Labels: filter, newbie, patch
Attachments: WDFconcatShingles.patch

Apologies if I put this in the wrong spot. I'm attaching a patch (against current trunk) that adds support for a 'catenateShingles' option to the WordDelimiterFilter. We (National Library of Australia - NLA) are currently maintaining this as an internal modification to the Filter, but I believe it is generic enough to contribute upstream.

Description:
=
{code}
/**
 * NLA Modification to the standard word delimiter to support various
 * hyphenation use cases. Primarily driven by requirements for
 * newspapers where words are often broken across line endings.
 *
 * eg. hyphenated-surname is printed across a line ending and
 * turns out like hyphen-ated-surname or hyphenated-sur-name.
 *
 * In this scenario the stock filter, with 'catenateAll' turned on, will
 * generate individual tokens plus one combined token, but not
 * sub-tokens like hyphenated surname and hyphenatedsur name.
 *
 * So we add a new 'catenateShingles' to achieve this.
 */
{code}

Includes unit tests, and as is noted in one of them CATENATE_WORDS and CATENATE_SHINGLES are logically considered mutually exclusive for sensible usage and can cause duplicate tokens (although they should have the same positions etc). I'm happy to work on it more if anyone finds problems with it.

-- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902824#comment-13902824 ]

Greg Pendlebury commented on SOLR-5722:
---

I don't think it does. It has been a while since we looked into it, and that link is currently returning 503 for me, but my understanding was that the HyphenatedWordsFilter puts two tokens back together when a hyphen is found on the end of the first token. The catenateShingles option we are using addresses the scenario where multiple hyphens are found internal to a single token.

Add catenateShingles option to WordDelimiterFilter
--
Key: SOLR-5722
URL: https://issues.apache.org/jira/browse/SOLR-5722
Project: Solr
Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor
Labels: filter, newbie, patch
Attachments: WDFconcatShingles.patch

Apologies if I put this in the wrong spot. I'm attaching a patch (against current trunk) that adds support for a 'catenateShingles' option to the WordDelimiterFilter. We (National Library of Australia - NLA) are currently maintaining this as an internal modification to the Filter, but I believe it is generic enough to contribute upstream.

Description:
=
{code}
/**
 * NLA Modification to the standard word delimiter to support various
 * hyphenation use cases. Primarily driven by requirements for
 * newspapers where words are often broken across line endings.
 *
 * eg. hyphenated-surname is printed across a line ending and
 * turns out like hyphen-ated-surname or hyphenated-sur-name.
 *
 * In this scenario the stock filter, with 'catenateAll' turned on, will
 * generate individual tokens plus one combined token, but not
 * sub-tokens like hyphenated surname and hyphenatedsur name.
 *
 * So we add a new 'catenateShingles' to achieve this.
 */
{code}

Includes unit tests, and as is noted in one of them CATENATE_WORDS and CATENATE_SHINGLES are logically considered mutually exclusive for sensible usage and can cause duplicate tokens (although they should have the same positions etc). I'm happy to work on it more if anyone finds problems with it.

-- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter
Greg Pendlebury created SOLR-5722:
-
Summary: Add catenateShingles option to WordDelimiterFilter
Key: SOLR-5722
URL: https://issues.apache.org/jira/browse/SOLR-5722
Project: Solr
Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor

Apologies if I put this in the wrong spot. I'm attaching a patch (against current trunk) that adds support for a 'catenateShingles' option to the WordDelimiterFilter. We (National Library of Australia - NLA) are currently maintaining this as an internal modification to the Filter, but I believe it is generic enough to contribute upstream.

Description:
=
{code}
/**
 * NLA Modification to the standard word delimiter to support various
 * hyphenation use cases. Primarily driven by requirements for
 * newspapers where words are often broken across line endings.
 *
 * eg. hyphenated-surname is printed across a line ending and
 * turns out like hyphen-ated-surname or hyphenated-sur-name.
 *
 * In this scenario the stock filter, with 'catenateAll' turned on, will
 * generate individual tokens plus one combined token, but not
 * sub-tokens like hyphenated surname and hyphenatedsur name.
 *
 * So we add a new 'catenateShingles' to achieve this.
 */
{code}

Includes unit tests, and as is noted in one of them CATENATE_WORDS and CATENATE_SHINGLES are logically considered mutually exclusive for sensible usage and can cause duplicate tokens (although they should have the same positions etc). I'm happy to work on it more if anyone finds problems with it.

-- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Pendlebury updated SOLR-5722:
--
Attachment: WDFconcatShingles.patch

Patch against trunk: http://svn.apache.org/repos/asf/lucene/dev/trunk (r1567824)

Add catenateShingles option to WordDelimiterFilter
--
Key: SOLR-5722
URL: https://issues.apache.org/jira/browse/SOLR-5722
Project: Solr
Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor
Labels: filter, newbie, patch
Attachments: WDFconcatShingles.patch

Apologies if I put this in the wrong spot. I'm attaching a patch (against current trunk) that adds support for a 'catenateShingles' option to the WordDelimiterFilter. We (National Library of Australia - NLA) are currently maintaining this as an internal modification to the Filter, but I believe it is generic enough to contribute upstream.

Description:
=
{code}
/**
 * NLA Modification to the standard word delimiter to support various
 * hyphenation use cases. Primarily driven by requirements for
 * newspapers where words are often broken across line endings.
 *
 * eg. hyphenated-surname is printed across a line ending and
 * turns out like hyphen-ated-surname or hyphenated-sur-name.
 *
 * In this scenario the stock filter, with 'catenateAll' turned on, will
 * generate individual tokens plus one combined token, but not
 * sub-tokens like hyphenated surname and hyphenatedsur name.
 *
 * So we add a new 'catenateShingles' to achieve this.
 */
{code}

Includes unit tests, and as is noted in one of them CATENATE_WORDS and CATENATE_SHINGLES are logically considered mutually exclusive for sensible usage and can cause duplicate tokens (although they should have the same positions etc). I'm happy to work on it more if anyone finds problems with it.

-- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4956) make maxBufferedAddsPerServer configurable
[ https://issues.apache.org/jira/browse/SOLR-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739058#comment-13739058 ]

Greg Pendlebury commented on SOLR-4956:
---

"So it seems we have three options here: 1 make it configurable with a warning that if you change it it may lead to Bad Stuff."

I'd support this solely from the perspective of testing its impact. Rebuilding code to change a hardcoded integer is a tad annoying if you are just diagnosing what impact things could have. We batch ingest several thousand documents at a time into a 96 JVM cluster (32 shards * 3 replicas). I'd love to see if we could lower CPU load by altering this setting... even if it is only a diagnostic step that is at odds with long term goals related to batching at all.

make maxBufferedAddsPerServer configurable
--
Key: SOLR-4956
URL: https://issues.apache.org/jira/browse/SOLR-4956
Project: Solr
Issue Type: Improvement
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson

Anecdotal user's list evidence indicates that in high-throughput situations, the default of 10 docs/batch for inter-shard batching can generate significant CPU load. See the thread titled "Sharding and Replication" on June 19th, but the gist is below. I haven't poked around, but it's a little surprising on the surface that Asif is seeing this kind of difference. So I'm wondering if this change indicates some other underlying issue. Regardless, this seems like it would be good to investigate.

Here's the gist of Asif's experience from the thread:

It's a completely practical problem - we are exploring Solr to build a real time analytics/data solution for a system handling about 1000 qps. We have various metrics that are stored as different collections on the cloud, which means a very high amount of writes. The cloud also needs to support about 300-400 qps. We initially tested with a single Solr node on a 16 core / 24 GB box for a single metric. We saw that writes were not an issue at all - Solr was handling it extremely well. We were also able to achieve about 200 qps from a single node. When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU usage on the replicas. Up to 10 cores were getting used for writes on the replicas. Hence my concern with respect to batch updates for the replicas. BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU usage is very similar to single node installation.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
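For anyone wanting a feel for why the batch size matters, below is a rough sketch of a buffer with a configurable limit. It is only an illustration under stated assumptions: the class is hypothetical and is not Solr's distributor code, each flush stands in for one forwarded request to a replica, and the real CPU cost per request is not modelled. It just counts how many requests a given document count produces at 10 docs/batch versus 1000.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a configurable per-server add buffer; not Solr's code. */
public class BufferedAddsSketch {
    private final int maxBufferedAdds;              // the value the issue proposes making configurable
    private final List<String> buffer = new ArrayList<>();
    private int flushCount = 0;

    public BufferedAddsSketch(int maxBufferedAdds) {
        if (maxBufferedAdds < 1) {
            throw new IllegalArgumentException("maxBufferedAdds must be >= 1");
        }
        this.maxBufferedAdds = maxBufferedAdds;
    }

    /** Buffers one document and flushes when the limit is reached. */
    public void add(String docId) {
        buffer.add(docId);
        if (buffer.size() >= maxBufferedAdds) {
            flush();
        }
    }

    /** Stands in for sending one batched request to a replica. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushCount++;
        buffer.clear();
    }

    /** Number of requests needed to forward {@code docs} documents with the given limit. */
    public static int requestsFor(int docs, int maxBufferedAdds) {
        BufferedAddsSketch sketch = new BufferedAddsSketch(maxBufferedAdds);
        for (int i = 0; i < docs; i++) {
            sketch.add("doc" + i);
        }
        sketch.flush(); // send whatever is left over
        return sketch.flushCount;
    }

    public static void main(String[] args) {
        System.out.println(requestsFor(2500, 10));   // 250 requests
        System.out.println(requestsFor(2500, 1000)); // 3 requests
    }
}
```

Being able to pass the limit in at construction time, rather than rebuilding with a different hardcoded integer, is the diagnostic convenience asked for in the comment above.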
[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR
[ https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267045#comment-13267045 ]

Greg Pendlebury commented on SOLR-2487:
---

@Neil, that's way better than the way I do things now. Thanks. Maven continues to surprise me.

Do not include slf4j-jdk14 jar in WAR
-
Key: SOLR-2487
URL: https://issues.apache.org/jira/browse/SOLR-2487
Project: Solr
Issue Type: Improvement
Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Labels: logging, slf4j
Fix For: 3.6, 4.0
Attachments: SOLR-2487.patch, SOLR-2487.patch, SOLR-2487.patch

I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help newbies get up and running. But I find myself re-packaging the war for every customer when adapting to their choice of logger framework, which is counter-productive. It would be sufficient to have the jdk-logging binding in example/lib to let the example and tutorial still work OOTB but as soon as you deploy solr.war to production you're forced to explicitly decide what logging to use.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR
[ https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084928#comment-13084928 ]

Greg Pendlebury commented on SOLR-2487:
---

It would be great to have a skinny WAR available as a Maven artifact. At the moment there is no way in Maven to have it exclude the jdk14 JAR short of rebuilding and rehosting the WAR elsewhere, eg: http://www.jarvana.com/jarvana/browse/org/dspace/dependencies/solr/dspace-solr-webapp/1.4.1.0/

And to my knowledge at the moment, there is nothing like this available for v3.3.0.

With a skinny WAR in Maven listing all the currently bundled dependencies the end result for most users would be identical, since Maven will go get them all for you anyway. Then people that don't want jdk14 can add this to their own project and they will get everything but that single dependency:

<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-jdk14</artifactId>
  <version>1.6.1</version>
  <scope>provided</scope>
</dependency>

Do not include slf4j-jdk14 jar in WAR
-
Key: SOLR-2487
URL: https://issues.apache.org/jira/browse/SOLR-2487
Project: Solr
Issue Type: Improvement
Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
Labels: logging, slf4j

I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help newbies get up and running. But I find myself re-packaging the war for every customer when adapting to their choice of logger framework, which is counter-productive. It would be sufficient to have the jdk-logging binding in example/lib to let the example and tutorial still work OOTB but as soon as you deploy solr.war to production you're forced to explicitly decide what logging to use.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR
[ https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084967#comment-13084967 ]

Greg Pendlebury commented on SOLR-2487:
---

"At the moment there is no way in Maven to have it exclude the jdk14 JAR..."

Hmm, I shouldn't have stated an absolute like that. I eventually got a script building today that dropped the WAR as a dependency, unpacked it to a '/solr' context folder, then nuked the jdk14 JAR only, leaving the rest in place. I'd still prefer a skinny WAR, since it would be a much cleaner build script, and allow me to eliminate duplicate JARs on the classpath with greater ease. It would also be more in line with the spirit of how Maven is intended to work... but I have a workaround, and don't expect the world to conform to my wishes :)

Do not include slf4j-jdk14 jar in WAR
-
Key: SOLR-2487
URL: https://issues.apache.org/jira/browse/SOLR-2487
Project: Solr
Issue Type: Improvement
Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
Labels: logging, slf4j

I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help newbies get up and running. But I find myself re-packaging the war for every customer when adapting to their choice of logger framework, which is counter-productive. It would be sufficient to have the jdk-logging binding in example/lib to let the example and tutorial still work OOTB but as soon as you deploy solr.war to production you're forced to explicitly decide what logging to use.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
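The workaround described in this comment (take the WAR, remove only the jdk14 binding, leave everything else in place) can be approximated outside Maven as well. The following is a minimal sketch and not the build script mentioned above: the class name, method names and the entry-name pattern are assumptions for illustration, and it relies only on the JDK's java.util.zip classes (Java 9+ for transferTo). It copies a WAR stream and skips any entry that matches a predicate.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical illustration of stripping one JAR from a WAR; not the script from the comment. */
public class WarStripSketch {

    /**
     * Copies a WAR (zip) stream to {@code out}, skipping every entry whose name
     * satisfies {@code drop}. Returns the number of entries that were skipped.
     */
    public static int copyWithout(InputStream war, OutputStream out, Predicate<String> drop)
            throws IOException {
        int dropped = 0;
        try (ZipInputStream zin = new ZipInputStream(war);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (drop.test(entry.getName())) {
                    dropped++;
                    continue; // getNextEntry() skips the unread bytes of this entry
                }
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zin.transferTo(zout); // reads up to the end of the current entry only
                zout.closeEntry();
            }
        }
        return dropped;
    }

    /** Assumed location and naming of the bundled binding, e.g. WEB-INF/lib/slf4j-jdk14-1.5.5.jar. */
    public static boolean isJdk14Binding(String entryName) {
        return entryName.matches("WEB-INF/lib/slf4j-jdk14-.*\\.jar");
    }
}
```

A caller would open the original WAR with a FileInputStream, the target with a FileOutputStream, and pass WarStripSketch::isJdk14Binding as the predicate. A skinny WAR would make this step unnecessary, which is the point of the comment.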
[jira] [Issue Comment Edited] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR
[ https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084967#comment-13084967 ]

Greg Pendlebury edited comment on SOLR-2487 at 8/15/11 5:22 AM:

"At the moment there is no way in Maven to have it exclude the jdk14 JAR..."

Hmm, I shouldn't have stated an absolute like that. I eventually got a script building today that dropped the WAR as a dependency, unpacked it to a '/solr' context folder, then nuked the jdk14 JAR only, leaving the rest in place. I'd still prefer a skinny WAR, since it would be a much cleaner build script, and allow me to eliminate duplicate/conflicting JARs on the classpath with greater ease. It would also be more in line with the spirit of how Maven is intended to work... but I have a workaround, and don't expect the world to conform to my wishes :)

was (Author: greg.pendlebury):
At the moment there is no way in Maven to have it exclude the jdk14 JAR... Hmm, I shouldn't have stated an absolute like that. I eventually got a script building today that dropped the WAR as a dependency, unpacked it to a '/solr' context folder, then nuked the jdk14 JAR only, leaving the rest in place. I'd still prefer a skinny WAR, since it would be a much cleaner build script, and allow me to eliminate duplicate JARs on the classpath with greater ease. It would also be more in line with the spirit of how Maven is intended to work... but I have a workaround, and don't expect to world to conform to my wishes :)

Do not include slf4j-jdk14 jar in WAR
-
Key: SOLR-2487
URL: https://issues.apache.org/jira/browse/SOLR-2487
Project: Solr
Issue Type: Improvement
Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
Labels: logging, slf4j

I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help newbies get up and running. But I find myself re-packaging the war for every customer when adapting to their choice of logger framework, which is counter-productive. It would be sufficient to have the jdk-logging binding in example/lib to let the example and tutorial still work OOTB but as soon as you deploy solr.war to production you're forced to explicitly decide what logging to use.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org