Re: From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?
Mark, it's been there for ages:
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/queryParser/core/package-summary.html

You are welcome!

On Mon, Mar 4, 2013 at 2:42 AM, Mark Bennett mbenn...@ideaeng.com wrote:
Hi Mikhail, Thanks for the links, looks like interesting stuff. Sadly this project is stuck in 3.x for some very thorny reasons... Googling around, looks like this might be strictly 4.x...
[...]

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?
Hi Mikhail,

Thanks for the links, looks like interesting stuff. Sadly this project is stuck in 3.x for some very thorny reasons... Googling around, it looks like this might be strictly 4.x...

On Mon, Feb 25, 2013 at 12:21 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
Mark, AFAIK http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html is a convenient framework for such juggling. Please also be aware of the good starting point http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html

On Sun, Feb 24, 2013 at 11:33 AM, Mark Bennett mbenn...@ideaeng.com wrote:
Scenario: You're submitting a block of text as a query. [...]
From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?
Scenario:

You're submitting a block of text as a query. You're content to let Solr / Lucene handle query parsing and tokenization, etc.

But you'd like to have ALL eventually produced leaf nodes in the parse tree to have:
* Boolean .MUST (effectively a + prefix)
* Fuzzy match of ~1 or ~2

In a simple application, and if there were no punctuation, you could preprocess the query, effectively:
* split on whitespace
* for t in tokens: t = "+" + t + "~2"

But this is ugly, and even then I think things like stop words would be messed up:
* OK in Solr: the chair (it can properly remove "the")
* But if this: +the~2 +chair~2 (I'm not sure this would work)

Sure, at the application level you could also remove the stop words in the "for t in tokens" loop, but then some other weird case would come up. Maybe one of the field's analyzers has some other token filter you forgot about, so you'd have to bring that logic forward as well.

(Long story of why I'd want to do all this... and I know people think adding ~2 to all tokens will give bad results anyway, trying to fix inherited code that can't be scrapped, etc)

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
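
For reference, the application-level preprocessing described in this post might look roughly like the Python sketch below. It is only an illustration of the "ugly" workaround, not a recommendation: the function name `naive_fuzzy_must` and the `STOP_WORDS` set are hypothetical, the stop-word list is a stand-in for whatever the field's analyzer actually removes, and punctuation and query-syntax escaping are deliberately ignored.

```python
# Hypothetical stop-word list for illustration only; the real list lives in
# the field's analyzer configuration and may differ.
STOP_WORDS = frozenset({"a", "an", "the", "of"})


def naive_fuzzy_must(text, max_edits=2, stop_words=STOP_WORDS):
    """Turn free text into a query string where every remaining token is
    required ('+' prefix) and fuzzy ('~<max_edits>' suffix).

    Assumes whitespace-separated tokens with no punctuation or special
    query characters, as in the simple case described in the post.
    """
    clauses = []
    for token in text.split():
        # Drop stop words here, because the analyzer can no longer do it
        # cleanly once the token carries '+' and '~' decorations.
        if token.lower() in stop_words:
            continue
        clauses.append(f"+{token}~{max_edits}")
    return " ".join(clauses)


if __name__ == "__main__":
    print(naive_fuzzy_must("the chair"))
    print(naive_fuzzy_must("the chair", stop_words=frozenset()))
```

With the stop-word set passed in, "the" is dropped before decoration; with an empty set, the sketch produces the problematic `+the~2 +chair~2` form from the post. Any other token filter in the analyzer chain would have to be duplicated in the same way, which is the maintenance problem the post is trying to avoid.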