Re: From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?

2013-03-04 Thread Mikhail Khludnev
Mark,

It's been there for ages, including in 3.x:
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/queryParser/core/package-summary.html
You are welcome!


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?

2013-03-03 Thread Mark Bennett
Hi Mikhail,

Thanks for the links, looks like interesting stuff.

Sadly this project is stuck in 3.x for some very thorny reasons...

Googling around, looks like this might be strictly 4.x...

On Mon, Feb 25, 2013 at 12:21 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Mark,

 AFAIK

 http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html
 is a convenient framework for such juggling.
 Please also be aware of the good starting point

 http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?

2013-02-23 Thread Mark Bennett
Scenario:

You're submitting a block of text as a query.

You're content to let Solr / Lucene handle query parsing, tokenization,
etc.

But you'd like ALL of the leaf nodes eventually produced in the parse tree
to have:
* Boolean .MUST (effectively a + prefix)
* Fuzzy match of ~1 or ~2
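
To make the request concrete, here is a minimal Python sketch of the rewrite on a toy parse tree. It does not use the Lucene or Solr API; the types `Term`, `Fuzzy`, `Clause`, `Bool` and the function `require_fuzzy_leaves` are hypothetical names invented for illustration only.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    """Leaf node: a single analyzed term (hypothetical toy type)."""
    text: str


@dataclass
class Fuzzy:
    """Leaf node: a term with a maximum edit distance."""
    text: str
    max_edits: int


@dataclass
class Clause:
    """A child of a Bool node plus its occurrence flag, e.g. MUST or SHOULD."""
    occur: str
    node: "Node"


@dataclass
class Bool:
    """Inner node: a list of clauses."""
    clauses: List[Clause]


Node = Union[Term, Fuzzy, Bool]


def require_fuzzy_leaves(node: Node, max_edits: int = 2) -> Node:
    """Return a copy where every Term leaf becomes Fuzzy and every clause
    that directly holds a leaf is switched to MUST."""
    if isinstance(node, Term):
        return Fuzzy(node.text, max_edits)
    if isinstance(node, Bool):
        rewritten = []
        for clause in node.clauses:
            child = require_fuzzy_leaves(clause.node, max_edits)
            # Only clauses wrapping a leaf are forced to MUST; nested
            # boolean groups keep whatever flag the parser gave them.
            occur = "MUST" if isinstance(child, Fuzzy) else clause.occur
            rewritten.append(Clause(occur, child))
        return Bool(rewritten)
    return node  # already a Fuzzy leaf: leave unchanged


parsed = Bool([Clause("SHOULD", Term("chair")), Clause("SHOULD", Term("table"))])
print(require_fuzzy_leaves(parsed))
```

The point of working on the tree is that the rewrite runs after parsing and analysis, so tokens the analyzer dropped never become leaves in the first place.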

In a simple application, and if there were no punctuation, you could
preprocess the query, effectively:
* split on whitespace
* for t in tokens: t = "+" + t + "~2"
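
The two steps above can be sketched in a few lines of Python. The function name `decorate_naively` is made up for this example, and the sketch deliberately ignores punctuation, quoting and query syntax, as the text assumes.

```python
def decorate_naively(query: str, max_edits: int = 2) -> str:
    """Split on whitespace and rewrite every token as +token~N."""
    return " ".join(f"+{token}~{max_edits}" for token in query.split())


print(decorate_naively("the chair"))  # +the~2 +chair~2
```

Note that the stop word "the" is decorated like any other token, which is the problem described next.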

But this is ugly, and even then I think things like stop words would be
messed up:
* OK in Solr:    the chair         (it can properly remove "the")
* But if this:   +the~2 +chair~2   (I'm not sure this would work)

Sure, at the application level you could also remove the stop words in the
"for t in tokens" loop, but then some other weird case would come up.
Maybe one of the field's analyzers has some other token filter you forgot
about, so you'd have to bring that logic forward as well.
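
A rough Python sketch of that application-level workaround, to show why it is fragile. Everything here is an assumption for illustration: `STOP_WORDS` and `analyze` are hypothetical stand-ins for the field's real analyzer chain, not anything Solr provides to the client.

```python
import re

# Hypothetical stop word list; a real setup would have to mirror whatever
# the Solr field type actually configures.
STOP_WORDS = {"a", "an", "the", "of"}


def analyze(text: str) -> list:
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def decorate_analyzed(query: str, max_edits: int = 2) -> str:
    """Apply the toy analyzer first, then rewrite each token as +token~N."""
    return " ".join(f"+{token}~{max_edits}" for token in analyze(query))


print(decorate_analyzed("The chair, please!"))  # +chair~2 +please~2
```

Any filter present on the server side but missing from `analyze` (stemming, synonyms, and so on) would make the client-side tokens diverge from the indexed ones, which is exactly the maintenance burden described above.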

(It's a long story why I'd want to do all this... I know people think
adding ~2 to all tokens will give bad results anyway; I'm trying to fix
inherited code that can't be scrapped, etc.)

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513