ClientUtils escape query

2008-08-05 Thread Grant Ingersoll
ClientUtils.escapeQueryChars seems a bit aggressive to me in terms of
what it escapes.  It references
http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters
but doesn't explicitly escape the characters listed there, instead
opting for the more general \W regex.  As a result, I'm noticing that
chars that don't need to be escaped (like /) are being escaped.


Anyone recall why this is?  I suppose the problem comes in when one  
considers other query parsers, but maybe we should just mark this one  
as explicitly for use w/ the Lucene QP?


-Grant
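
To make the complaint concrete, here is a minimal sketch of what a \W-based escape does. It assumes the method boils down to a single regex replace; the class and method names are made up for illustration and are not the actual ClientUtils code.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of \W-based escaping; not the real ClientUtils source.
final class NonWordEscapeSketch {
    // By default, Java's \W matches any character outside [a-zA-Z0-9_].
    private static final Pattern NON_WORD = Pattern.compile("(\\W)");

    /** Prefixes every non-word character with a backslash. */
    static String escapeNonWord(String input) {
        // In a replacement string, "\\\\" yields one literal backslash and "$1" the match.
        return NON_WORD.matcher(input).replaceAll("\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escapeNonWord("docs/readme.txt")); // docs\/readme\.txt
        System.out.println(escapeNonWord("title:foo bar"));   // title\:foo\ bar
    }
}
```

The slash, the dot and the space all pick up a backslash, although none of them is on the query parser's list of special characters.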


Re: ClientUtils escape query

2008-08-05 Thread Ryan McKinley
The \W regex came after I spent a week growing the list of characters
that need to be escaped, one at a time (waiting for errors along the
way...)


Erik suggested I look at how the Ruby client handles it... and I
haven't seen any problem since then.


Is there any problem with over-escaping?  I know it makes some things
look funny.  Perhaps there is a regex that will do any non-letter except


ryan


Re: ClientUtils escape query

2008-08-05 Thread Donovan Jimenez

In the PHP client I used these:


/**
 * Escape a value for special query characters such as ':', '(', ')', '*', '?', etc.
 *
 * NOTE: inside a phrase fewer characters need escaped, use {@link Apache_Solr_Service::escapePhrase()} instead
 *
 * @param string $value
 * @return string
 */
static public function escape($value)
{
    // list taken from http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters
    $pattern = '/(\+|-|&&|\|\||!|\(|\)|\{|}|\[|]|\^|"|~|\*|\?|:|\\\)/';
    $replace = '\\\$1';

    return preg_replace($pattern, $replace, $value);
}

/**
 * Escape a value meant to be contained in a phrase for special query characters
 *
 * @param string $value
 * @return string
 */
static public function escapePhrase($value)
{
    $pattern = '/("|\\\)/';
    $replace = '\\\$1';

    return preg_replace($pattern, $replace, $value);
}


helpful?
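
For comparison on the Java side, the same explicit-list approach could look roughly like the sketch below. It is a straight port of the two PHP patterns above; the class name is hypothetical and this is not a proposed patch to ClientUtils.

```java
import java.util.regex.Pattern;

// Hypothetical Java port of the PHP escape()/escapePhrase() functions above.
final class ExplicitEscapeSketch {
    // Same alternatives as the PHP pattern: single special chars plus the pairs && and ||.
    private static final Pattern SPECIAL = Pattern.compile(
        "(\\+|-|&&|\\|\\||!|\\(|\\)|\\{|\\}|\\[|\\]|\\^|\"|~|\\*|\\?|:|\\\\)");

    // Inside a quoted phrase only the double quote and the backslash are escaped.
    private static final Pattern PHRASE_SPECIAL = Pattern.compile("(\"|\\\\)");

    static String escape(String value) {
        // "\\\\" in the replacement is one literal backslash, "$1" is the matched text.
        return SPECIAL.matcher(value).replaceAll("\\\\$1");
    }

    static String escapePhrase(String value) {
        return PHRASE_SPECIAL.matcher(value).replaceAll("\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escape("docs/readme.txt"));  // docs/readme.txt
        System.out.println(escape("(1+1):2"));          // \(1\+1\)\:2
        System.out.println(escapePhrase("say \"hi\"")); // say \"hi\"
    }
}
```

With this list a slash or a dot passes through unchanged. The trade-off is that the list has to be kept in sync with whatever the query parser treats as special.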


Re: ClientUtils escape query

2008-08-05 Thread Grant Ingersoll
Over-escaping is mainly a problem when one wants to display the
escaped string later, I guess.


-Grant


Re: ClientUtils escape query

2008-08-05 Thread Mike Klaas

Wouldn't you want to reverse all escaping in that case anyway?

-Mike
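
Reversing the escaping for display could be as simple as the sketch below. It assumes every backslash in the stored string is an escape character added by one of the approaches discussed in this thread, and it ignores other escape forms such as \uXXXX sequences; the class and method names are hypothetical.

```java
// Hypothetical helper that undoes simple backslash escaping before display.
final class UnescapeSketch {
    /** Drops each escaping backslash and keeps the character it protected. */
    static String unescape(String escaped) {
        StringBuilder out = new StringBuilder(escaped.length());
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                // Skip the backslash and copy the next character verbatim.
                out.append(escaped.charAt(++i));
            } else {
                // Ordinary character, or a lone trailing backslash that is kept as-is.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("docs\\/readme\\.txt")); // docs/readme.txt
    }
}
```

Because the loop strips any backslash regardless of what follows, it works equally for over-escaped and minimally escaped input.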
