Re: Escaping SPARQL regex

2014-01-22 Thread Martynas Jusevičius
Thanks Andy. REUtil.quoteMeta() seems to do the trick.

On Tue, Jan 21, 2014 at 6:22 PM, Andy Seaborne  wrote:
> On 21/01/14 16:58, Martynas Jusevičius wrote:
>>
>> Andy,
>>
>> what if I'm sending the query to a remote endpoint that does not
>> support Java style regex syntax? Do I need to use FmtUtils then?
>
>
> FmtUtils does not have code to escape regex metacharacters.
> You'll need to escape all metacharacters by string manipulation.  It's a
> shame that the standard Java RT uses \Q\E
>
> Perl has quotemeta which implements its \Q\E
>
>
>>
>> Looking at the example from SPARQL 1.1 STR() [1]:
>>
>> This query selects the set of people who use their work.example
>> address in their foaf profile:
>>
>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>> SELECT ?name ?mbox
>>   WHERE { ?x foaf:name  ?name ;
>>  foaf:mbox  ?mbox .
>>   FILTER regex(str(?mbox), "@work\\.example$") }
>>
>> How does "work.example" become "work\\.example"? Is one backslash
>> escaping the regex, and the second one escaping the literal? How do I
>> achieve this with Jena?
>
>
> Yes. \\ puts a single real \ into the SPARQL string.
>
> That's how you do it in Jena.
> "." is a metacharacter so it needs escaping.
> ==> \.
> but it's in a SPARQL string so syntax needs to be \\
> ==> \\.
>
>
> The Xerces implementation in REUtil.quoteMeta is:
>
> public static String quoteMeta(String literal) {
> int len = literal.length();
> StringBuffer buffer = null;
> for (int i = 0;  i < len;  i ++) {
> int ch = literal.charAt(i);
> if (".*+?{[()|\\^$".indexOf(ch) >= 0) {
> if (buffer == null) {
> buffer = new StringBuffer(i+(len-i)*2);
> if (i > 0)  buffer.append(literal.substring(0, i));
> }
> buffer.append((char)'\\');
> buffer.append((char)ch);
> } else if (buffer != null)
> buffer.append((char)ch);
> }
> return buffer != null ? buffer.toString() : literal;
> }
>
> and there are others via Google.
>
> Andy
>
>
>>
>> [1] http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#func-str
>>
>> Martynas
>>
>> On Tue, Jan 21, 2014 at 10:10 AM, Andy Seaborne  wrote:
>>>
>>> Works for me:
>>>
>>> SELECT * {
>>>VALUES ?o { "+35" "abc+35def" }
>>>FILTER regex(?o , "\\Q+35\\E", "i")
>>> }
>>>
>>> and in Java you need \\\\ due to both the levels of escaping (Java text,
>>> SPARQL).
>>>
>>> [[
>>> Regex: Pattern exception: java.util.regex.PatternSyntaxException:
>>> Dangling
>>> meta character '+' near index 0
>>> ]]
>>> which gives the game away it's using java regexs :-)
>>>
>>> The regex engine is Java's and the string is used untouched.
>>>
>>> Strictly, it should be XSD v1 which are slightly different:
>>>
>>> http://www.w3.org/TR/xmlschema-2/#regexs
>>>
>>> and there is a strict Xerces provided alternative if you want exact XSD
>>> regular expressions.
>>>
>>> XSD and Java differ in only very small ways (e.g. XSD has one extra
>>> modifier flag, "m", XSD has no \Q\E, and XSD has "Is" for unicode code
>>> blocks inside \p e.g. \p{IsMongolian})
>>>
>>> And now
>>> http://www.w3.org/TR/xmlschema11-2/#regexs
>>>
>>>  Andy
>>>
>>>
>>> On 21/01/14 02:32, Joshua TAYLOR wrote:


 My apologies.  I replied too quickly.  I just wrote this test with
 Jena's command line tools. To match the string "+35", I had to use the
 "\\+35" in the query:

 select ?label where {
 values ?label { "+35" "-35" }
 filter(regex(str(?label),"\\+35"))
 }

 ---------
 | label |
 =========
 | "+35" |
 ---------

 That's _two_ backslashes in the query string, which means that in Java
 you'd end up writing

 String query = ... + "filter(regex(...,\"\\\\+35\"))" + ...;

 Sorry for the hasty and inaccurate reply.

 On Mon, Jan 20, 2014 at 9:27 PM, Joshua TAYLOR 
 wrote:
>
>
> On Mon, Jan 20, 2014 at 8:48 PM, Martynas Jusevičius
>  wrote:
>>
>>
>> OK maybe "+35" was a bad example. But isn't "+" a special char in
>> SPARQL regex? And there are more like "*", "?" etc.
>> http://www.w3.org/TR/xpath-functions/#regex-syntax
>
>
>
>
> Oh, good point.  But if it needs to be escaped with a backslash, then
> wouldn't
>
>  filter regex(str(?label), "\+35")
>
> be fine?  Note that if you're constructing this programmatically, you
> might end up writing code like
>
>   String queryString = ... + "filter regex(str(?label), \"\\+35\")"
> +
> ...;
>
> --
> Joshua Taylor, http://www.cs.rpi.edu/~tayloj/





>>>
>


Re: Escaping SPARQL regex

2014-01-21 Thread Andy Seaborne

On 21/01/14 16:58, Martynas Jusevičius wrote:

Andy,

what if I'm sending the query to a remote endpoint that does not
support Java style regex syntax? Do I need to use FmtUtils then?


FmtUtils does not have code to escape regex metacharacters.
You'll need to escape all metacharacters by string manipulation.  It's a
shame that the standard Java RT uses \Q\E


Perl has quotemeta which implements its \Q\E



Looking at the example from SPARQL 1.1 STR() [1]:

This query selects the set of people who use their work.example
address in their foaf profile:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
  WHERE { ?x foaf:name  ?name ;
 foaf:mbox  ?mbox .
  FILTER regex(str(?mbox), "@work\\.example$") }

How does "work.example" become "work\\.example"? Is one backslash
escaping the regex, and the second one escaping the literal? How do I
achieve this with Jena?


Yes. \\ puts a single real \ into the SPARQL string.

That's how you do it in Jena.
"." is a metacharacter so it needs escaping.
==> \.
but it's in a SPARQL string so syntax needs to be \\
==> \\.
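A minimal sketch of the two layers described above, using only the JDK. The class and method names (EscapingLayers, toSparqlStringBody) are made up for illustration and are not part of Jena; the helper only handles backslash and double quote.

```java
import java.util.regex.Pattern;

class EscapingLayers {
    // Hypothetical helper: escape a value for the inside of a double-quoted
    // SPARQL string literal. Only backslash and double quote are handled here.
    static String toSparqlStringBody(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Layer 1: the regex itself. The Java literal below has the value
        // @work\.example$ (one backslash, escaping the dot for the regex engine).
        String regex = "@work\\.example$";

        // Layer 2: the SPARQL string literal doubles that backslash.
        String filter = "FILTER regex(str(?mbox), \"" + toSparqlStringBody(regex) + "\")";

        System.out.println(regex);   // the pattern the regex engine sees
        System.out.println(filter);  // the text that goes into the query

        // The one-backslash form is what Java's own regex engine expects.
        System.out.println(Pattern.compile(regex).matcher("mailto:alice@work.example").find());
        System.out.println(Pattern.compile(regex).matcher("mailto:alice@workXexample").find());
    }
}
```

The second printed line contains "@work\\.example$", i.e. the same text as the FILTER in the SPARQL specification example.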


The Xerces implementation in REUtil.quoteMeta is:

public static String quoteMeta(String literal) {
    int len = literal.length();
    StringBuffer buffer = null;
    for (int i = 0;  i < len;  i ++) {
        int ch = literal.charAt(i);
        if (".*+?{[()|\\^$".indexOf(ch) >= 0) {
            if (buffer == null) {
                buffer = new StringBuffer(i+(len-i)*2);
                if (i > 0)  buffer.append(literal.substring(0, i));
            }
            buffer.append((char)'\\');
            buffer.append((char)ch);
        } else if (buffer != null)
            buffer.append((char)ch);
    }
    return buffer != null ? buffer.toString() : literal;
}

and there are others via Google.
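
For readers who want to try the idea without Xerces on the classpath, here is a small stand-alone sketch of the same character-by-character approach. It is not the Xerces code: the class name QuoteMetaDemo is invented, and the metacharacter set is simply copied from the method quoted above, so characters outside that set (for example "]" or "}") are left alone.

```java
import java.util.regex.Pattern;

class QuoteMetaDemo {
    // Metacharacter set taken from the quoted method; an assumption, not a
    // complete list for every regex dialect.
    private static final String META = ".*+?{[()|\\^$";

    // Prefix every metacharacter with a backslash so it matches literally.
    static String quoteMeta(String literal) {
        StringBuilder out = new StringBuilder(literal.length() * 2);
        for (int i = 0; i < literal.length(); i++) {
            char ch = literal.charAt(i);
            if (META.indexOf(ch) >= 0) {
                out.append('\\');
            }
            out.append(ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = quoteMeta("+35");
        System.out.println(escaped); // backslash followed by +35

        // Checked here with Java's engine only; the point of this style of
        // escaping is that it does not rely on \Q...\E.
        System.out.println(Pattern.compile(escaped).matcher("abc+35def").find());
        System.out.println(Pattern.compile(escaped).matcher("abc335def").find());
    }
}
```

The escaped value still has to go through SPARQL string escaping (or ParameterizedSparqlString.setLiteral) before it is placed in a query.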

Andy



[1] http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#func-str

Martynas

On Tue, Jan 21, 2014 at 10:10 AM, Andy Seaborne  wrote:

Works for me:

SELECT * {
   VALUES ?o { "+35" "abc+35def" }
   FILTER regex(?o , "\\Q+35\\E", "i")
}

and in Java you need \\\\ due to both the levels of escaping (Java text,
SPARQL).

[[
Regex: Pattern exception: java.util.regex.PatternSyntaxException: Dangling
meta character '+' near index 0
]]
which gives the game away it's using java regexs :-)

The regex engine is Java's and the string is used untouched.

Strictly, it should be XSD v1 which are slightly different:

http://www.w3.org/TR/xmlschema-2/#regexs

and there is a strict Xerces provided alternative if you want exact XSD
regular expressions.

XSD and Java differ in only very small ways (e.g. XSD has one extra
modifier flag, "m", XSD has no \Q\E, and XSD has "Is" for unicode code
blocks inside \p e.g. \p{IsMongolian})

And now
http://www.w3.org/TR/xmlschema11-2/#regexs

 Andy


On 21/01/14 02:32, Joshua TAYLOR wrote:


My apologies.  I replied too quickly.  I just wrote this test with
Jena's command line tools. To match the string "+35", I had to use the
"\\+35" in the query:

select ?label where {
values ?label { "+35" "-35" }
filter(regex(str(?label),"\\+35"))
}

---------
| label |
=========
| "+35" |
---------

That's _two_ backslashes in the query string, which means that in Java
you'd end up writing

String query = ... + "filter(regex(...,\"\\\\+35\"))" + ...;

Sorry for the hasty and inaccurate reply.

On Mon, Jan 20, 2014 at 9:27 PM, Joshua TAYLOR 
wrote:


On Mon, Jan 20, 2014 at 8:48 PM, Martynas Jusevičius
 wrote:


OK maybe "+35" was a bad example. But isn't "+" a special char in
SPARQL regex? And there are more like "*", "?" etc.
http://www.w3.org/TR/xpath-functions/#regex-syntax




Oh, good point.  But if it needs to be escaped with a backslash, then wouldn't

 filter regex(str(?label), "\+35")

be fine?  Note that if you're constructing this programmatically, you
might end up writing code like

  String queryString = ... + "filter regex(str(?label), \"\\+35\")" +
...;

--
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/











Re: Escaping SPARQL regex

2014-01-21 Thread Joshua TAYLOR
On Tue, Jan 21, 2014 at 11:58 AM, Martynas Jusevičius
 wrote:
> Andy,
>
> what if I'm sending the query to a remote endpoint that does not
> support Java style regex syntax? Do I need to use FmtUtils then?
>
> Looking at the example from SPARQL 1.1 STR() [1]:
>
> This query selects the set of people who use their work.example
> address in their foaf profile:
>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> SELECT ?name ?mbox
>  WHERE { ?x foaf:name  ?name ;
> foaf:mbox  ?mbox .
>  FILTER regex(str(?mbox), "@work\\.example$") }
>
> How does "work.example" become "work\\.example"? Is one backslash
> escaping the regex, and the second one escaping the literal? How do I
> achieve this with Jena?
>
> [1] http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#func-str

(I'm not Andy, but) as I understand it, you want the pattern

@work\.example$

where the pattern needs to include a backslash, because otherwise the
. would mean "any character".  In SPARQL strings, however, you
represent a backslash with '\\', as noted in 19.7 Escape sequences in
strings [1].  Using a parameterized sparql string (you said earlier
that you are), you can get the pattern into the query using
setLiteral:

import com.hp.hpl.jena.query.ParameterizedSparqlString;

public class SPARQLRegexEscapeExample {
  public static void main(String[] args) {
    final ParameterizedSparqlString query = new ParameterizedSparqlString(
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n" +
        "SELECT ?name ?mbox\n" +
        "WHERE { ?x foaf:name  ?name ;\n" +
        "   foaf:mbox  ?mbox .\n" +
        "FILTER regex(str(?mbox), ?pattern ) }" );
    // Note that pattern has only 15 characters; there's only
    // one `\` in it.
    final String pattern = "@work\\.example$";
    query.setLiteral( "?pattern", pattern );
    System.out.println(  query.toString() );
  }
}

which produces as output (which has the correct escaping):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name  ?name ;
   foaf:mbox  ?mbox .
FILTER regex(str(?mbox), "@work\\.example$" ) }



[1] http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#grammarEscapes





-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/


Re: Escaping SPARQL regex

2014-01-21 Thread Martynas Jusevičius
Andy,

what if I'm sending the query to a remote endpoint that does not
support Java style regex syntax? Do I need to use FmtUtils then?

Looking at the example from SPARQL 1.1 STR() [1]:

This query selects the set of people who use their work.example
address in their foaf profile:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
 WHERE { ?x foaf:name  ?name ;
foaf:mbox  ?mbox .
 FILTER regex(str(?mbox), "@work\\.example$") }

How does "work.example" become "work\\.example"? Is one backslash
escaping the regex, and the second one escaping the literal? How do I
achieve this with Jena?

[1] http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#func-str

Martynas

On Tue, Jan 21, 2014 at 10:10 AM, Andy Seaborne  wrote:
> Works for me:
>
> SELECT * {
>   VALUES ?o { "+35" "abc+35def" }
>   FILTER regex(?o , "\\Q+35\\E", "i")
> }
>
> and in Java you need \\\\ due to both the levels of escaping (Java text,
> SPARQL).
>
> [[
> Regex: Pattern exception: java.util.regex.PatternSyntaxException: Dangling
> meta character '+' near index 0
> ]]
> which gives the game away it's using java regexs :-)
>
> The regex engine is Java's and the string is used untouched.
>
> Strictly, it should be XSD v1 which are slightly different:
>
> http://www.w3.org/TR/xmlschema-2/#regexs
>
> and there is a strict Xerces provided alternative if you want exact XSD
> regular expressions.
>
> XSD and Java differ in only very small ways (e.g. XSD has one extra
> modifier flag, "m", XSD has no \Q\E, and XSD has "Is" for unicode code
> blocks inside \p e.g. \p{IsMongolian})
>
> And now
> http://www.w3.org/TR/xmlschema11-2/#regexs
>
> Andy
>
>
> On 21/01/14 02:32, Joshua TAYLOR wrote:
>>
>> My apologies.  I replied too quickly.  I just wrote this test with
>> Jena's command line tools. To match the string "+35", I had to use the
>> "\\+35" in the query:
>>
>> select ?label where {
>>values ?label { "+35" "-35" }
>>filter(regex(str(?label),"\\+35"))
>> }
>>
>> ---------
>> | label |
>> =========
>> | "+35" |
>> ---------
>>
>> That's _two_ backslashes in the query string, which means that in Java
>> you'd end up writing
>>
>> String query = ... + "filter(regex(...,\"\\\\+35\"))" + ...;
>>
>> Sorry for the hasty and inaccurate reply.
>>
>> On Mon, Jan 20, 2014 at 9:27 PM, Joshua TAYLOR 
>> wrote:
>>>
>>> On Mon, Jan 20, 2014 at 8:48 PM, Martynas Jusevičius
>>>  wrote:

 OK maybe "+35" was a bad example. But isn't "+" a special char in
 SPARQL regex? And there are more like "*", "?" etc.
 http://www.w3.org/TR/xpath-functions/#regex-syntax
>>>
>>>
>>>
>>> Oh, good point.  But if it needs to be escaped with a backslash, then wouldn't
>>>
>>> filter regex(str(?label), "\+35")
>>>
>>> be fine?  Note that if you're constructing this programmatically, you
>>> might end up writing code like
>>>
>>>  String queryString = ... + "filter regex(str(?label), \"\\+35\")" +
>>> ...;
>>>
>>> --
>>> Joshua Taylor, http://www.cs.rpi.edu/~tayloj/
>>
>>
>>
>>
>


Re: Escaping SPARQL regex

2014-01-21 Thread Joshua TAYLOR
On Tue, Jan 21, 2014 at 7:40 AM, Michael Brunnbauer  wrote:
>
> I was just referring to the documentation of ParameterizedSparqlString, which
> says that "injection is done by textual substitution":
>
>  
> http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/query/ParameterizedSparqlString.html

If you're using a ParameterizedSparqlString, the escaping still seems
to be handled correctly.  The string that you'd want to inject is
`\+35`, which is written as `\\+35` in Java.  Here's an example and
its output:

import com.hp.hpl.jena.query.ParameterizedSparqlString;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class ParameterizedSparqlRegexSubstitution {
  public static void main(String[] args) {
    final Model empty = ModelFactory.createDefaultModel();
    final ParameterizedSparqlString query = new ParameterizedSparqlString(
        "select * where {\n" +
        "  values ?label { \"+35\" \"-35\" }\n" +
        "  filter( regex( str(?label), ?pattern ))\n" +
        "}\n" );
    query.setLiteral( "?pattern", "\\+35" );
    System.out.println( query.toString() );
    final QueryExecution exec = QueryExecutionFactory.create(
        query.asQuery(), empty );
    final ResultSet results = exec.execSelect();
    ResultSetFormatter.out( results );
  }
}


select * where {
  values ?label { "+35" "-35" }
  filter( regex( str(?label), "\\+35" ))
}

---------
| label |
=========
| "+35" |
---------





-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/


Re: Escaping SPARQL regex

2014-01-21 Thread Michael Brunnbauer

Hello Andy,

I was just referring to the documentation of ParameterizedSparqlString, which
says that "injection is done by textual substitution":

 
http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/query/ParameterizedSparqlString.html

Thanks for pointing to FmtUtils but the functions do not seem to be documented:

 
https://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/sparql/util/FmtUtils.html#stringForLiteral%28com.hp.hpl.jena.graph.Node_Literal,%20com.hp.hpl.jena.sparql.serializer.SerializationContext%29

I guess anyone taking security seriously should have a look at the source code
anyway.

Regards,

Michael Brunnbauer

On Tue, Jan 21, 2014 at 12:22:33PM +, Andy Seaborne wrote:
> Michael,
> 
> Could you raise a JIRA for this with an example of where the escaping 
> isn't happening? Thanks.
> 
> There is code for formatting (FmtUtils) but if you have discovered a 
> case where it isn't being applied properly, it should be fixed.
> 
>   Andy
> 
> On 21/01/14 08:43, Michael Brunnbauer wrote:
> >
> >Hello Martynas,
> >
> >On Tue, Jan 21, 2014 at 01:30:42AM +0100, Martynas Jusevičius wrote:
> >>is there a way to build a SPARQL-specific regex string in Jena?
> >
> >I do not know. com.hp.hpl.jena.query.ParameterizedSparqlString does not 
> >seem
> >to do the necessary escaping. This is what we do to create literals in
> >a SPARQL query in Python:
> >
> >  def escape(s):
> >  map={
> >  '"': '\\"',
> >  '\r': '\\r',
> >  '\n': '\\n',
> >  '\t': '\\t',
> >  '\b': '\\b',
> >  '\f': '\\f'
> >  }
> >  s=s.replace('\\','\\u005C\\u005C')
> >  for key,value in map.items():
> >  s=s.replace(key,value)
> >  return '"' + s + '"'
> >
> >And this is what we do to check that URIs that are inserted into a SPARQL
> >query do not contain malicious stuff:
> >
> >  def checkuri(uri):
> >  for c in uri:
> >  n = ord(c)
> >  if n <= 32 or c in '<>\\':
> >  return False
> >  return True
> >
> >Regards,
> >
> >Michael Brunnbauer
> >

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel




Re: Escaping SPARQL regex

2014-01-21 Thread Andy Seaborne

Michael,

Could you raise a JIRA for this with an example of where the escaping 
isn't happening? Thanks.


There is code for formatting (FmtUtils) but if you have discovered a 
case where it isn't being applied properly, it should be fixed.


Andy

On 21/01/14 08:43, Michael Brunnbauer wrote:


Hello Martynas,

On Tue, Jan 21, 2014 at 01:30:42AM +0100, Martynas Jusevičius wrote:

is there a way to build a SPARQL-specific regex string in Jena?


I do not know. com.hp.hpl.jena.query.ParameterizedSparqlString does not seem
to do the necessary escaping. This is what we do to create literals in
a SPARQL query in Python:

  def escape(s):
  map={
  '"': '\\"',
  '\r': '\\r',
  '\n': '\\n',
  '\t': '\\t',
  '\b': '\\b',
  '\f': '\\f'
  }
  s=s.replace('\\','\\u005C\\u005C')
  for key,value in map.items():
  s=s.replace(key,value)
  return '"' + s + '"'

And this is what we do to check that URIs that are inserted into a SPARQL
query do not contain malicious stuff:

  def checkuri(uri):
  for c in uri:
  n = ord(c)
  if n <= 32 or c in '<>\\':
  return False
  return True

Regards,

Michael Brunnbauer





Re: Escaping SPARQL regex

2014-01-21 Thread Michael Brunnbauer

Hello Martynas,

uh... wait. You want to escape with regard to regex and not with regard to
SPARQL. Then I answered the wrong question :-)

Regards,

Michael Brunnbauer

On Tue, Jan 21, 2014 at 09:43:52AM +0100, Michael Brunnbauer wrote:
> 
> Hello Martynas,
> 
> On Tue, Jan 21, 2014 at 01:30:42AM +0100, Martynas Jusevičius wrote:
> > is there a way to build a SPARQL-specific regex string in Jena?
> 
> I do not know. com.hp.hpl.jena.query.ParameterizedSparqlString does not seem
> to do the necessary escaping. This is what we do to create literals in
> a SPARQL query in Python:
> 
>  def escape(s):
>  map={
>  '"': '\\"',
>  '\r': '\\r',
>  '\n': '\\n',
>  '\t': '\\t',
>  '\b': '\\b',
>  '\f': '\\f'
>  }
>  s=s.replace('\\','\\u005C\\u005C')
>  for key,value in map.items():
>  s=s.replace(key,value)
>  return '"' + s + '"'
> 
> And this is what we do to check that URIs that are inserted into a SPARQL
> query do not contain malicious stuff:
> 
>  def checkuri(uri):
>  for c in uri:
>  n = ord(c)
>  if n <= 32 or c in '<>\\':
>  return False
>  return True
> 
> Regards,
> 
> Michael Brunnbauer
> 





Re: Escaping SPARQL regex

2014-01-21 Thread Andy Seaborne

Works for me:

SELECT * {
  VALUES ?o { "+35" "abc+35def" }
  FILTER regex(?o , "\\Q+35\\E", "i")
}

and in Java you need \\\\ due to both the levels of escaping (Java text,
SPARQL).


[[
Regex: Pattern exception: java.util.regex.PatternSyntaxException: 
Dangling meta character '+' near index 0

]]
which gives the game away: it's using Java regexes :-)

The regex engine is Java's and the string is used untouched.

Strictly, it should be XSD v1 regular expressions, which are slightly different:

http://www.w3.org/TR/xmlschema-2/#regexs

and there is a strict Xerces provided alternative if you want exact XSD 
regular expressions.


XSD and Java differ in only very small ways (e.g. XSD has one extra
modifier flag, "m", XSD has no \Q\E, and XSD has "Is" for unicode code 
blocks inside \p e.g. \p{IsMongolian})


And now
http://www.w3.org/TR/xmlschema11-2/#regexs

Andy

On 21/01/14 02:32, Joshua TAYLOR wrote:

My apologies.  I replied too quickly.  I just wrote this test with
Jena's command line tools. To match the string "+35", I had to use the
"\\+35" in the query:

select ?label where {
   values ?label { "+35" "-35" }
   filter(regex(str(?label),"\\+35"))
}

---------
| label |
=========
| "+35" |
---------

That's _two_ backslashes in the query string, which means that in Java
you'd end up writing

String query = ... + "filter(regex(...,\"\\\\+35\"))" + ...;

Sorry for the hasty and inaccurate reply.

On Mon, Jan 20, 2014 at 9:27 PM, Joshua TAYLOR  wrote:

On Mon, Jan 20, 2014 at 8:48 PM, Martynas Jusevičius
 wrote:

OK maybe "+35" was a bad example. But isn't "+" a special char in
SPARQL regex? And there are more like "*", "?" etc.
http://www.w3.org/TR/xpath-functions/#regex-syntax



Oh, good point.  But if it needs to be escaped with a backslash, then wouldn't

filter regex(str(?label), "\+35")

be fine?  Note that if you're constructing this programmatically, you
might end up writing code like

 String queryString = ... + "filter regex(str(?label), \"\\+35\")" + ...;

--
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/








Re: Escaping SPARQL regex

2014-01-21 Thread Michael Brunnbauer

Hello Martynas,

On Tue, Jan 21, 2014 at 01:30:42AM +0100, Martynas Jusevičius wrote:
> is there a way to build a SPARQL-specific regex string in Jena?

I do not know. com.hp.hpl.jena.query.ParameterizedSparqlString does not seem
to do the necessary escaping. This is what we do to create literals in
a SPARQL query in Python:

 def escape(s):
     map={
         '"': '\\"',
         '\r': '\\r',
         '\n': '\\n',
         '\t': '\\t',
         '\b': '\\b',
         '\f': '\\f'
     }
     s=s.replace('\\','\\u005C\\u005C')
     for key,value in map.items():
         s=s.replace(key,value)
     return '"' + s + '"'
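
For Jena users, a rough Java counterpart of the same idea might look like the sketch below. It is an illustration only: the class name SparqlLiteralEscape is invented, it writes a backslash as the ordinary \\ escape rather than the \u005C form used in the Python code, and it is not how FmtUtils or ParameterizedSparqlString are implemented.

```java
class SparqlLiteralEscape {
    // Wrap s in double quotes and escape the characters that have escape
    // sequences in SPARQL string literals.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 2);
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                case '\b': out.append("\\b");  break;
                case '\f': out.append("\\f");  break;
                default:   out.append(ch);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // The Java literal "\\+35" has the value \+35 (a regex for a literal plus).
        System.out.println(escape("\\+35"));
        System.out.println(escape("say \"hi\"\n"));
    }
}
```

Note that this handles SPARQL string syntax only; escaping regex metacharacters is a separate step.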

And this is what we do to check that URIs that are inserted into a SPARQL
query do not contain malicious stuff:

 def checkuri(uri):
     for c in uri:
         n = ord(c)
         if n <= 32 or c in '<>\\':
             return False
     return True
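
The same check can be sketched in Java as follows. The names IriCheck and isSafeIri are hypothetical, and the rule is a direct port of the Python above; the SPARQL grammar for IRIs also excludes a few more characters (such as double quote and curly braces), which this sketch does not test for.

```java
class IriCheck {
    // Reject control characters, space, angle brackets and backslash,
    // mirroring the Python checkuri function.
    static boolean isSafeIri(String uri) {
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c <= 32 || c == '<' || c == '>' || c == '\\') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSafeIri("http://example.org/person/1"));
        System.out.println(isSafeIri("http://example.org/> } DROP ALL #"));
    }
}
```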

Regards,

Michael Brunnbauer



Re: Escaping SPARQL regex

2014-01-20 Thread Joshua TAYLOR
My apologies.  I replied too quickly.  I just wrote this test with
Jena's command line tools. To match the string "+35", I had to use the
"\\+35" in the query:

select ?label where {
  values ?label { "+35" "-35" }
  filter(regex(str(?label),"\\+35"))
}

---------
| label |
=========
| "+35" |
---------

That's _two_ backslashes in the query string, which means that in Java
you'd end up writing

String query = ... + "filter(regex(...,\"\\\\+35\"))" + ...;

Sorry for the hasty and inaccurate reply.
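
To make the counting concrete, here is a tiny JDK-only sketch. The class BackslashLayers and its methods are invented for illustration, and the filter text is the one from the test above with str(?label) written out.

```java
class BackslashLayers {
    // Four backslashes in Java source become two in the query text,
    // which SPARQL reads as one backslash in the regex pattern.
    static String filterFragment() {
        return "filter(regex(str(?label), \"\\\\+35\"))";
    }

    static long countBackslashes(String s) {
        return s.chars().filter(c -> c == '\\').count();
    }

    public static void main(String[] args) {
        String fragment = filterFragment();
        System.out.println(fragment);
        System.out.println(countBackslashes(fragment));
    }
}
```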

On Mon, Jan 20, 2014 at 9:27 PM, Joshua TAYLOR  wrote:
> On Mon, Jan 20, 2014 at 8:48 PM, Martynas Jusevičius
>  wrote:
>> OK maybe "+35" was a bad example. But isn't "+" a special char in
>> SPARQL regex? And there are more like "*", "?" etc.
>> http://www.w3.org/TR/xpath-functions/#regex-syntax
>
>
> Oh, good point.  But if it needs to be escaped with a backslash, then wouldn't
>
>filter regex(str(?label), "\+35")
>
> be fine?  Note that if you're constructing this programmatically, you
> might end up writing code like
>
> String queryString = ... + "filter regex(str(?label), \"\\+35\")" + ...;
>
> --
> Joshua Taylor, http://www.cs.rpi.edu/~tayloj/



-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/


Re: Escaping SPARQL regex

2014-01-20 Thread Joshua TAYLOR
On Mon, Jan 20, 2014 at 8:48 PM, Martynas Jusevičius
 wrote:
> OK maybe "+35" was a bad example. But isn't "+" a special char in
> SPARQL regex? And there are more like "*", "?" etc.
> http://www.w3.org/TR/xpath-functions/#regex-syntax


Oh, good point.  But if it needs to be escaped with a backslash, then wouldn't

   filter regex(str(?label), "\+35")

be fine?  Note that if you're constructing this programmatically, you
might end up writing code like

String queryString = ... + "filter regex(str(?label), \"\\+35\")" + ...;

-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/


Re: Escaping SPARQL regex

2014-01-20 Thread Martynas Jusevičius
OK maybe "+35" was a bad example. But isn't "+" a special char in
SPARQL regex? And there are more like "*", "?" etc.
http://www.w3.org/TR/xpath-functions/#regex-syntax

If my input comes from the user, how do I escape those? I guess this
might be related:
http://www.w3.org/TR/sparql11-query/#grammarEscapes

On Tue, Jan 21, 2014 at 1:59 AM, Joshua TAYLOR  wrote:
> On Mon, Jan 20, 2014 at 7:30 PM, Martynas Jusevičius
>  wrote:
>>
>> For example, Pattern.compile(Pattern.quote("+35")) produces query with
>>
>> FILTER regex(str(?label), "\\Q+35\\E", "i")
>>
>> which returns no results. So I guess I can't use the Java syntax
>> directly like that. How do I escape the string for SPARQL regex?
>
> It's not quite clear to me what you're asking.  What doesn't work
> about the following?
>
> filter regex(str(?label), "+35")
>
> //JT
>
> --
> Joshua Taylor, http://www.cs.rpi.edu/~tayloj/


Re: Escaping SPARQL regex

2014-01-20 Thread Joshua TAYLOR
On Mon, Jan 20, 2014 at 7:30 PM, Martynas Jusevičius
 wrote:
>
> For example, Pattern.compile(Pattern.quote("+35")) produces query with
>
> FILTER regex(str(?label), "\\Q+35\\E", "i")
>
> which returns no results. So I guess I can't use the Java syntax
> directly like that. How do I escape the string for SPARQL regex?

It's not quite clear to me what you're asking.  What doesn't work
about the following?

filter regex(str(?label), "+35")

//JT

-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/


Escaping SPARQL regex

2014-01-20 Thread Martynas Jusevičius
Hey,

is there a way to build a SPARQL-specific regex string in Jena?

Now I'm using the Pattern class to implement simple regex() based
search. If the search string contains special characters, escaping is
done but the syntax does not seem to work.

For example, Pattern.compile(Pattern.quote("+35")) produces query with

FILTER regex(str(?label), "\\Q+35\\E", "i")

which returns no results. So I guess I can't use the Java syntax
directly like that. How do I escape the string for SPARQL regex?
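
For reference, a short JDK-only sketch of what Pattern.quote produces. The class name PatternQuoteDemo and its wrapper method are made up; the \Q...\E form it returns is Java regex syntax, so an endpoint that is not backed by Java's regex engine may not accept it.

```java
import java.util.regex.Pattern;

class PatternQuoteDemo {
    // Thin wrapper so the result is easy to inspect.
    static String quote(String s) {
        return Pattern.quote(s);
    }

    public static void main(String[] args) {
        String quoted = quote("+35");
        System.out.println(quoted); // the input wrapped in \Q and \E

        // Java's own engine treats the quoted text literally.
        System.out.println(Pattern.compile(quoted).matcher("abc+35def").find());
    }
}
```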

Martynas
graphityhq.com