[rdflib-dev] Deparsing a parsed SPARQL query

devonianfarm Mon, 09 Oct 2017 08:50:34 -0700

I brought this up about a month ago on another thread but I am ready to revisit 
it.


I am working on this library

https://github.com/paulhoule/gastrodon

and one of the goals for the library is that it should know more about SPARQL 
than most users.

Here are two bits of SPARQL intelligence that the library already has:

(1) It uses the SPARQL parser in rdflib to look for a GROUP BY clause in a 
query. If the grouping variables also appear in the SELECT clause, they 
automatically become the index of the pandas DataFrame built from the SELECT 
output.

(2) The library also substitutes binding variables into SPARQL queries in order 
to use binding variables with queries sent to remote SPARQL endpoints.

I've been going at (2) with a rather stupid approach based on str.replace(), 
which worked for a while, but then I found an easy way to break it. I can see 
a hack that will get me through the day, but there is a way to break that 
too. I think I see a way to make a version that is unbreakable, but it is 
enough work that I might instead write something that turns SPARQL parse 
trees back into text, because that, in principle, is the route to complete 
SPARQL intelligence.
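
The post does not say which input broke the str.replace() approach, so the 
following is only a guess at the kind of failure involved. It is a minimal 
sketch using nothing but the standard library; naive_bind and boundary_bind 
are hypothetical names, not functions from gastrodon or rdflib, and the 
variable-name pattern is a simplification of the SPARQL grammar.

```python
import re


def naive_bind(query, bindings):
    """Substitute values with plain str.replace() (hypothetical helper)."""
    for name, value in bindings.items():
        query = query.replace("?" + name, value)
    return query


def boundary_bind(query, bindings):
    """Substitute only whole variable tokens (hypothetical helper).

    The pattern is a simplification: real SPARQL variable names allow more
    characters, and variables may also be written with a leading '$'.
    """
    def repl(match):
        # Leave variables that have no binding exactly as they were.
        return bindings.get(match.group(1), match.group(0))

    return re.sub(r"\?([A-Za-z_][A-Za-z0-9_]*)", repl, query)


bindings = {"s": "<http://example.org/alice>"}

# Breakage 1: ?s is a prefix of ?sname, so str.replace() mangles ?sname.
q1 = "SELECT ?sname { ?s <http://example.org/name> ?sname }"
print(naive_bind(q1, bindings))
print(boundary_bind(q1, bindings))

# Breakage 2: the token-aware version still rewrites text inside a string
# literal, because it has no idea where literals begin and end.
q2 = 'SELECT ?o { ?s ?p ?o FILTER(?o != "?s") }'
print(boundary_bind(q2, bindings))
```

The second case is the reason text-level fixes keep leaking: only something 
that tokenizes or parses the query knows that "?s" inside quotes is not a 
variable.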

So I am thinking about the right way to do it. For a while I was puzzled by 
the large parse trees created by simple expressions; for instance,

from rdflib.plugins.sparql.parser import parseQuery
parseQuery("SELECT (5 as ?o) {}")

parses to

[([], {}), SelectQuery_{'projection': [vars_{'expr': 
ConditionalOrExpression_{'expr': ConditionalAndExpression_{'expr': 
RelationalExpression_{'expr': AdditiveExpression_{'expr': 
MultiplicativeExpression_{'expr': rdflib.term.Literal('5', 
datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer'))}}}}}, 
'evar': rdflib.term.Variable('o')}], 'where': GroupGraphPatternSub_{}}], {})

But the more I think about it, this is just how parse trees produced by 
something like pyparsing work: the precedence structure is encoded in the 
nesting of the nodes. I can imagine this makes query understanding a little 
harder than it has to be, but it shouldn't be a barrier to unparsing, because 
if a ConditionalAndExpression has only one leg, its string form is just the 
string form of that single leg.
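
The single-leg rule can be sketched with a stand-in tree. The Node class 
below is hypothetical and only mimics the nesting shown in the dump above; it 
is not rdflib's own node type, and the ops/others fields are assumed names 
for the extra legs of an expression. Treat it as an illustration of the idea 
rather than a working deparser for rdflib trees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one expression node in the parse tree."""
    name: str
    expr: object                                  # first (often only) leg
    ops: list = field(default_factory=list)       # operators between legs
    others: list = field(default_factory=list)    # remaining legs, if any


# Node types whose legs are always joined by the same operator.
FIXED_OPS = {"ConditionalOrExpression": "||", "ConditionalAndExpression": "&&"}


def unparse(node):
    """Turn a Node tree back into expression text."""
    if not isinstance(node, Node):
        return str(node)                # leaf: a literal or a variable
    text = unparse(node.expr)
    if not node.others:
        return text                     # one leg: the node adds nothing
    ops = node.ops or [FIXED_OPS[node.name]] * len(node.others)
    for op, other in zip(ops, node.others):
        text += f" {op} {unparse(other)}"
    return text


# The five-deep chain from the dump collapses to its single leaf.
chain = Node("ConditionalOrExpression",
             Node("ConditionalAndExpression",
                  Node("RelationalExpression",
                       Node("AdditiveExpression",
                            Node("MultiplicativeExpression", 5)))))
print(unparse(chain))

# Nesting already carries precedence, so no parentheses are needed here.
summed = Node("AdditiveExpression",
              Node("MultiplicativeExpression", 2),
              ops=["+"],
              others=[Node("MultiplicativeExpression", 3,
                           ops=["*"], others=[4])])
print(unparse(summed))
```

A sub-expression that was parenthesized in the source would show up as its 
own kind of leaf and would need its parentheses written back out; this sketch 
leaves that case out.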

Another thing I am thinking about is that I'd like to have some way to 
substitute list variables from Python into the ExpressionList in an IN clause.  
It seems I could add an XVAR term which is ?? + VARNAME,  then add the 
appropriate production,  but I hate the idea of having to copy that whole parse 
tree when I only want to make a small change.
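
For comparison, here is a text-level stand-in for the ?? + VARNAME idea. It 
is not the grammar change described above: it simply expands a ??name marker 
into a comma-separated list before the query is parsed, so it shares the 
string-literal weakness of any text substitution. sparql_term and 
expand_lists are hypothetical helpers, and only bool, int and str values are 
handled.

```python
import re


def sparql_term(value):
    """Render a Python value as SPARQL literal text (hypothetical helper)."""
    if isinstance(value, bool):          # check bool first: bool is an int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        return f'"{escaped}"'
    raise TypeError(f"cannot render {type(value).__name__} as a SPARQL term")


def expand_lists(query, lists):
    """Replace each ??name marker with the rendered items of lists[name]."""
    def repl(match):
        return ", ".join(sparql_term(v) for v in lists[match.group(1)])

    return re.sub(r"\?\?([A-Za-z_][A-Za-z0-9_]*)", repl, query)


template = "SELECT ?s { ?s ?p ?o FILTER(?o IN (??wanted)) }"
print(expand_lists(template, {"wanted": [1, 2, 3]}))
```

An empty Python list expands to IN (), which the SPARQL grammar accepts as an 
empty expression list.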

Any ideas?

