Re: Standardizing property functions and/or full text search in SPARQL

Paolo Castagna Sat, 10 Mar 2012 00:35:22 -0800

Hi Andy,
many thanks for the interesting historical perspective.

Frank, thanks for asking this interesting question. Rob, thanks for your 2 cents.
Here are my 2 cents, with very little to add to what Andy and Rob have already said.


But, I want to share my perspective and a comment from the point of view of 
users and/or people running SPARQL endpoints.

As with any 'standardization'/'recommendation' activity, one of the goals should 
be interoperability between different implementations.
This did not happen with SPARQL in relation to 'free text' search, and it is not 
because of a lack of user need (the evidence tells us the need is there, and a 
lot of SPARQL engines provide that functionality).
Free text is a custom extension and SPARQL queries using it are not portable 
across different implementations. We just need to be aware of this and live 
with it, for now.

This is not limited to free text. To name another similar 'extension' I am 
interested in: geo/spatial capabilities/indexes.
Indeed, I'd love to read something along the lines of Andy's message below in 
relation to GeoSPARQL and understand if history is likely to be repeated in that 
context as well.

Another thing I often find myself thinking about is the pros/cons and 
similarities/differences of "property functions" and filter functions.
Can free text search be done with filter functions? If so, is there any 
advantage in that approach? Same for spatial indexes.
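
To make the question concrete, here is a minimal sketch in Python (not SPARQL, 
and not how LARQ or any real engine is implemented). It assumes a toy in-memory 
inverted index; DOCS, text_filter and text_match are hypothetical names made up 
for illustration. The filter-style function answers yes/no for one candidate 
solution at a time, while the property-function-style one generates bindings 
(subject and score) straight from the index.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Toy data: subject -> literal text (hypothetical, for illustration only).
DOCS: Dict[str, str] = {
    "ex:doc1": "SPARQL property functions and text search",
    "ex:doc2": "Spatial indexes for RDF stores",
    "ex:doc3": "Full text search with Lucene",
}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build a tiny inverted index: lowercased term -> set of subjects."""
    index: Dict[str, Set[str]] = {}
    for subject, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(subject)
    return index


INDEX = build_index(DOCS)


def text_filter(literal: str, term: str) -> bool:
    """Filter-function style: a single boolean for one candidate solution."""
    return term.lower() in literal.lower().split()


def query_with_filter(term: str) -> List[str]:
    """The engine enumerates every (subject, literal) pair, then filters."""
    return sorted(s for s, lit in DOCS.items() if text_filter(lit, term))


def text_match(term: str) -> Iterator[Tuple[str, float]]:
    """Property-function style: yield (subject, score) bindings from the index."""
    key = term.lower()
    for subject in sorted(INDEX.get(key, set())):
        words = DOCS[subject].lower().split()
        # Naive score (an assumption): term frequency over document length.
        yield subject, words.count(key) / len(words)


def query_with_property_function(term: str) -> List[Tuple[str, float]]:
    """Only index hits are visited, and each hit carries two bound values."""
    return list(text_match(term))
```

Both styles return the same subjects here. The difference in this sketch is 
that the filter has to be handed every candidate literal, whereas the 
generator consults the index first and can return a score alongside each match.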

My 2 cents,
Paolo

Andy Seaborne wrote:
> A bit of history:
> 
> The idea of property functions is copied from cwm/N3 (called built-in
> properties).  A good property function is one that describes meaning:
> 
>  (?a ?b) math:sum ?c .
> 
> That expresses a relationship.  Of course, in practice it can't run
> backwards but if there is a set of ?a ?b ?c then they have a
> relationship just like normal properties.  It does subtly assume
> something about the way execution happens in that parts of the BGP
> lexically before the property function have bound variables before the
> property function is called.  Normally, a BGP over triples can be
> executed in any order - you get the same answers; it just changes the
> number of negative cases to consider.
> 
> It also assumes lists are not structures of triples but first-class
> items in the data model.  Seems like that is less of an abuse but they do
> squeeze something into strict SPARQL syntax.  For whatever reason,
> people are more comfortable with that than with syntax extensions.
> 
> So when SPARQL-WG did the features and requirements definition phase, we
> decided not to formalize property functions.  Indeed, relying on the
> "good property function" characterisation, you can argue there is
> nothing to define.  Whether a relationship is computed, rather than
> stored directly in the data, is not important.
> 
> The fact it might affect some engines doing different evaluation made it
> politically sensitive.
> 
> 
> The one case that argued for them was text search.  It does not require
> property functions; they just squeeze it into SPARQL 1.0 syntax.  The
> WG could have decided to do text search with special syntax and not
> property functions. As property functions, a text search does express a
> relationship, possibly indirect, between a thing (literal, document) and
> a text query string.
> 
> The text search has other issues: there isn't a standard syntax for text
> search and it looked like a monster work item.
> 
> For regexs, SPARQL uses XSD Function and Operators regex language [1].
> And that is so close to Java, Perl etc etc that it makes a difference
> only to picky implementers and no one else [3].
> 
> For free text, back then, Lucene syntax was common but not nearly as
> universal as Perl regex, which has displaced variations and is available
> in C, Perl, Java and all their friends.  So the group would have to at
> least survey existing candidates and define the language; XPath full
> text [2] wasn't finished then.
> 
> I doubt the WG could have done full free-text a la XPath full text. Even
> a subset would be significant to spec and test.
> 
> It would displace other things, given the WG has bounded resources.
> 
> The other issue was the amount of work it would take to implement. Regex
> implementations exist for (nearly?) every language.
> 
> Free text looked like it would require SPARQL implementers to implement a
> large piece of work.  OK (maybe) if you can use Lucene or a clone, but
> that isn't the situation for everyone.
> 
> It felt at the time like free-text and not much else.  Aggregates were
> commonly implemented, a clear need and known to be practical (even so,
> there has been some resistance to the amount of work they need).  Text
> search was too big a topic to undertake.
> 
>     Andy
> 
> [1] http://www.w3.org/TR/xpath-functions/#regex-syntax
> [2] http://www.w3.org/TR/xpath-full-text-10/
> [3] Look at the flags.
>     Even ARQ uses Java by default and Xerces on request
>     (Xerces has an exact XSD regex engine)
> 
> 
> On 09/03/12 19:56, Robert Vesse wrote:
>> Hi Frank
>>
>> I do not believe either of these are on the agenda for the current
>> round of SPARQL standardization but it may be worth you suggesting
>> these for inclusion as a Future Work item on the comments mailing
>> list - [email protected] - so that they can be included
>> on the list at http://www.w3.org/2009/sparql/wiki/Future_Work_Items
>> and feed into any future SPARQL working group
>>
>> FWIW there are already a number of interoperable implementations of
>> the LARQ style syntax out in the wild - my own dotNetRDF
>> implements this as does Clark & Parsia's Stardog and possibly others
>> I'm not aware of.  Also property functions in general are widely
>> implemented for a variety of purposes in a whole variety of triple
>> stores and SPARQL engines.
>>
>> The slightly subversive property function syntax is slightly awkward
>> and at odds with the pure SPARQL specification but it addresses the
>> general limitation of extension functions in SPARQL that they can
>> only return a single value and the 1.0 specific limitation that you
>> could not actually bind the result of an extension function to a
>> variable.  Even with BIND in SPARQL 1.1 you can only assign a single
>> value in a BIND so either you'd have to have multiple extension
>> functions to get the matches and the scores (and then how do you
>> relate them)
>>
>> Rob
>>
>> On Mar 9, 2012, at 11:44 AM, Frank Budinsky wrote:
>>
>>>
>>>
>>> Hi,
>>>
>>> I'm trying to get a handle on the strategic implications of using
>>> Jena property functions, and specifically the LARQ textMatch
>>> property function approach for supporting full text search.
>>>
>>> Does anybody know if there is anything in the works to try to
>>> include property functions in a future version of SPARQL? I noticed
>>> a small amount of discussion about this back in 2008, but haven't
>>> seen anything since. I see that they are not part of the standard
>>> SPARQL 1.1 specification and don't even appear to be an avenue of
>>> extension envisioned by the SPARQL 1.1 specification, which
>>> envisions extension value functions and entailment regimes.
>>>
>>> It seems that the syntax of property functions borrows the syntax
>>> of legitimate SPARQL queries but gives it an
>>> implementation-specific meaning that runs counter to SPARQL
>>> semantics. Has there been any attempt to reconcile what property
>>> functions do with the semantics of SPARQL 1.1, as described in
>>> chapter 18: http://www.w3.org/TR/sparql11-query/#sparqlDefinition?
>>>
>>> Thanks, Frank
>>
>
