...
On 12/03/12 14:22, Frank Budinsky wrote:
Me too. Correct me if I'm wrong, but it seems that filter functions are
fundamentally not workable for performance reasons. A filter function
provides a yes/no answer for each triple that may be a match. If you use a
filter function for free-text search, it would need to be called for every
triple in the triplestore (i.e., does triple #1 match the pattern?, does
#2?, does #3?, etc., etc.). With the property function approach, the
function is called once, and it (via Lucene or some other lookup) returns
all the matches.
You're right about FILTER vs. property functions. For free text, the index
(the pattern) is generative. It's often, but not always, better to look
in the index first, get some matches, then proceed with the rest of the RDF
part. Sometimes it isn't: when the RDF part only matches a few things, it
is better to find those first and then go to the index to check each one.
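The trade-off discussed above can be sketched with a toy model. This is
only an illustration under stated assumptions: the triples, the in-memory
word index, and the function names (filter_scan, build_index, index_lookup)
are all hypothetical and are not the Jena, LARQ or Lucene APIs. It counts
how many calls each strategy needs.

```python
from collections import defaultdict

# Toy data: (subject, predicate, literal) triples. Purely illustrative.
TRIPLES = [
    ("ex:doc1", "rdfs:label", "foobar handbook"),
    ("ex:doc2", "rdfs:label", "unrelated notes"),
    ("ex:doc3", "rdfs:comment", "more about foobar"),
    ("ex:doc4", "rdfs:label", "another title"),
]


def filter_scan(triples, word):
    """FILTER style: one yes/no test per candidate triple."""
    calls = 0
    matches = []
    for triple in triples:
        calls += 1  # the filter function is invoked once per triple
        if word in triple[2].split():
            matches.append(triple)
    return matches, calls


def build_index(triples):
    """Hypothetical stand-in for a text index: word -> matching triples."""
    index = defaultdict(list)
    for triple in triples:
        for w in set(triple[2].split()):
            index[w].append(triple)
    return index


def index_lookup(index, word):
    """Property-function style: a single call generates all the matches."""
    return list(index.get(word, [])), 1


if __name__ == "__main__":
    index = build_index(TRIPLES)

    # Index first (generative): one call, however large the store is.
    print(index_lookup(index, "foobar"))

    # Filter over everything: as many calls as there are triples.
    print(filter_scan(TRIPLES, "foobar"))

    # RDF part first: if the rest of the pattern leaves only a few
    # candidates, testing just those can be the cheaper route.
    few = [t for t in TRIPLES if t[0] == "ex:doc3"]
    print(filter_scan(few, "foobar"))
```

In this sketch the full scan costs one call per triple while the lookup
costs one call in total; the last case shows why checking can still win
when the RDF part has already narrowed things down to a few candidates.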
This is the reason I don't think that free-text can be accomplished using
an existing standard SPARQL mechanism, and therefore unless property
functions become an allowable extension, the current LARQ approach is
arguably non-compliant with the SPARQL spec. From Andy's reply, I gather
that he believes this is an arguable point. Did I get that right?
Yes.
Of course, as you (in your other reply) and Andy have pointed out, the
other important question is can the text query syntax (currently based
on Lucene) be standardized so that the actual matching behavior would be
interoperable across implementations. I suppose an approach that required
all triplestore implementations to support custom property functions (or
similar) would, at least, allow users to plug in their own custom text
match property function into any triplestore of their choosing.
Yes, that would be good; there are some syntax choices in dealing with
multiple arguments and results.
e.g. a named argument block grafted onto RDF syntax:

    [ xyz:target ?literal ;
      xyz:matchString "foobar" ;
      xyz:limit 10 ]

or "use" of lists:

    ?literal xyz:match ("foobar" 10) .

So I think that, for a standard, it is better not to mess around with
triple syntax and instead to put in a proper procedure call (with
arguments that can be bound), maybe with named arguments as well:

    # Random idea ...
    PROC(<textmatch>, ?literal, "foobar", 10)

Andy