Bob DuCharme wrote:
Lee,
That first one worked great out of the box (untested? I'm impressed)
with a few repeated rows, and DISTINCT fixed that.
Great!
When I first wrote my original query, I picked two directors who were
unlikely to have many actors in common for the novelty value of them
having something in common, but I wanted to come up with a form of the
query that would work for directors who were likely to have actors in
common, and when I replaced the directors' names with Woody Allen and
Robert Altman, it worked just great. The results even listed the three
Mia Farrow/Woody Allen movies along with her one Altman film and Michael
Murphy's two Altman movies with his one Allen film.
At first I didn't understand the role of your movie2 variable, but I
think I do now: we're only interested in binding movie1 (and its
movieName) if something can be bound to movie2 as well, and movie2's
name will come up in a different result row when that movie binds to
movie1. Is that correct?
That's correct. That's the reason it's sort of a tricky query: we're
really asking to find two movies, but only interested in hearing about
one at a time. Actually, though, that points out an alternative way to
do the query:
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?actorName ?johnwatersMovie ?stevenspielbergMovie WHERE {
  # bind the two directors
  ?jw movie:director_name "John Waters" .
  ?ss movie:director_name "Steven Spielberg" .
  # we want to find two movies, one by each director
  ?m1 movie:director ?jw ;
      dc:title ?johnwatersMovie ;
      movie:actor ?actor .
  ?m2 movie:director ?ss ;
      dc:title ?stevenspielbergMovie ;
      movie:actor ?actor .
  ?actor movie:actor_name ?actorName .
}
This one is the most straightforward: just find pairs of movies by JW
and SS and return each pair. The drawback of this approach is that for
directors and actors with a large number of common movies, the result
set is going to have one row for every pair of movies that the
particular actor was in for the two directors. For example, an actor
who was in three movies by one director and two by the other produces
six rows.
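One way to keep that pairwise blow-up out of the results is to group by
actor and fold each director's titles into a single cell. This is only
a sketch, and it assumes an endpoint that supports SPARQL 1.1
aggregates, which nothing else in this thread relies on. The names
?jwMovies and ?ssMovies and the " | " separator are illustrative
choices, not part of the original query:

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc: <http://purl.org/dc/terms/>
# Assumption: the endpoint implements SPARQL 1.1 (GROUP BY, GROUP_CONCAT).
SELECT ?actorName
       (GROUP_CONCAT(DISTINCT ?johnwatersMovie; separator=" | ") AS ?jwMovies)
       (GROUP_CONCAT(DISTINCT ?stevenspielbergMovie; separator=" | ") AS ?ssMovies)
WHERE {
  # bind the two directors
  ?jw movie:director_name "John Waters" .
  ?ss movie:director_name "Steven Spielberg" .
  # one movie by each director, sharing an actor
  ?m1 movie:director ?jw ;
      dc:title ?johnwatersMovie ;
      movie:actor ?actor .
  ?m2 movie:director ?ss ;
      dc:title ?stevenspielbergMovie ;
      movie:actor ?actor .
  ?actor movie:actor_name ?actorName .
}
# group on the actor resource as well, so two actors who share a name stay separate
GROUP BY ?actor ?actorName
```

The join still enumerates every pair internally, but each actor comes
back as one row. Note that DISTINCT here collapses titles, so two
different movies with the same title would appear only once.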
Instead of naming the variables after the directors in question like I
did here, you can project out ?movieName1 ?dirName1 ?movieName2
?dirName2 -- besides renaming the title variables, the only change to
the query is to add in triple patterns that find the directors' names
explicitly, as in our first query:
?jw movie:director_name ?dirName1 .
?ss movie:director_name ?dirName2 .
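Spelled out in full, that variant might look like the following. It is
the pair query from above with the title variables renamed and the two
extra name patterns added, and like the others here it is untested:

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?actorName ?movieName1 ?dirName1 ?movieName2 ?dirName2 WHERE {
  # bind the two directors by their literal names
  ?jw movie:director_name "John Waters" .
  ?ss movie:director_name "Steven Spielberg" .
  # match the names again with variables so they can be projected
  ?jw movie:director_name ?dirName1 .
  ?ss movie:director_name ?dirName2 .
  # one movie by each director, sharing an actor
  ?m1 movie:director ?jw ;
      dc:title ?movieName1 ;
      movie:actor ?actor .
  ?m2 movie:director ?ss ;
      dc:title ?movieName2 ;
      movie:actor ?actor .
  ?actor movie:actor_name ?actorName .
}
```

Changing the two literals is then the only edit needed to try a
different pair of directors.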
Because of what you wrote about a movie with multiple directors, I tried
Woody Allen and Francis Ford Coppola, because besides having actors in
common (Diane Keaton, Joe Mantegna, others) they both directed parts of
"New York Stories" along with Martin Scorsese. With this being a
multiple-director movie that Allen also acted in, he was therefore "in"
a Coppola movie. The query results ended up listing every movie Allen
acted in, with two entries for "What's Up, Tiger Lily?" because it too
had two directors, crediting Allen and the Japanese guy who directed the
original movie that Allen redubbed.
I didn't try your second query because it looked like it was asking for
so much before filtering that I worried that it asked too much of the
server (and because your first version worked so well!), although I
could be misunderstanding how typical SPARQL query processing works.
In practice, for many SPARQL engines I know about, you are right that
that would not be a particularly efficient way to write the query. In
theory, there is no reason that a query engine should not be smart
enough to optimize those filters into the query evaluation though, and
thus avoid having to postprocess a large number of results.
I'd be interested to hear from any implementors who are doing that -
what filters can you push into your underlying query / index search
strategy, and how does using a FILTER compare (performance-wise) to
using a straight-up triple pattern that matches against a literal value?
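To make the comparison concrete, here are the two forms side by side,
cut down to a single director lookup. Both are sketches against the same
LinkedMDB vocabulary used above, and ?name is an illustrative variable.
First, the literal placed directly in the triple pattern:

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
SELECT ?dir WHERE {
  # the literal is part of the pattern, so an index lookup can use it
  ?dir movie:director_name "John Waters" .
}
```

And the same constraint expressed as a FILTER over a variable:

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
SELECT ?dir WHERE {
  # every director name matches this pattern...
  ?dir movie:director_name ?name .
  # ...and the filter then discards all but one, unless the engine
  # pushes the test down into the pattern match
  FILTER(?name = "John Waters")
}
```

Whether the second form costs more depends entirely on the engine,
which is the question put to implementors above.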
Lee
thanks,
Bob
Lee Feigenbaum wrote:
To summarize from the blog post, the goal is to find all actors that
appear in both a John Waters and a Steven Spielberg film. But now you
also want to find all the movies (by those directors) that the actor
was in. If my understanding is right, this is how I would go about it
(untested):
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?actorName ?dirName ?movieName WHERE {
  # bind the two directors
  ?jw movie:director_name "John Waters" .
  ?ss movie:director_name "Steven Spielberg" .
  # we want to find two movies, one by each director
  # but we use two variables so that we can only
  # pull out the name of one of them
  {
    ?movie1 movie:director ?jw .
    ?movie2 movie:director ?ss .
  } UNION {
    ?movie2 movie:director ?jw .
    ?movie1 movie:director ?ss .
  }
  ?movie1 dc:title ?movieName .
  # the actor needs to be in both movies
  ?movie1 movie:actor ?actor .
  ?movie2 movie:actor ?actor .
  ?actor movie:actor_name ?actorName .
  # we need to repeat the director information
  # to be able to bind a variable to the director's
  # name - this may give extra results if a movie has
  # multiple directors
  ?movie1 movie:director [ movie:director_name ?dirName ] .
}
There may yet be a more elegant way to do this, and I'm not positive
I've got this right. I think that ARQ at least has a LET assignment
operator that would avoid the need for the extra director binding
there at the end.
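As a sketch of what such an assignment could look like, the version
below uses BIND, the standard assignment form from SPARQL 1.1, rather
than ARQ's LET. It therefore assumes an endpoint that supports SPARQL
1.1, and it has not been run against LinkedMDB. Each UNION branch sets
?dirName itself, so the extra director pattern at the end goes away:

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc: <http://purl.org/dc/terms/>
# Assumption: the endpoint implements SPARQL 1.1 BIND.
SELECT ?actorName ?dirName ?movieName WHERE {
  # bind the two directors
  ?jw movie:director_name "John Waters" .
  ?ss movie:director_name "Steven Spielberg" .
  {
    ?movie1 movie:director ?jw .
    ?movie2 movie:director ?ss .
    # ?movie1 is the John Waters movie in this branch
    BIND("John Waters" AS ?dirName)
  } UNION {
    ?movie2 movie:director ?jw .
    ?movie1 movie:director ?ss .
    # ?movie1 is the Steven Spielberg movie in this branch
    BIND("Steven Spielberg" AS ?dirName)
  }
  ?movie1 dc:title ?movieName .
  # the actor needs to be in both movies
  ?movie1 movie:actor ?actor .
  ?movie2 movie:actor ?actor .
  ?actor movie:actor_name ?actorName .
}
```

Because ?dirName no longer comes from matching every director of
?movie1, a co-directed movie should not pick up rows for its other
directors.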
Another way to approach this that is quite similar but might be
considered easier would be to use FILTERs:
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?actorName ?dirName1 ?movieName WHERE {
  # this is the info we want to pull out
  ?movie1 dc:title ?movieName .
  ?actor movie:actor_name ?actorName .
  # two movies with two directors
  ?movie1 movie:director [ movie:director_name ?dirName1 ] .
  ?movie2 movie:director [ movie:director_name ?dirName2 ] .
  # the same actor needs to be in both movies
  ?movie1 movie:actor ?actor .
  ?movie2 movie:actor ?actor .
  # use the filter to check the director names
  FILTER(
    (?dirName1 = 'John Waters' && ?dirName2 = 'Steven Spielberg') ||
    (?dirName2 = 'John Waters' && ?dirName1 = 'Steven Spielberg')
  ) .
}
This second one avoids some of the silliness since it relies on the
fact that each actor for which this works will match the pattern two
ways (one way with JW bound to ?dirName1 and one with SS bound to
it). We want to get *both* these results.
In practice, I'd usually do something like this with multiple queries
- first find the relevant actors, then find their movies by JW and by SS.
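A rough sketch of that two-step approach follows, untested like the
rest. The first query finds the actors the two directors have in
common:

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
SELECT DISTINCT ?actor ?actorName WHERE {
  ?jw movie:director_name "John Waters" .
  ?ss movie:director_name "Steven Spielberg" .
  # the actor must appear in at least one movie by each director
  ?m1 movie:director ?jw ;
      movie:actor ?actor .
  ?m2 movie:director ?ss ;
      movie:actor ?actor .
  ?actor movie:actor_name ?actorName .
}
```

The second query is then run once per actor. <ACTOR_URI> is a
placeholder, not a real identifier: substitute one ?actor value
returned by the first query.

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?dirName ?movieName WHERE {
  # <ACTOR_URI> is a placeholder for an ?actor value from the first query
  ?movie movie:actor <ACTOR_URI> ;
         movie:director ?dir ;
         dc:title ?movieName .
  ?dir movie:director_name ?dirName .
  # keep only movies by the two directors of interest
  FILTER(?dirName = "John Waters" || ?dirName = "Steven Spielberg")
}
```

This costs extra round trips, but each query stays simple and neither
one has to enumerate pairs of movies.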
hope this is helpful,
Lee
Bob DuCharme wrote:
I'm trying to expand the query shown at
http://www.snee.com/bobdc.blog/2008/11/sparql-at-the-movies.html#id203668
to include the director and movie names in the result of the query
sent to http://data.linkedmdb.org/sparql. I guess my main problem is
trying to understand how I can set it up so that ?actor is bound to
the same value throughout the query but ?movie and ?movieName can be
bound to different values in the two patterns. I know that one actor
was in a single movie by each of the two directors named below, and
while the following doesn't give me an error when submitted it gives
an unrelated set of data. I may be going about it completely wrong.
Any suggestions?
thanks,
Bob
####################
SELECT ?actorName ?dirName ?movieName WHERE {
  ?dir1 <http://data.linkedmdb.org/resource/movie/director_name> "John Waters".
  ?dir2 <http://data.linkedmdb.org/resource/movie/director_name> "Steven Spielberg".
  ?actor <http://data.linkedmdb.org/resource/movie/actor_name> ?actorName.
  {
    ?movie <http://data.linkedmdb.org/resource/movie/director> ?dir1;
           <http://data.linkedmdb.org/resource/movie/actor> ?actor;
           <http://purl.org/dc/terms/title> ?movieName.
    ?dir1 <http://data.linkedmdb.org/resource/movie/director_name> ?dirName.
  }
  UNION
  {
    ?movie <http://data.linkedmdb.org/resource/movie/director> ?dir2;
           <http://data.linkedmdb.org/resource/movie/actor> ?actor;
           <http://purl.org/dc/terms/title> ?movieName.
    ?dir2 <http://data.linkedmdb.org/resource/movie/director_name> ?dirName.
  }
}