Ping? 4.9.2 is out. And we definitely need to fix this in time for 4.10.
I can experiment with the extra-column approach, though it seems pretty unclean to me (unclean = not RDF-based).

On Wed, Aug 22, 2012 at 11:16 AM, Vishesh Handa <[email protected]> wrote:
> Hey everyone
>
> In 4.9, most of the queries on large datasets are impossibly slow and often
> cause Virtuoso to completely lock up. So I've been going through the common
> queries that are passed to Nepomuk from a user perspective and trying
> to optimize them.
>
> The most prevalent problem is that of user visibility.
>
> Simple queries like listing all the tags seem to blow out of proportion
> with the added "FILTER EXISTS { ?r a [ nao:userVisible "true"^^xsd:boolean ] . }".
> If one looks at the SQL that is being generated, one can see a drastic
> difference:
>
> "select ?r where { ?r a nao:Tag . }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_0-t0"."O")
> AND "s_1_0-t0"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> OPTION (QUIETCAST)
>
>
> "select ?r where { ?r a nao:Tag . FILTER EXISTS { ?r a [ nao:userVisible "true"^^xsd:boolean ] . } }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_0-t0"."O")
> AND "s_1_0-t0"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> AND EXISTS ( (
> SELECT TOP 1 1 AS __ask_retval
> FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
> INNER JOIN DB.DBA.RDF_QUAD AS "s_1_4-t2"
> ON ( "s_1_4-t1"."S" = "s_1_4-t2"."O" )
> WHERE "s_1_4-t1"."P" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible' , 1))
> AND (1 - isiri_id ( "s_1_4-t1"."O"))
> AND "s_1_4-t1"."O" = DB.DBA.RDF_OBJ_OF_SQLVAL ( 1)
> AND "s_1_4-t2"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_4-t2"."O")
> AND "s_1_4-t2"."S" = "s_1_0-t0"."S"
> OPTION (QUIETCAST)
> ))
> OPTION (QUIETCAST)
>
> The second query runs an additional subquery for every single result, and
> that subquery also contains an extra join.
>
> On my system, with 13k tags (yeah, I know), the system is completely
> unusable: Virtuoso jumps to 200% CPU and takes about 5 minutes to respond.
> While I don't expect anyone to have 13k tags, people do have that many
> contacts or emails.
>
> Options for fixing this:
>
> 1. Use graphs with a filter
>
> select ?r where { graph ?g { ?r a nao:Tag . } FILTER NOT EXISTS { ?g a nrl:Ontology. } }
>
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_1-t0"."O")
> AND "s_1_1-t0"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> AND not ( EXISTS ( (
> SELECT TOP 1 1 AS __ask_retval
> FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
> WHERE "s_1_4-t1"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_4-t1"."O")
> AND "s_1_4-t1"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#Ontology' , 1))
> AND "s_1_4-t1"."S" = "s_1_1-t0"."G"
> OPTION (QUIETCAST)
> )))
> OPTION (QUIETCAST)
>
> This also results in an additional SQL subquery per resource, but it's still
> a LOT faster (there is no join in the EXISTS subquery).
>
> 2. Use graphs via nao:maintainedBy
>
> select ?r where { graph ?g { ?r a nao:Tag . } ?g nao:maintainedBy ?app . }
>
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
> INNER JOIN DB.DBA.RDF_QUAD AS "s_1_0-t1"
> ON ( "s_1_0-t1"."S" = "s_1_1-t0"."G" )
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_1-t0"."O")
> AND "s_1_1-t0"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> AND ( "s_1_0-t1"."S" < min_bnode_iri_id ())
> AND "s_1_0-t1"."P" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#maintainedBy' , 1))
> OPTION (QUIETCAST)
>
> This would be the ideal solution; however, it would break backward
> compatibility, because not all graphs have a nao:maintainedBy statement.
>
> 3. Go SQL and add another, indexed column to our RDF_QUAD table.
> That way we can always filter statements on the basis of visibility. This
> would be considerably faster than the join.
>
> I suggest we go with option 1 for 4.9 and option 2 for 4.10, and get rid
> of all the user-visibility stuff.
>
> Any suggestions?
>
> --
> Vishesh Handa
>

--
Vishesh Handa
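To compare the shape of the current filter with options 1 and 3, here is a minimal sketch that runs all three against a toy quad table in SQLite, using Python's standard sqlite3 module. It makes several assumptions: SQLite stands in for Virtuoso, IRIs are shortened to prefixed strings, and the table, column, index and helper names (rdf_quad, visible, build_store, run) are hypothetical. The "visible" column models option 3 and is simply set at insert time here; how Nepomuk would actually maintain it is left open. Option 2 is not modelled. The sketch only checks that the queries agree on toy data; it says nothing about Virtuoso's performance.

```python
import sqlite3

# Hypothetical abbreviated IRIs; the real store maps full IRIs to IRI ids.
RDF_TYPE = "rdf:type"
NAO_TAG = "nao:Tag"
NAO_USER_VISIBLE = "nao:userVisible"
NRL_ONTOLOGY = "nrl:Ontology"


def build_store(num_tags):
    """Toy stand-in for DB.DBA.RDF_QUAD (G, S, P, O), plus the hypothetical
    indexed 'visible' column from option 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rdf_quad ("
        " g TEXT NOT NULL, s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
        " visible INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("CREATE INDEX quad_pos ON rdf_quad (p, o, s)")
    # Hypothetical index covering the option 3 visibility filter.
    conn.execute("CREATE INDEX quad_pov ON rdf_quad (p, o, visible)")
    rows = [
        # Ontology graph: nao:Tag is declared user-visible.
        ("graph:nao", NAO_TAG, NAO_USER_VISIBLE, "true", 0),
        # Graph metadata: graph:nao is an ontology graph.
        ("graph:meta", "graph:nao", RDF_TYPE, NRL_ONTOLOGY, 0),
    ]
    # Data graph holding the tags; under option 3 the flag is set on insert.
    rows += [("graph:data", f"tag:{i}", RDF_TYPE, NAO_TAG, 1) for i in range(num_tags)]
    conn.executemany("INSERT INTO rdf_quad VALUES (?, ?, ?, ?, ?)", rows)
    return conn


QUERIES = {
    # Current approach: correlated EXISTS subquery containing a self-join,
    # mirroring the SQL generated for the nao:userVisible filter.
    "user_visible_exists": """
        SELECT t0.s FROM rdf_quad AS t0
        WHERE t0.p = :type AND t0.o = :tag
          AND EXISTS (
            SELECT 1 FROM rdf_quad AS t1
            JOIN rdf_quad AS t2 ON t1.s = t2.o
            WHERE t1.p = :visible_pred AND t1.o = 'true'
              AND t2.p = :type AND t2.s = t0.s)""",
    # Option 1: correlated NOT EXISTS on the statement's graph, no join.
    "graph_filter": """
        SELECT t0.s FROM rdf_quad AS t0
        WHERE t0.p = :type AND t0.o = :tag
          AND NOT EXISTS (
            SELECT 1 FROM rdf_quad AS t1
            WHERE t1.p = :type AND t1.o = :ontology AND t1.s = t0.g)""",
    # Option 3: a plain test on the indexed column, no subquery at all.
    "visible_column": "SELECT s FROM rdf_quad WHERE p = :type AND o = :tag AND visible = 1",
}

# Named parameters; sqlite3 ignores keys a given query does not use.
PARAMS = {
    "type": RDF_TYPE,
    "tag": NAO_TAG,
    "visible_pred": NAO_USER_VISIBLE,
    "ontology": NRL_ONTOLOGY,
}


def run(conn, name):
    """Return the sorted subjects selected by one of the strategies."""
    return sorted(row[0] for row in conn.execute(QUERIES[name], PARAMS))


if __name__ == "__main__":
    conn = build_store(5)
    for name in QUERIES:
        print(f"{name}: {run(conn, name)}")
```

On this toy data all three queries return the same tags. They are not equivalent in general, though: option 1 hides statements stored in ontology graphs, whereas the nao:userVisible filter looks at the resource's types, so the two can diverge on other data.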
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
