[
https://issues.apache.org/jira/browse/MARMOTTA-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617317#comment-13617317
]
Sebastian Schaffert commented on MARMOTTA-175:
----------------------------------------------
Hi Raffaele,
thanks for the tests. I agree that left joins are usually very efficient. But
unfortunately not in combination with a union subselect, because then the
databases first have to create a temporary result table to perform the join on
(at least this is what I read when googling a bit on the MySQL issue). Maybe it
is possible to reformulate the delete statement in a way that the left join
does not involve a subselect?
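
A minimal sketch of what such a reformulation could look like, using an in-memory SQLite database only to check that both forms select the same rows. The schema is an assumption: only reasoner_just_supp_triples and triple_id appear in the error message; the triples columns and the versions_added table are hypothetical stand-ins. It says nothing about MySQL or Postgres performance.

```python
import sqlite3

# Hypothetical, simplified schema. Only reasoner_just_supp_triples and
# triple_id come from the error message; the rest is assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL);
    CREATE TABLE reasoner_just_supp_triples (triple_id INTEGER NOT NULL);
    CREATE TABLE versions_added (triple_id INTEGER NOT NULL);
    INSERT INTO triples VALUES (1, 1), (2, 1), (3, 1), (4, 0);
    INSERT INTO reasoner_just_supp_triples VALUES (1);
    INSERT INTO versions_added VALUES (2);
""")

# Variant A: left join against a union subselect. The database may have to
# build the union as a temporary result before it can join on it.
WITH_SUBSELECT = """
    SELECT t.id FROM triples t
    LEFT JOIN (SELECT triple_id FROM reasoner_just_supp_triples
               UNION
               SELECT triple_id FROM versions_added) refs
           ON refs.triple_id = t.id
    WHERE t.deleted = 1 AND refs.triple_id IS NULL
    ORDER BY t.id
"""

# Variant B: one plain left join per referencing table, no subselect.
# A triple is garbage only if every join found no partner.
WITHOUT_SUBSELECT = """
    SELECT t.id FROM triples t
    LEFT JOIN reasoner_just_supp_triples r ON r.triple_id = t.id
    LEFT JOIN versions_added v ON v.triple_id = t.id
    WHERE t.deleted = 1 AND r.triple_id IS NULL AND v.triple_id IS NULL
    ORDER BY t.id
"""

orphans_a = [row[0] for row in conn.execute(WITH_SUBSELECT)]
orphans_b = [row[0] for row in conn.execute(WITHOUT_SUBSELECT)]
print(orphans_a, orphans_b)
```

Both queries pick the same garbage candidates here; whether variant B is actually faster would have to be measured on the real databases.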
Your tests have interesting results. I would have expected NOT IN to be the
slowest operation, because it would need to list all ids from the different
tables and for big results even do a complete scan. OTOH maybe the databases
can optimize this kind of operation. Would be interesting to see with a bigger
dataset, because in this example the indexes will easily fit in main memory.
I'll try with our GeoNames import on Postgres (140 million triples).
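
For reference, a small sketch of the NOT IN form of the delete, next to a NOT EXISTS form. NOT EXISTS is not something discussed in this thread; it is included only as a commonly used alternative. As before, the schema (the triples columns and the versions_added table) is hypothetical, and SQLite is used just to confirm that both statements remove the same rows, not to compare speed.

```python
import sqlite3

def fresh_db():
    """Build a tiny in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE triples (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL);
        CREATE TABLE reasoner_just_supp_triples (triple_id INTEGER NOT NULL);
        CREATE TABLE versions_added (triple_id INTEGER NOT NULL);
        INSERT INTO triples VALUES (1, 1), (2, 1), (3, 1), (4, 0);
        INSERT INTO reasoner_just_supp_triples VALUES (1);
        INSERT INTO versions_added VALUES (2);
    """)
    return conn

# NOT IN: conceptually lists all referenced ids, then filters against them.
# The NOT NULL constraints matter: a NULL in the list would make NOT IN
# evaluate to unknown and nothing would be deleted.
DELETE_NOT_IN = """
    DELETE FROM triples
    WHERE deleted = 1
      AND id NOT IN (SELECT triple_id FROM reasoner_just_supp_triples
                     UNION
                     SELECT triple_id FROM versions_added)
"""

# NOT EXISTS: one correlated probe per referencing table.
DELETE_NOT_EXISTS = """
    DELETE FROM triples
    WHERE deleted = 1
      AND NOT EXISTS (SELECT 1 FROM reasoner_just_supp_triples r
                      WHERE r.triple_id = triples.id)
      AND NOT EXISTS (SELECT 1 FROM versions_added v
                      WHERE v.triple_id = triples.id)
"""

def remaining_after(sql):
    """Run one delete on a fresh database and return the surviving ids."""
    conn = fresh_db()
    conn.execute(sql)
    return [row[0] for row in conn.execute("SELECT id FROM triples ORDER BY id")]

print(remaining_after(DELETE_NOT_IN), remaining_after(DELETE_NOT_EXISTS))
```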
For now I solved the issue as described, but this is definitely worth
investigating more. So I'll reopen the issue to keep it in mind.
> Garbage collection on triple tables
> -----------------------------------
>
> Key: MARMOTTA-175
> URL: https://issues.apache.org/jira/browse/MARMOTTA-175
> Project: Marmotta
> Issue Type: Bug
> Components: Triple Store
> Affects Versions: 3.1-incubating
> Environment: Centos 6.4 64b - JDK 1.6.0_38 - MySql 5.1.67 - Tomcat
> 7.0.37
> Reporter: Raffaele Palmieri
> Assignee: Sebastian Schaffert
> Priority: Minor
> Labels: garbage, mysql, triplestore
>
> During garbage collection of triple tables in log there is the following line:
> SQL error while executing garbage collection on triples table: You have an
> error in your SQL syntax; check the manual that corresponds to your MySQL
> server version for the right syntax to use near 'UNION (SELECT triple_id FROM
> reasoner_just_supp_triples WHERE triple_id = triple' at line 1
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira