carmen r wrote:
Hi,

        Has anyone here used Redland with a large number of triples (>10 
million)?
How does it scale?
Once or twice I loaded some small DBpedia sets in that range.

I can't really say how it would do today, as Rasqal has seen some commits since then.

        Was it too bad then? Did you use a MySQL backend?

Perhaps dajobe can comment on the data loads Y! has put Redland through?

        It would be wonderful...

Proper indexing is essential at that size, and it did get performance closer
to Virtuoso (which usually still won). Be aware that Virtuoso is a gigantic
monolithic beast with XML processing stuff, a SQL DB, etc.
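To illustrate why indexing matters at that scale, here is a toy sketch using
SQLite rather than the actual Redland MySQL schema (the table and index names
are made up): without composite indexes covering the common access orders,
every triple pattern in a query degenerates into a full table scan.

```python
import sqlite3

# Toy triples table; the real Redland MySQL schema differs (names are hypothetical).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")

# Composite indexes covering the common access patterns: by subject,
# by predicate+object, and by object.
con.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
con.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
con.execute("CREATE INDEX idx_osp ON triples (o, s, p)")

con.executemany("INSERT INTO triples VALUES (?, ?, ?)",
                [("ex:a", "ex:type", "ex:Person"),
                 ("ex:b", "ex:type", "ex:Person"),
                 ("ex:a", "ex:name", "Alice")])

# The (p, o, s) index answers "which subjects have type Person?"
# without scanning the table.
plan = con.execute("EXPLAIN QUERY PLAN SELECT s FROM triples "
                   "WHERE p = 'ex:type' AND o = 'ex:Person'").fetchall()
print(plan[0][3])  # query plan detail mentions idx_pos
```

With tens of millions of rows the difference between an index lookup and a
table scan is what decides whether queries finish at all.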

        I've seen the benchmarks. Virtuoso seems to support DBpedia well, but I
couldn't find details about what hardware they're using.

On that note, I think the best route toward more scalable Semantic Web stuff
is more modularity.

E.g., access to Rasqal's internal set-intersection machinery, so one can do
offline (ahead-of-time) aggregations hinted by use patterns.

Or overload functions using some class-inheritance/super() technique and
provide optimized SQL, etc.
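The ahead-of-time aggregation idea can be sketched like this (Rasqal does not
actually expose such hooks today; the data, pattern keys, and helper names are
invented for illustration): precompute the intersection of bindings for two
frequently co-occurring triple patterns, so a join at query time becomes a
lookup.

```python
# Hypothetical toy data; in practice this would come from the store.
triples = {
    ("ex:a", "ex:type", "ex:Person"),
    ("ex:b", "ex:type", "ex:Person"),
    ("ex:a", "ex:knows", "ex:b"),
}

def subjects(p, o):
    """Subjects matching the pattern (?s, p, o); o=None means any object."""
    return {s for (s, pp, oo) in triples if pp == p and (o is None or oo == o)}

# Offline: cache the join that observed use patterns hint at.
precomputed = {
    ("ex:type=ex:Person", "ex:knows"):
        subjects("ex:type", "ex:Person") & subjects("ex:knows", None),
}

# Online: "persons who know someone" is now a dictionary lookup.
print(precomputed[("ex:type=ex:Person", "ex:knows")])  # {'ex:a'}
```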

        I agree, but I'm worried that a MySQL backend may start to choke
somewhere before 100 million triples. Besides, to do that sort of tweaking it'd
be best to create a more complex SQL schema, partitioning data by type and
hashes, etc. I think it could work very well, but it's a more complex project
than I can handle right now.
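The hash-partitioning part of such a schema can be sketched in a few lines (a
toy illustration; the partition count and table-name scheme are made up, and a
real schema would also split by type as described above): route each triple to
one of N smaller tables by hashing its subject.

```python
import hashlib

# Hypothetical constant; a real schema would tune this to the data size.
N_PARTITIONS = 16

def partition_for(subject):
    """Pick the partition table for a triple by hashing its subject."""
    h = int(hashlib.md5(subject.encode()).hexdigest(), 16)
    return "triples_%02d" % (h % N_PARTITIONS)

# Every triple with the same subject lands in the same table, so a
# lookup of one resource touches exactly one partition.
print(partition_for("http://dbpedia.org/resource/Berlin"))
```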

I've currently switched to an FS-based store to support a flexible
optimization/aggregation strategy and remove the 'black box beast' components
from the system. If god took away my FS, I'd definitely look at Redland again
before anything else.

Hope that helps.

        Thanks, that helps a lot. I was considering moving most of the data to
the FS, and it's good to know that it works. May I ask how you're doing that?
Are you using Redland's file storage?

        I was considering partitioning the data by subject, which would solve
most of my needs, but I'd lose the ability to run SPARQL queries over the whole
set. I have also considered caching the aggregations and query results, but in
order to do that I still need to know how well it will scale: if queries take
hours to run, there's no way I can build the cache and keep it up to date.
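The subject-partitioned filesystem layout described above can be sketched as
follows (a toy store; the file layout and helper names are invented for
illustration, not Redland's file storage). Looking up one subject is a single
file read, while a cross-subject SPARQL query would have to scan every file,
which is exactly the trade-off mentioned.

```python
import json
import tempfile
from pathlib import Path

# One file per subject holds its (predicate, object) pairs (hypothetical layout).
root = Path(tempfile.mkdtemp())

def _path(s):
    return root / (s.replace(":", "_") + ".json")

def add(s, p, o):
    """Append a (p, o) pair to the subject's file."""
    f = _path(s)
    rows = json.loads(f.read_text()) if f.exists() else []
    rows.append([p, o])
    f.write_text(json.dumps(rows))

def describe(s):
    """All (p, o) for one subject: one file read, no global scan."""
    f = _path(s)
    return json.loads(f.read_text()) if f.exists() else []

add("ex:a", "ex:name", "Alice")
add("ex:a", "ex:type", "ex:Person")
add("ex:b", "ex:name", "Bob")
print(describe("ex:a"))  # [['ex:name', 'Alice'], ['ex:type', 'ex:Person']]
```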

        Thanks a lot for your reply!

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
My only love sprung from my only hate!
Too early seen unknown, and known too late!
                -- William Shakespeare, "Romeo and Juliet"
_______________________________________________
redland-dev mailing list
[email protected]
http://lists.librdf.org/mailman/listinfo/redland-dev
