Hi Iker,

As I wrote the original reply, I can talk about the beta.sparql.uniprot.org endpoint. UniProt release 2013_04 has 7 billion statements in 15 named graphs (see http://beta.sparql.uniprot.org/), and the endpoint receives about 50,000 queries a day.

When I was selecting a triple store I tested it thoroughly, including for growth. The UniProt raw data (flat file) doubles about every 15 months; e.g. we got 300 million more triples to deal with this month, next month it will be 320 million, etc.

The SIB Swiss Institute of Bioinformatics is an academic institution, and the SPARQL endpoint is a secondary API to the UniProt data. The budget for the SPARQL hardware consists of whatever I can steal from our NextGenSequencing groups and hardware sponsors. In this case that is a 64-core AMD machine with 256 GB RAM and two 800 GB 10,000 rpm disks.

We have a frozen release cycle, which means the data changes once a month. As we have one machine, we allow access to the current release (6.6 billion triples) and build the next one 3 weeks into the release cycle. While building the new release we host more than 13 billion triples on one 256 GB RAM machine. However, I don't know when I will get new hardware, so I also ran a test with 6 billion triples live plus 12 billion loading (in the 5.2 series; it should be better now). At that time the second load ran in just under 6 days, which is the time I have for loading data if I want to make the release cycle. As the machine is shared between live users and production, and the service is now more popular, we might have trouble down the line.

So for the current 7 billion statements we use 100 GB of RAM as heap, 60 GB of which is cache for OWLIM-SE. The main problem is DISTINCT queries that build up large in-memory data structures (will be fixed with https://openrdf.atlassian.net/browse/SES-1769), which can play havoc with GC and take the machine down.

We use OWLIM-SE, as I have only one machine. If it dies, it dies; I can replace it with an old 64 GB RAM machine, but query times will suffer dramatically. We back up the entire OWLIM-SE database files to tape/HSM, so we can restore the database easily on any other machine in the cluster I can get my hands on.

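As a rough illustration of the growth and memory figures above, the following Python sketch turns them into a back-of-the-envelope projection. It is not part of the UniProt tooling. The function names are made up for this example, and two things are assumptions rather than statements from the message: that growth is a steady exponential with a 15-month doubling period, and that heap demand scales linearly with the number of statements.

```python
import math

# Figures quoted in the message; everything derived from them is an estimate.
DOUBLING_MONTHS = 15.0    # raw data doubles about every 15 months
LIVE_TRIPLES = 7e9        # statements in release 2013_04
HEAP_GB = 100.0           # JVM heap used for those statements
MACHINE_RAM_GB = 256.0    # RAM in the single server


def monthly_growth_factor(doubling_months: float = DOUBLING_MONTHS) -> float:
    """Constant month-on-month factor implied by a fixed doubling period."""
    return 2.0 ** (1.0 / doubling_months)


def projected_triples(months_ahead: float,
                      current: float = LIVE_TRIPLES,
                      doubling_months: float = DOUBLING_MONTHS) -> float:
    """Triple count after `months_ahead` months of steady exponential growth."""
    return current * 2.0 ** (months_ahead / doubling_months)


def heap_gb_per_billion(heap_gb: float = HEAP_GB,
                        triples: float = LIVE_TRIPLES) -> float:
    """Heap currently used per billion statements."""
    return heap_gb / (triples / 1e9)


def months_until_heap_reaches(limit_gb: float = MACHINE_RAM_GB,
                              heap_gb: float = HEAP_GB,
                              doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until the heap estimate reaches `limit_gb`.

    ASSUMPTION (not from the message): heap demand grows in proportion to
    the triple count, so it doubles on the same schedule as the data.
    """
    return doubling_months * math.log2(limit_gb / heap_gb)


if __name__ == "__main__":
    print(f"monthly growth factor: {monthly_growth_factor():.4f}")
    print(f"triples in 12 months:  {projected_triples(12) / 1e9:.1f} billion")
    print(f"heap per billion:      {heap_gb_per_billion():.1f} GB")
    print(f"months to fill RAM:    {months_until_heap_reaches():.1f}")
```

Under these assumptions the monthly growth factor comes out a little under 5%, which is in the same range as the step from 300 to 320 million new triples mentioned above. The heap-per-billion figure covers only the JVM heap of the live release; it ignores the second release that is loaded on the same machine during a build, so the real headroom is smaller than the last estimate suggests.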
Even if I had more machines I would, for now, use multiple instances of OWLIM-SE, as copying the data and putting an HTTP load balancer in front, plus DNS round robin, is a really nice solution for my use cases. Another point is that if other partners in the UniProt consortium want to run a SPARQL endpoint mirror, OWLIM-SE, being Java, is really easy to deploy; with Virtuoso or other C-based options this is much more difficult (more dependent on the lower-level OS).

Regards,
Jerven

On Mar 15, 2013, at 12:26 AM, Iker Huerga wrote:

> Hi,
>
> I would like to bring back the thread below, in which the maximum number of
> triples stored is discussed. I'm actually curious about the setup of those
> machines in which 12 billion triples are stored.
>
> - Were you talking about a cluster configuration (OWLIM-Enterprise or
>   Replication Cluster) or a single server?
> - Also, how much RAM are you using? (mainly curious about how much memory
>   the UniProt machines have...)
>
> We (using Virtuoso Commercial Cluster Edition) are currently working with 8
> billion triples sitting in a cluster made of two nodes. For us, 32 GB of
> RAM per billion triples is needed, i.e. 256 GB RAM for 8 billion triples in
> the whole cluster (128 GB per node). This RAM is mainly required to store the
> index, not the triples per se.
>
> Thanks
> Iker
>
> >>> Jerven Bolleman <jerven.bolle...@isb-sib.ch> 25.02.2013 19:22 >>>
> Some comments inline,
>
> On Feb 25, 2013, at 7:05 PM, deepak Naik wrote:
>
> > Hi everyone,
> >
> > I am interested to figure it out about the maximum number of triple storage
> > done reported using the following triple store:
> > 1, sesame ( http://www.openrdf.org/ )
> > 2, bigowlim ( bigowlim )
> now OWLIM-SE
> 8 billion + linkedlifedata
> 6.6 billion beta.sparql.uniprot.org (live in testing more than 12 billion)
>
> > 3, swiftowlim ( swiftowlim )
> now OWLIM-lite
> currently 2 billion max (hard limit)
>
> > Thanks,
> > Deepak
> >
> > ------------------------------------------------------------------------------
> > Everyone hates slow websites. So do we.
> > Make your web apps faster with AppDynamics
> > Download AppDynamics Lite for free today:
> > http://p.sf.net/sfu/appdyn_d2d_feb
> > _______________________________________________
> > Sesame-general mailing list
> > sesame-gene...@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/sesame-general
>
> -------------------------------------------------------------------
> Jerven Bolleman
> jerven.bolle...@isb-sib.ch
>
> SIB Swiss Institute of Bioinformatics   Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1                Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland
> www.isb-sib.ch - www.uniprot.org
>
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
>
> _______________________________________________
> Owlim-discussion mailing list
> Owlim-discussion@ontotext.com
> http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion
>
> --
> Iker Huerga
> http://www.ikerhuerga.com/