Hi Iker,

As I wrote the original reply, I can talk about the beta.sparql.uniprot.org 
endpoint.
UniProt release 2013_04 has 7 billion statements in 15 named graphs (see 
http://beta.sparql.uniprot.org/) and the endpoint receives about 50,000 
queries a day.

When I was selecting a triple store I tested it thoroughly, including for 
growth. The UniProt raw data (flat file) doubles about every 15 months, 
e.g. we got 300 million more triples to deal with this month, and next month 
it will be 320 million, etc.
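
As a back-of-the-envelope check on those growth figures, here is a minimal 
Python sketch. It assumes smooth exponential growth with a 15-month doubling 
time; the function names are purely illustrative. Under that assumption the 
data grows by about 4.7% per month, so 300 million new triples one month would 
be followed by roughly 314 million the next, which is in the same ballpark as 
the rounded figures in the email.

```python
def monthly_growth_factor(doubling_months: float = 15.0) -> float:
    """Growth factor per month if the data doubles every `doubling_months`."""
    return 2.0 ** (1.0 / doubling_months)


def project(current: float, months: int, doubling_months: float = 15.0) -> float:
    """Projected size after `months`, assuming smooth exponential growth."""
    return current * monthly_growth_factor(doubling_months) ** months


if __name__ == "__main__":
    factor = monthly_growth_factor()
    print(f"monthly growth: {(factor - 1) * 100:.1f}%")
    # Increment of 300 million this month, projected one month ahead.
    print(f"next month's increment: {project(300e6, 1) / 1e6:.0f} million triples")
    # Total store size one doubling period from now.
    print(f"7 billion statements after 15 months: {project(7e9, 15) / 1e9:.1f} billion")
```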

The SIB Swiss Institute of Bioinformatics is an academic institution and the 
SPARQL endpoint is a secondary API to the UniProt data. The budget for the 
SPARQL hardware consists of whatever I can steal from our NextGenSequencing 
groups and hardware sponsors. In this case that is a 64-core AMD machine with 
256 GB RAM and two 800 GB 10,000 rpm disks.

We have a frozen release cycle, which means the data changes once a month. As 
we have one machine, we serve the current release (6.6 billion triples) and 
start building the next one 3 weeks into the release cycle. While building the 
new release we host more than 13 billion triples on one 256 GB RAM machine. 
However, I don't know when I will get new hardware, so I also ran a test with 
6 billion triples live + 12 billion loading (on the 5.2 series; it should be 
better now). At that time the second load ran in just under 6 days, which is 
the time I have for loading data if I want to make the release cycle. As the 
machine is shared between live users and production, we might have trouble 
down the line now that the service is more popular.

For the current 7 billion statements we use 100 GB of RAM as heap, 60 GB of 
which is cache for OWLIM-SE. The main problem is DISTINCT queries that build 
up large in-memory data structures (this will be fixed with 
https://openrdf.atlassian.net/browse/SES-1769), which can play havoc with GC 
and take the machine down.
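
To put the sizing numbers side by side, here is a rough Python sketch of the 
arithmetic. The helper names are made up for illustration, and the load rate 
assumes the 12 billion triples were loaded in a full 6 days, so it is a lower 
bound on the throughput actually achieved. The comparison with the Virtuoso 
figure quoted below is not like-for-like (JVM heap on one machine versus total 
cluster RAM).

```python
def gb_per_billion(ram_gb: float, statements: float) -> float:
    """RAM in GB used per billion statements."""
    return ram_gb / (statements / 1e9)


def required_load_rate(statements: float, days: float) -> float:
    """Average triples per second needed to load `statements` in `days`."""
    return statements / (days * 24 * 60 * 60)


if __name__ == "__main__":
    # 100 GB heap (60 GB of it cache) for 7 billion statements.
    print(f"heap per billion:  {gb_per_billion(100, 7e9):.1f} GB")
    print(f"cache per billion: {gb_per_billion(60, 7e9):.1f} GB")
    # Figure from the quoted Virtuoso cluster: 256 GB RAM for 8 billion triples.
    print(f"quoted cluster:    {gb_per_billion(256, 8e9):.1f} GB")
    # 12 billion triples loaded within the 6-day window.
    print(f"load rate needed:  {required_load_rate(12e9, 6):,.0f} triples/s")
```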

We use OWLIM-SE, as I have only one machine. If it dies, it dies; I can replace 
it with an old 64 GB RAM machine, but query times will suffer dramatically. We 
back up the entire set of OWLIM-SE database files to tape/HSM, so I can restore 
a database easily on any other machine in the cluster I can get my hands on. 
Even if I had more machines I would for now use multiple instances of OWLIM-SE, 
as copying the data and putting an HTTP load balancer in front plus DNS round 
robin is a really nice solution for my use cases.
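
The idea behind that setup can be sketched in a few lines of Python. This only 
illustrates the rotation that an HTTP load balancer or DNS round robin 
performs over identical read-only copies; a real deployment would use a proper 
load balancer rather than application code, and the mirror URLs are 
hypothetical.

```python
import itertools
from typing import Iterable, Iterator

# Hypothetical mirror URLs, for illustration only.
MIRRORS = [
    "http://mirror1.example.org/sparql",
    "http://mirror2.example.org/sparql",
]


def round_robin(mirrors: Iterable[str]) -> Iterator[str]:
    """Yield mirror URLs in turn, forever."""
    return itertools.cycle(list(mirrors))


if __name__ == "__main__":
    picker = round_robin(MIRRORS)
    # Each incoming query goes to the next mirror in the rotation.
    for query_number in range(4):
        print(query_number, next(picker))
```

This works without any coordination between instances because the data only 
changes once per release, so every copy answers every query identically.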

Another consideration is that if other partners in the UniProt consortium want 
to run a SPARQL endpoint mirror, OWLIM-SE, being Java, is really easy to 
deploy. With Virtuoso or other C-based options this is much more difficult 
(more dependent on the lower-level OS).

Regards,
Jerven


On Mar 15, 2013, at 12:26 AM, Iker Huerga wrote:

> Hi, 
> 
> I would like to bring back the thread below, in which the maximum number of 
> triples stored is discussed. I'm actually curious about the setup of those 
> machines in which 12 billion triples are stored. 
> 
>  - Were you talking about a cluster configuration (OWLIM-Enterprise or 
> Replication Cluster) or a single server?
>  - Also, how much RAM are you using? (mainly curious about how much memory 
> the UniProt machines have...)
> 
> We (using Virtuoso Commercial Cluster Edition) are currently working with 8 
> billion triples sitting in a cluster made of two nodes. For us, 32 GB of 
> RAM per billion triples is needed, i.e. 256 GB RAM for 8 billion triples in 
> the whole cluster (128 GB per node). This RAM is mainly required to store 
> the index, not the triples per se. 
> 
> Thanks
> Iker
> 
> >>> Jerven Bolleman <jerven.bolle...@isb-sib.ch> 25.02.2013 19:22 >>>
> Some comments inline,
> On Feb 25, 2013, at 7:05 PM, deepak Naik wrote:
> 
> > Hi everyone,
> > 
> > I am interested in finding out the maximum number of triples reported as 
> > stored using the following triple stores:
> > 1, sesame ( http://www.openrdf.org/ )
> > 2, bigowlim ( bigowlim )
> now OWLIM-SE
> 8 billion + linkedlifedata
> 6.6 billion beta.sparql.uniprot.org (live in testing more than 12 billion)
> 
> > 3, swiftowlim ( swiftowlim )
> now OWLIM-lite
> currently 2 billion max (hard limit)
> > 
> > Thanks,
> > Deepak
> > 
> > Sesame-general mailing list
> > sesame-gene...@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/sesame-general
> 
> 
> 
> 
> _______________________________________________
> Owlim-discussion mailing list
> 
> Owlim-discussion@ontotext.com
> http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion
> 
> 
> 
> -- 
> Iker Huerga
> http://www.ikerhuerga.com/

-------------------------------------------------------------------
Jerven Bolleman                        jerven.bolle...@isb-sib.ch
SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
1211 Geneve 4,
Switzerland     www.isb-sib.ch - www.uniprot.org
Follow us at https://twitter.com/#!/uniprot
-------------------------------------------------------------------
