Hi Marco,

It's the Sun JVM on both:

Ubuntu:
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Centos:
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

On Wed, Jun 15, 2011 at 4:32 PM, Marco Neumann <[email protected]> wrote:

> what jvm do you use on the machines?
>
> On Wed, Jun 15, 2011 at 11:23 AM, Richard Francis <[email protected]> wrote:
> > Hi,
> >
> > I'm using two identical machines in ec2 running tdbloader on centos
> > (CentOS release 5 (Final)) and Ubuntu 11.04 (natty)
> >
> > I've observed an issue where Centos will run happily at a consistent
> > speed and complete a load of 650million triples in around 12 hours,
> > whereas the load on Ubuntu, after just 15million triples, tails off and
> > runs ever more slowly.
> >
> > On initial observation of the Ubuntu machine I noticed that the flush-202
> > process was running quite high, also running iostat showed that io was
> > the real bottleneck - with the ubuntu machine showing a constant use of
> > the disk for both reads and writes (the centos machine had periods of no
> > usage followed by periods of writes). This led me to investigate how
> > memory was being used by the Ubuntu machine - and a few blog posts /
> > tutorials later I found a couple of settings to tweak - the first I tried
> > was dirty_writeback_centisecs - setting this to 0 had an immediate
> > positive effect on the load that I was performing - but after some more
> > testing I found that the problem was just put back to around 80million
> > triples before I saw a drop off in performance.
> >
> > This led me to investigate whether there was the same issue with
> > tdbloader2 - from my observations I got the same problem - but this time
> > around 150m triples.
> >
> > Again - I focused on "dirty" settings - and this time tweaking
> > dirty_bytes = 30000000000 and dirty_background_bytes = 15000000000 saw a
> > massive performance increase and for most of the add phase of the
> > tdbloader it kept up with the centos machine.
> >
> > Finally, last night I stopped all loads, and raced the centos machine and
> > the ubuntu machine - both have completed - but the Centos machine (around
> > 12 hours) was still far quicker than the Ubuntu machine (20 hours).
> >
> > So my questions are, has anyone else observed this? - can anyone suggest
> > any further improvements - or things to try? - what is the best OS to
> > perform a tdbload on?
> >
> > Rich
> >
> >
> > Tests were performed on three different machines 1x Centos and 2 x Ubuntu
> > - to rule out EC2 being a bottleneck - all were (from
> > http://aws.amazon.com/ec2/instance-types/)
> >
> > High-Memory Double Extra Large Instance
> >
> > 34.2 GB of memory
> > 13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
> > 850 GB of instance storage
> > 64-bit platform
> > I/O Performance: High
> > API name: m2.2xlarge
> > All machines are configured with no swap
> >
> > Here's the summary from the only completed load on Ubuntu;
> >
> > ** Index SPO->OSP: 685,552,449 slots indexed in 18,337.75 seconds [Rate: 37,384.76 per second]
> > -- Finish triples index phase
> > ** 685,552,449 triples indexed in 37,063.51 seconds [Rate: 18,496.69 per second]
> > -- Finish triples load
> > ** Completed: 685,552,449 triples loaded in 78,626.27 seconds [Rate: 8,719.13 per second]
> > -- Finish quads load
> >
> > Some resources I used;
> > http://www.westnet.com/~gsmith/content/linux-pdflush.htm
> > http://arighi.blogspot.com/2008/10/fine-grained-dirtyratio-and.html
> >
>
> --
> Marco Neumann
> KONA
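
To compare the writeback settings discussed above (dirty_writeback_centisecs, dirty_bytes, dirty_background_bytes) between the Centos and Ubuntu machines, something like the following Python sketch could be run on each box and the output diffed. It is only an illustration, not part of tdbloader: the function name read_dirty_settings and the list of keys are my own choices, and it assumes the usual Linux layout where these values are exposed as files under /proc/sys/vm. Older kernels may not have every file, so missing ones are skipped.

```python
import os

# Writeback-related settings mentioned in the thread, plus the related
# ratio/expiry knobs that normally live in the same directory on Linux.
# (This selection of keys is an assumption made for illustration.)
DIRTY_KEYS = (
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
)


def read_dirty_settings(base="/proc/sys/vm"):
    """Return {name: int} for each dirty_* file that can be read under base."""
    settings = {}
    for key in DIRTY_KEYS:
        path = os.path.join(base, key)
        try:
            with open(path) as fh:
                settings[key] = int(fh.read().strip())
        except (OSError, ValueError):
            # File missing, unreadable, or not an integer: skip it
            # rather than guess a value.
            continue
    return settings


if __name__ == "__main__":
    for name, value in sorted(read_dirty_settings().items()):
        print(f"vm.{name} = {value}")
```

The script only reads the values; it does not change anything. The base argument is there so the function can be pointed at a copy of the files taken from another machine.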
