On 29/02/12 11:34, Sarven Capadisli wrote:
On 12-02-29 06:26 AM, Sarven Capadisli wrote:
On 12-02-29 05:09 AM, Damian Steer wrote:
At a guess, other stuff happening on the same host? A batch might
include a sync to disk too. I wouldn't have thought GC would be an
issue.

Not to my knowledge. I get the feeling that the disk falls asleep.
Hence, I'm investigating with what I have right now.

On that note, what I actually find absurd is that, if I want to get
tdbloader back into action (to work faster), I do some large disk writing
in another screen window. This was an accidental find, and I don't have
a technical explanation for it. Somehow that causes the Batch numbers to go
up to 20,000+, where they may have been stuck below 1,000.

-Sarven

Interesting but I'm not completely shocked.

The batch speed (yes, triples per second for the last time interval) tends to shoot up at the start (JIT presumably), hit some peak, then very slowly decline. With exceptions. Sometimes it declines for a bit, then starts going faster even on a machine that is doing nothing else, which is a bit odd.

I think the occasional one-off drop in batch is a major, non-incremental GC happening.

The "doing work elsewhere makes it go faster" effect might be because the OS is knocked into a more efficient policy for the disk cache, but I'm guessing here.

Add: 4,150,000 triples (Batch: 2,380 / Avg: 4,684)
Add: 4,200,000 triples (Batch: 29,620 / Avg: 4,732)

That's pretty slow.
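For reference, the two figures in each progress line can be reconstructed as rates: "Batch" is triples per second over the last reported chunk (as noted above) and "Avg" is the cumulative rate since the start. A minimal sketch, with illustrative numbers in the same ballpark as the log above (the chunk size and timings are assumptions, not taken from the actual run):

```python
def rates(total_triples, total_seconds, chunk_triples, chunk_seconds):
    """Reconstruct the loader's two progress figures.

    batch: triples/s over the last chunk (the "Batch" column)
    avg:   triples/s since the start of the load (the "Avg" column)
    """
    batch = chunk_triples / chunk_seconds
    avg = total_triples / total_seconds
    return batch, avg

# Hypothetical: a 50,000-triple chunk that took 21 s, with
# 4,200,000 triples loaded in roughly 888 s overall.
batch, avg = rates(4_200_000, 888, 50_000, 21)
print(round(batch), round(avg))  # roughly 2381 and 4730
```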

Usual questions:
  How much data overall?
  Many long literals?  Other unusual data features?
  What's the machine?

An incremental version is quite possible. It could load to a dataset, ensuring the ids are right, then do index-merging.
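The index-merging step mentioned above can be sketched as a k-way merge of pre-sorted runs of id-encoded triples. This is a toy illustration of the general technique, not Apache Jena TDB's actual implementation; the tuple layout and function name are assumptions:

```python
import heapq

def merge_index_runs(*runs):
    """Merge pre-sorted runs of (s, p, o) node-id tuples into one
    sorted stream, dropping duplicates. Illustrative only, not
    TDB's real index-merging code."""
    last = None
    for triple in heapq.merge(*runs):
        if triple != last:
            yield triple
        last = triple

# Two sorted runs sharing one triple:
run_a = [(1, 1, 2), (1, 1, 3)]
run_b = [(1, 1, 3), (2, 1, 4)]
merged = list(merge_index_runs(run_a, run_b))
# merged == [(1, 1, 2), (1, 1, 3), (2, 1, 4)]
```

Merging sorted runs is sequential I/O, which is the main reason a merge-based loader avoids the random-write pattern that makes incremental B-tree insertion slow.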

        Andy
