On 29/02/12 11:34, Sarven Capadisli wrote:
On 12-02-29 06:26 AM, Sarven Capadisli wrote:
On 12-02-29 05:09 AM, Damian Steer wrote:
At a guess, other stuff happening on the same host? A batch might
include a sync to disk too. I wouldn't have thought GC would be an
issue.

Not to my knowledge. I get the feeling that the disk falls asleep.
Hence, I'm investigating with what I have right now.

On that note, what I actually find absurd is that, if I want to get
tdbloader back into action (to work faster), I do some large disk writing
in another screen window. This was an accidental find, and I don't have
a technical explanation for it. Somehow that causes the Batch numbers to go
up to 20,000+, where they may have been stuck below 1,000.

-Sarven

Interesting but I'm not completely shocked.

The batch speed (yes, triples per second for the last time interval) tends to shoot up at the start (JIT presumably), hit some peak, then very slowly decline. With exceptions. Sometimes it declines for a bit, then starts going faster even on a machine that is doing nothing else, which is a bit odd.

I think the occasional one-off drop in batch is a major, non-incremental GC happening.

The "doing work elsewhere makes it go faster" effect might be because the OS is knocked into a more efficient policy for the disk cache, but I'm guessing here.

Add: 4,150,000 triples (Batch: 2,380 / Avg: 4,684)
Add: 4,200,000 triples (Batch: 29,620 / Avg: 4,732)

That's pretty slow.
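For reference, the two figures in each progress line can be reconstructed as rates: "Batch" is triples per second over the last reported chunk (as noted above) and "Avg" is the cumulative rate since the start. A minimal sketch, with illustrative numbers in the same ballpark as the log above (the chunk size and timings are assumptions, not taken from the actual run):

```python
def rates(total_triples, total_seconds, chunk_triples, chunk_seconds):
    """Reconstruct the loader's two progress figures.

    batch: triples/s over the last chunk (the "Batch" column)
    avg:   triples/s since the start of the load (the "Avg" column)
    """
    batch = chunk_triples / chunk_seconds
    avg = total_triples / total_seconds
    return batch, avg

# Hypothetical: a 50,000-triple chunk that took 21 s, with
# 4,200,000 triples loaded in roughly 888 s overall.
batch, avg = rates(4_200_000, 888, 50_000, 21)
print(round(batch), round(avg))  # roughly 2381 and 4730
```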

Usual questions:
  How much data overall?
  Many long literals?  Other unusual data features?
  What's the machine?

An incremental version is quite possible. It could load to a dataset, ensuring the ids are right, then do index-merging.
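The index-merging step mentioned above can be sketched as a k-way merge of pre-sorted runs of id-encoded triples. This is a toy illustration of the general technique, not Apache Jena TDB's actual implementation; the tuple layout and function name are assumptions:

```python
import heapq

def merge_index_runs(*runs):
    """Merge pre-sorted runs of (s, p, o) node-id tuples into one
    sorted stream, dropping duplicates. Illustrative only, not
    TDB's real index-merging code."""
    last = None
    for triple in heapq.merge(*runs):
        if triple != last:
            yield triple
        last = triple

# Two sorted runs sharing one triple:
run_a = [(1, 1, 2), (1, 1, 3)]
run_b = [(1, 1, 3), (2, 1, 4)]
merged = list(merge_index_runs(run_a, run_b))
# merged == [(1, 1, 2), (1, 1, 3), (2, 1, 4)]
```

Merging sorted runs is sequential I/O, which is the main reason a merge-based loader avoids the random-write pattern that makes incremental B-tree insertion slow.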

        Andy
