p6su5t8...@snkmail.com writes:
> Hi,
>
> I have recently tried out s3ql on Debian testing, and I have a few
> questions.
>
> I'm using s3ql with local storage, without encryption or compression.
> I set threads to 1 as a baseline
[...]
> I find when I specify cachesize manually to be small or zero that my
> write throughput goes down by several orders of magnitude.  Is using
> no cache unsupported?

Yes, this is not supported. You are right that if the backend storage is
a local disk, this could be made to work. However, S3QL was designed for
network storage: the "local" backend was added for testing and for use
on top of a network file system (like sshfs), not as an efficient method
to utilize your local disk.
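
For reference, this is the kind of setup the local backend is meant
for. A minimal sketch with placeholder paths (option spellings may
differ between S3QL versions):

$ mkdir -p /srv/s3ql-data /mnt/s3ql
$ mkfs.s3ql --plain local:///srv/s3ql-data
$ mount.s3ql --threads 1 --cachesize 102400 local:///srv/s3ql-data /mnt/s3ql

Here --plain skips encryption, and --cachesize is given in KiB, so this
mounts with a 100 MiB cache rather than none.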

In theory, there are several optimizations one could implement for the
local backend (not requiring a cache being one of them). However, I
don't think this is worth it: even with additional optimizations,
there'd be little reason to use S3QL over e.g. dm-crypt on top of
btrfs, which gives you very similar features with orders of magnitude
better performance.
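
If you want to go that route, a rough sketch (device name and
mountpoint are placeholders; compress=zstd needs a recent kernel, older
ones offer lzo or zlib):

$ cryptsetup luksFormat /dev/sdX
$ cryptsetup open /dev/sdX securefs
$ mkfs.btrfs /dev/mapper/securefs
$ mount -o compress=zstd /dev/mapper/securefs /mnt/secure
$ duperemove -dr /mnt/secure

That gives you encryption (dm-crypt), transparent compression (btrfs)
and offline de-duplication (duperemove) without any FUSE overhead.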

> I don't mind a small performance loss but when I use a zero cache size
> I get throughput of around 50 kilobytes per second, which suggests
> that I'm running up against an unexpected code path.  Read performance
> is okay even in that case.

I think with zero cache, S3QL probably downloads, updates, uploads and
removes a cache entry for every single write() call, so each small
write turns into a full storage-object round-trip.
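
You should be able to see this directly with dd (mountpoint as in the
sketch above; dd prints the achieved throughput when it finishes):

$ dd if=/dev/zero of=/mnt/s3ql/test bs=4k count=1024

With a cache size of zero, every one of those 4 KiB writes presumably
pays the full round-trip; with a normal cache they are absorbed in
memory and flushed as whole objects.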

> The next thing I'm wondering a lot about is the deduplication.  In my
> test, I'm writing all zeroes.  I write a megabyte using one block of a
> 1MB block size using dd, and then I write 1024 blocks of a kilobyte
> each.  I then also write 2MB or 4MB at a time.  I'd expect that
> deduplication would catch these very trivial cases and that I'd only
> see one entry of at most 2^n bytes, where 2^n represents the
> approximate block size of the deduplication.

Yes, this is what should happen.

> I'd also expect 2^n to be smaller than a megabyte (maybe like a single
> 64k block).

That's probably not the case. S3QL de-duplicates at the level of
storage objects. You specify the maximum storage object size at
mkfs.s3ql time with the --blocksize option; the default is 10 MB.
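
So for a test like yours you could create the file system with smaller
objects, e.g. 1 MiB (placeholder path; the --blocksize value should be
in KiB):

$ mkfs.s3ql --plain --blocksize 1024 local:///srv/s3ql-data

With 1 MiB objects, your all-zero writes should then collapse into a
single stored object.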

To see de-duplication in action, you either need to write more data, or
you need to write smaller, but identical files:

$ echo hello, world > foo
$ echo hello, world > bar

...in this case S3QL will store only one storage object (containing
"hello, world") in the backend.


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«
