Hi, I have recently tried out s3ql on Debian testing, and I have a few questions.
I'm using s3ql with local storage, without encryption or compression, and with threads set to 1 as a baseline. I'm pretty confused by what I'm seeing.

First, I don't understand the need for (or use of) a local cache when the backend is already a local filesystem. I'd like not to have to store the data locally more than once, even temporarily. I can see files being populated into ~/.s3ql/local*cache/, and their total size is on the same order as the data I'm writing into the mounted filesystem. For a local backend, I'd like a mode with no cache at all, even if that means writes have to become synchronous.

Beyond that request, I'd also like to understand what's going on under the hood, and whether there are parameters worth tweaking for a local backend. When I manually set cachesize to a small or zero value, my write throughput drops by several orders of magnitude. Is running without a cache unsupported? Does it introduce some kind of deadlock or starvation? If it's simply not supported or not an intended use case, I'll leave the cache at its default, but it might be a useful test of how the system behaves when the cache is small or nonexistent. If I choose a smaller but still nonzero cache, I'd want to be sure I'm not running into an arbitrary limitation. A small performance loss is fine, but with a zero cache size I get throughput of around 50 kilobytes per second, which suggests I'm hitting an unexpected code path. Read performance is okay even in that case.

The other thing I'm wondering about is deduplication. In my test I'm writing all zeroes: first 1 MB as a single block (dd with a 1 MB block size and a count of 1), then 1024 blocks of 1 KB each, and then 2 MB or 4 MB at a time.
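To make this concrete, here is roughly what my test does, as a self-contained sketch. A temporary directory stands in for the S3QL mountpoint, and the 64 KiB block size and SHA-256 hashing are just my assumptions about how a fixed-block dedup might work, not S3QL's actual implementation:

```python
import hashlib
import os
import tempfile

BLOCK = 64 * 1024  # assumed dedup block size (2^n); S3QL's real value may differ

def write_zeros(path, bs, count):
    """Equivalent of: dd if=/dev/zero of=<path> bs=<bs> count=<count>"""
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(b"\0" * bs)

def unique_blocks(path, block_size=BLOCK):
    """Toy content-addressed dedup: hash fixed-size blocks, count distinct ones."""
    seen = set()
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            seen.add(hashlib.sha256(chunk).hexdigest())
    return len(seen)

tmp = tempfile.mkdtemp()  # stands in for the S3QL mountpoint
a = os.path.join(tmp, "one_1m_block")
b = os.path.join(tmp, "many_1k_blocks")

write_zeros(a, 1024 * 1024, 1)  # dd bs=1M count=1
write_zeros(b, 1024, 1024)      # dd bs=1K count=1024

assert os.path.getsize(a) == os.path.getsize(b) == 1024 * 1024
# Both files reduce to a single unique all-zero block under fixed-block dedup:
print(unique_blocks(a), unique_blocks(b))  # → 1 1
```

Under this toy model, no matter how the zeroes are chunked at write time, the stored data collapses to one unique block, which is the behaviour I expected to see in the database.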
I'd expect deduplication to catch these very trivial cases, so that I'd see only one stored entry of at most 2^n bytes, where 2^n is the approximate block size used for deduplication. I'd also expect 2^n to be smaller than a megabyte (perhaps a single 64 KB block). Is there a reason deduplication treats this data as different or unique? I'd like to understand what's happening internally: for any arbitrary run of zeroes of size 2^m, I'd still expect to see at most one entry of size 2^n in the database, consisting of a single block's worth of zeroes, even when m >> n. Since that doesn't seem to be happening, I'm confused about how the deduplication works, especially when the input is inherently duplicate. I understand compression would bring the size down considerably, but it would also mask the issue I'm seeing here.

Mike

--
You received this message because you are subscribed to the Google Groups "s3ql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
