On 2020-07-11 at 12:13 +0100, Nikolaus Rath wrote:
> On Jul 11 2020, Ivan Shapovalov <inte...@intelfx.name> wrote:
> > On 2020-07-10 at 19:54 +0100, Nikolaus Rath wrote:
> > > On Jul 10 2020, Daniel Jagszent <dan...@jagszent.de> wrote:
> > > > > Ah yes, compression and probably encryption will indeed
> > > > > preclude any sort of partial block caching. An implementation
> > > > > will have to be limited to plain uncompressed blocks, which is
> > > > > okay for my use-case though (borg provides its own encryption
> > > > > and compression anyway).  [...]
> > > > Compression and encryption are integral parts of S3QL and I
> > > > would argue that disabling them is only an edge case.
> > > 
> > > If I were to write S3QL from scratch, I would probably not
> > > support this at all, right. However, since the feature is present,
> > > I think we ought to consider it fully supported ("edge case" makes
> > > it sound as if this isn't the case).
> > > 
> > > 
> > > > I might be wrong but I think Nikolaus (maintainer of S3QL) will
> > > > not accept such a huge change into S3QL that is only beneficial
> > > > for an edge case.
> > > 
> > > Never say never, but the bar is certainly high here. I think there
> > > are more promising avenues to explore - eg. storing the
> > > compressed/uncompressed offset mapping to make partial retrieval
> > > work for all cases.
> > 
> > Hmm, I'm not sure how that's supposed to work.
> > 
> > AFAICS, s3ql uses "solid compression", meaning that the entire block
> > is compressed at once. It is generally impossible to extract a
> > specific range of uncompressed data without decompressing the whole
> > stream.[1]
> 
> At least bzip2 always works in blocks, IIRC blocks are at most 900 kB
> (for highest compression settings). I wouldn't be surprised if the
> same holds for LZMA.

True, I forgot that bzip2 is inherently block-based. Not sure about
LZMA or gzip, but there is still a significant obstacle: how would you
extract this information from the compression libraries?

> 
> We could track the size of each compressed block, and store it as
> part of the metadata of the object (so it doesn't blow up the SQLite
> table).
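For illustration, a per-block size index could look roughly like this. This is a minimal stdlib sketch with hypothetical names and an arbitrary chunk size — not S3QL's actual storage layer, which compresses each object as one stream. Each chunk is compressed as an independent bz2 member, and the list of compressed sizes is the metadata that would be stored with the object:

```python
import bz2

def compress_chunked(data: bytes, chunk_size: int = 64 * 1024):
    """Compress `data` as independent bz2 members and record the
    compressed size of each one (the per-object metadata)."""
    sizes = []
    out = bytearray()
    for off in range(0, len(data), chunk_size):
        member = bz2.compress(data[off:off + chunk_size])
        sizes.append(len(member))
        out += member
    return bytes(out), sizes

def read_range(blob, sizes, chunk_size, offset, length):
    """Decompress only the members overlapping [offset, offset+length),
    skipping over the others using the recorded compressed sizes."""
    result = bytearray()
    pos = 0  # byte position of the current member inside `blob`
    for i, sz in enumerate(sizes):
        lo, hi = i * chunk_size, (i + 1) * chunk_size
        if lo < offset + length and hi > offset:
            chunk = bz2.BZ2Decompressor().decompress(blob[pos:pos + sz])
            result += chunk[max(0, offset - lo):offset + length - lo]
        pos += sz
    return bytes(result)
```

The trade-off is a slightly worse compression ratio (each member resets the compressor state) in exchange for O(1) location of any byte range.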
> 
> > Encryption does not pose this kind of existential problem — AES is
> > used in CTR mode, which theoretically permits random-access
> > decryption — but the crypto library in use, python-cryptography,
> > doesn't seem to permit this sort of trickery.
> 
> Worst case you can feed X bytes of garbage into the decrypter and
> then start with the partial block - with CTR you should get the right
> output.

Yes, that could probably work. Still feels like a grand hack.
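To illustrate why the garbage-feed works: in any CTR-style cipher, encryption and decryption are the same XOR with a position-dependent keystream, so feeding `offset` throwaway bytes merely advances the keystream to the right position. Below is a toy keystream cipher (a SHA-256 counter stream standing in for AES-CTR — not real crypto, just a demonstration of the property; the same trick should apply to a python-cryptography CTR decryptor):

```python
import hashlib

class ToyCtr:
    """Toy CTR-style stream cipher. Do NOT use for real encryption;
    it only demonstrates that the keystream depends solely on the
    current position, so decryption can start mid-stream."""

    def __init__(self, key: bytes, nonce: bytes):
        self._key, self._nonce = key, nonce
        self._pos = 0

    def _keystream(self, n: int) -> bytes:
        # Generate keystream blocks covering positions [pos, pos + n).
        out = bytearray()
        first = self._pos // 32
        last = (self._pos + n - 1) // 32
        for blk in range(first, last + 1):
            out += hashlib.sha256(
                self._key + self._nonce + blk.to_bytes(8, "big")).digest()
        skip = self._pos - first * 32
        self._pos += n
        return bytes(out[skip:skip + n])

    def update(self, data: bytes) -> bytes:
        ks = self._keystream(len(data))
        return bytes(a ^ b for a, b in zip(data, ks))

key, nonce = b"k" * 16, b"n" * 16
pt = b"hello partial block retrieval!"
ct = ToyCtr(key, nonce).update(pt)

# The garbage trick: feed `offset` throwaway bytes to advance the
# keystream, discard the output, then decrypt only the tail.
offset = 10
dec = ToyCtr(key, nonce)
dec.update(b"\x00" * offset)   # output discarded
tail = dec.update(ct[offset:])
assert tail == pt[offset:]
```

The "hack" is just paying the cost of generating (and throwing away) keystream for the skipped prefix, without needing the library to expose counter seeking.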

-- 
Ivan Shapovalov / intelfx /
