[s3ql] Re: Partial block caching implementation

2020-07-10 Thread Nikolaus Rath
On Jul 10 2020, Ivan Shapovalov  wrote:
> (I will probably need a fork anyway, as Nikolaus has apparently
> rejected a specific optimization in the B2 backend, absence of which
> makes my s3ql hit a certain API rate limit very often.)

Hu? S3QL does not have a B2 backend at all, so I don't think I could
have rejected optimizations for it.

Best,
Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«



[s3ql] Re: Partial block caching implementation

2020-07-10 Thread Nikolaus Rath
On Jul 10 2020, Daniel Jagszent  wrote:
>> Ah yes, compression and probably encryption will indeed preclude any
>> sort of partial block caching. An implementation will have to be
>> limited to plain uncompressed blocks, which is okay for my use-case
>> though (borg provides its own encryption and compression anyway).
>> [...]
> Compression and encryption are integral parts of S3QL and I would argue
> that disabling them is only an edge case.

If I were to write S3QL from scratch, I would probably not support this
at all, right. However, since the feature is present, I think we ought
to consider it fully supported ("edge case" makes it sound as if this
isn't the case).


> I might be wrong but I think Nikolaus (maintainer of S3QL) will not
> accept such a huge change into S3QL that is only beneficial for an edge
> case.

Never say never, but the bar is certainly high here. I think there are
more promising avenues to explore - eg. storing the
compressed/uncompressed offset mapping to make partial retrieval work
for all cases.
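
For illustration, a minimal sketch (hypothetical names, not S3QL's actual
code) of how such an offset mapping could be used to turn a requested
uncompressed byte range into the compressed byte range that has to be
fetched from the backend, assuming every recorded boundary is a point
where decompression can restart:

    import bisect

    def compressed_range(offset_map, start, end):
        """Map the uncompressed range [start, end) to the compressed range
        that must be downloaded. offset_map is a sorted list of
        (uncompressed_offset, compressed_offset) pairs, one per chunk."""
        unc = [u for u, _ in offset_map]
        lo = bisect.bisect_right(unc, start) - 1   # chunk containing `start`
        hi = bisect.bisect_left(unc, end)          # first chunk at or past `end`
        first = offset_map[lo][1]
        last = offset_map[hi][1] if hi < len(offset_map) else None  # None: to EOF
        return first, last

    # Example: 128 kB input chunks compressed to varying sizes
    offset_map = [(0, 0), (131072, 40000), (262144, 95000), (393216, 130000)]
    print(compressed_range(offset_map, 140000, 200000))   # -> (40000, 95000)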

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«



Re: [s3ql] Re: Partial block caching implementation

2020-07-10 Thread Ivan “intelfx” Shapovalov


> On 10 July 2020 at 21:51, Nikolaus Rath wrote:
> 
> On Jul 10 2020, Ivan Shapovalov  wrote:
>> (I will probably need a fork anyway, as Nikolaus has apparently
>> rejected a specific optimization in the B2 backend, absence of which
>> makes my s3ql hit a certain API rate limit very often.)
> 
> Hu? S3QL does not have a B2 backend at all, so I don't think I could
> have rejected optimizations for it.

Then how am I using it? :)
https://github.com/s3ql/s3ql/pull/116

--
Ivan Shapovalov / intelfx /

(Sent from a phone. Havoc may be wreaked on the formatting.)



Re: [s3ql] Re: Partial block caching implementation

2020-07-10 Thread Nikolaus Rath
On Jul 10 2020, Ivan “intelfx” Shapovalov  wrote:
>> On 10 July 2020 at 21:51, Nikolaus Rath wrote:
>> 
>> On Jul 10 2020, Ivan Shapovalov  wrote:
>>> (I will probably need a fork anyway, as Nikolaus has apparently
>>> rejected a specific optimization in the B2 backend, absence of which
>>> makes my s3ql hit a certain API rate limit very often.)
>> 
>> Hu? S3QL does not have a B2 backend at all, so I don't think I could
>> have rejected optimizations for it.
>
> Then how am I using it? :)
> https://github.com/s3ql/s3ql/pull/116

I stand corrected. I guess I haven't done a release since then, so the
documentation isn't updated yet. Apologies.

That said, my point about the optimization stands: I do not remember
rejecting anything here. Do you have a link for that too? :-)

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«



Re: [s3ql] Re: Partial block caching implementation

2020-07-10 Thread Ivan “intelfx” Shapovalov

> On 10 July 2020 at 22:09, Nikolaus Rath wrote:
> 
> On Jul 10 2020, Ivan “intelfx” Shapovalov  wrote:
>>> On 10 July 2020 at 21:51, Nikolaus Rath wrote:
>>> 
>>> On Jul 10 2020, Ivan Shapovalov  wrote:
>>>> (I will probably need a fork anyway, as Nikolaus has apparently
>>>> rejected a specific optimization in the B2 backend, absence of which
>>>> makes my s3ql hit a certain API rate limit very often.)
>>> 
>>> Hu? S3QL does not have a B2 backend at all, so I don't think I could
>>> have rejected optimizations for it.
>> 
>> Then how am I using it? :)
>> https://github.com/s3ql/s3ql/pull/116
> 
> I stand corrected. I guess I haven't done a release since then, so the
> documentation isn't updated yet. Apologies.
> 
> That said, my point about the optimization stands: I do not remember
> rejecting anything here. Do you have a link for that too? :-)

I wasn’t entirely correct either: you didn’t strictly reject the feature, but
rather scared the contributor out of doing it :)

https://github.com/s3ql/s3ql/pull/116#issuecomment-517634046

It would appear that’s not a theoretical issue — I’m hitting the
b2_get_file_versions cap from time to time.

--
Ivan Shapovalov / intelfx /

(Sent from a phone. Havoc may be wreaked on the formatting.)



Re: [s3ql] Re: Partial block caching implementation

2020-07-10 Thread Ivan Shapovalov
On 2020-07-10 at 19:54 +0100, Nikolaus Rath wrote:
> On Jul 10 2020, Daniel Jagszent  wrote:
> > > Ah yes, compression and probably encryption will indeed preclude any
> > > sort of partial block caching. An implementation will have to be
> > > limited to plain uncompressed blocks, which is okay for my use-case
> > > though (borg provides its own encryption and compression anyway).
> > > [...]
> > Compression and encryption are integral parts of S3QL and I would
> > argue that disabling them is only an edge case.
> 
> If I were to write S3QL from scratch, I would probably not support this
> at all, right. However, since the feature is present, I think we ought
> to consider it fully supported ("edge case" makes it sound as if this
> isn't the case).
> 
> > I might be wrong but I think Nikolaus (maintainer of S3QL) will not
> > accept such a huge change into S3QL that is only beneficial for an
> > edge case.
> 
> Never say never, but the bar is certainly high here. I think there are
> more promising avenues to explore - eg. storing the
> compressed/uncompressed offset mapping to make partial retrieval work
> for all cases.

Hmm, I'm not sure how that's supposed to work.

AFAICS, s3ql uses "solid compression", meaning that the entire block is
compressed at once. It is generally impossible to extract a specific
range of uncompressed data without decompressing the whole stream.[1]

Encryption does not pose this kind of existential problem — AES is used
in CTR mode, which theoretically permits random-access decryption — but
the crypto library in use, python-cryptography, doesn't seem to permit
this sort of trickery.

[1]: This can be solved by converting the compression layer into a
block-based one, but this will naturally break compatibility (i.e. we
will have to introduce a new set of compression algorithms, that is,
another corner case) and will require either compromising on the block
size, introducing complex indirection (such as storing compressed-to-
uncompressed offset maps along with the object itself), or completely
blowing metadata out of proportion (recording an offset mapping for
each 128K of the data). Regardless of the implementation plan, this
will also compromise compression efficiency. Completely not worth it, IMO.

-- 
Ivan Shapovalov / intelfx /





Re: [s3ql] Re: Partial block caching implementation

2020-07-11 Thread Nikolaus Rath
On Jul 11 2020, Ivan Shapovalov  wrote:
> On 2020-07-10 at 19:54 +0100, Nikolaus Rath wrote:
>> On Jul 10 2020, Daniel Jagszent  wrote:
>> > > Ah yes, compression and probably encryption will indeed preclude
>> > > any sort of partial block caching. An implementation will have to
>> > > be limited to plain uncompressed blocks, which is okay for my
>> > > use-case though (borg provides its own encryption and compression
>> > > anyway). [...]
>> > Compression and encryption are integral parts of S3QL and I would
>> > argue that disabling them is only an edge case.
>> 
>> If I were to write S3QL from scratch, I would probably not support
>> this at all, right. However, since the feature is present, I think we
>> ought to consider it fully supported ("edge case" makes it sound as
>> if this isn't the case).
>> 
>> > I might be wrong but I think Nikolaus (maintainer of S3QL) will not
>> > accept such a huge change into S3QL that is only beneficial for an
>> > edge case.
>> 
>> Never say never, but the bar is certainly high here. I think there
>> are more promising avenues to explore - eg. storing the
>> compressed/uncompressed offset mapping to make partial retrieval work
>> for all cases.
>
> Hmm, I'm not sure how that's supposed to work.
>
> AFAICS, s3ql uses "solid compression", meaning that the entire block is
> compressed at once. It is generally impossible to extract a specific
> range of uncompressed data without decompressing the whole stream.[1]

At least bzip2 always works in blocks; IIRC blocks are at most 900 kB
(for highest compression settings). I wouldn't be surprised if the same
holds for LZMA.

We could track the size of each compressed block, and store it as part
of the metadata of the object (so it doesn't blow up the SQLite table).
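
A rough sketch of that bookkeeping (layout and names are made up for
illustration, this is not S3QL's on-disk format): the per-chunk sizes
pack into a handful of bytes that could travel as backend object
metadata rather than as rows in the SQLite table.

    import struct

    def pack_chunk_sizes(sizes):
        """Serialize per-chunk compressed sizes as little-endian uint32s."""
        return struct.pack('<%dI' % len(sizes), *sizes)

    def unpack_chunk_sizes(blob):
        return list(struct.unpack('<%dI' % (len(blob) // 4), blob))

    sizes = [40000, 55000, 35000]      # e.g. one entry per 128 kB of input
    blob = pack_chunk_sizes(sizes)     # 12 bytes of metadata for this object
    assert unpack_chunk_sizes(blob) == sizes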

> Encryption does not pose this kind of existential problem — AES is used
> in CTR mode, which theoretically permits random-access decryption — but
> the crypto library in use, python-cryptography, doesn't seem to permit
> this sort of trickery.

Worst case, you can feed X bytes of garbage into the decrypter and then
start with the partial block - with CTR you should get the right
output.
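
Roughly, with python-cryptography (key and nonce handling is simplified
here and does not reproduce S3QL's actual object format), the
"feed garbage to advance the keystream" idea looks like this:

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def decrypt_tail(key, nonce, ciphertext_tail, offset):
        """Decrypt a ciphertext suffix that starts `offset` bytes into the stream."""
        dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
        dec.update(b'\0' * offset)          # burn keystream up to `offset`, discard output
        return dec.update(ciphertext_tail)  # CTR is a stream cipher, so this lines up

    key, nonce = os.urandom(32), os.urandom(16)
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ct = enc.update(b'x' * 1000)
    assert decrypt_tail(key, nonce, ct[600:], 600) == b'x' * 400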

Best,
Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«



Re: [s3ql] Re: Partial block caching implementation

2020-07-11 Thread Ivan Shapovalov
On 2020-07-11 at 12:13 +0100, Nikolaus Rath wrote:
> On Jul 11 2020, Ivan Shapovalov  wrote:
> > On 2020-07-10 at 19:54 +0100, Nikolaus Rath wrote:
> > > On Jul 10 2020, Daniel Jagszent  wrote:
> > > > > Ah yes, compression and probably encryption will indeed
> > > > > preclude any sort of partial block caching. An implementation
> > > > > will have to be limited to plain uncompressed blocks, which is
> > > > > okay for my use-case though (borg provides its own encryption
> > > > > and compression anyway). [...]
> > > > Compression and encryption are integral parts of S3QL and I
> > > > would argue that disabling them is only an edge case.
> > > 
> > > If I were to write S3QL from scratch, I would probably not support
> > > this at all, right. However, since the feature is present, I think
> > > we ought to consider it fully supported ("edge case" makes it
> > > sound as if this isn't the case).
> > > 
> > > > I might be wrong but I think Nikolaus (maintainer of S3QL) will
> > > > not accept such a huge change into S3QL that is only beneficial
> > > > for an edge case.
> > > 
> > > Never say never, but the bar is certainly high here. I think there
> > > are more promising avenues to explore - eg. storing the
> > > compressed/uncompressed offset mapping to make partial retrieval
> > > work for all cases.
> > 
> > Hmm, I'm not sure how that's supposed to work.
> > 
> > AFAICS, s3ql uses "solid compression", meaning that the entire block
> > is compressed at once. It is generally impossible to extract a
> > specific range of uncompressed data without decompressing the whole
> > stream.[1]
> 
> At least bzip2 always works in blocks; IIRC blocks are at most 900 kB
> (for highest compression settings). I wouldn't be surprised if the
> same holds for LZMA.

True, I forgot that bzip2 is inherently block-based. Not sure about
LZMA or gzip, but there is still a significant obstacle: how would you
extract this information from the compression libraries?

> 
> We could track the size of each compressed block, and store it as part
> of the metadata of the object (so it doesn't blow up the SQLite table).
> 
> > Encryption does not pose this kind of existential problem — AES is
> > used in CTR mode, which theoretically permits random-access
> > decryption — but the crypto library in use, python-cryptography,
> > doesn't seem to permit this sort of trickery.
> 
> Worst case, you can feed X bytes of garbage into the decrypter and
> then start with the partial block - with CTR you should get the right
> output.

Yes, that could probably work. Still feels like a grand hack.

-- 
Ivan Shapovalov / intelfx /





Re: [s3ql] Re: Partial block caching implementation

2020-07-15 Thread Nikolaus Rath
On Jul 11 2020, Ivan Shapovalov  wrote:
> On 2020-07-11 at 12:13 +0100, Nikolaus Rath wrote:
>> On Jul 11 2020, Ivan Shapovalov  wrote:
>> > On 2020-07-10 at 19:54 +0100, Nikolaus Rath wrote:
>> > > On Jul 10 2020, Daniel Jagszent  wrote:
>> > > > > Ah yes, compression and probably encryption will indeed
>> > > > > preclude any sort of partial block caching. An implementation
>> > > > > will have to be limited to plain uncompressed blocks, which
>> > > > > is okay for my use-case though (borg provides its own
>> > > > > encryption and compression anyway). [...]
>> > > > Compression and encryption are integral parts of S3QL and I
>> > > > would argue that disabling them is only an edge case.
>> > > If I were to write S3QL from scratch, I would probably not
>> > > support this at all, right. However, since the feature is
>> > > present, I think we ought to consider it fully supported ("edge
>> > > case" makes it sound as if this isn't the case).
>> > > 
>> > > > I might be wrong but I think Nikolaus (maintainer of S3QL) will
>> > > > not accept such a huge change into S3QL that is only beneficial
>> > > > for an edge case.
>> > > Never say never, but the bar is certainly high here. I think
>> > > there are more promising avenues to explore - eg. storing the
>> > > compressed/uncompressed offset mapping to make partial retrieval
>> > > work for all cases.
>> > Hmm, I'm not sure how that's supposed to work.
>> > 
>> > AFAICS, s3ql uses "solid compression", meaning that the entire
>> > block is compressed at once. It is generally impossible to extract
>> > a specific range of uncompressed data without decompressing the
>> > whole stream.[1]
>>
>> At least bzip2 always works in blocks; IIRC blocks are at most 900 kB
>> (for highest compression settings). I wouldn't be surprised if the
>> same holds for LZMA.
>
> True, I forgot that bzip2 is inherently block-based. Not sure about
> LZMA or gzip, but there is still a significant obstacle: how would you
> extract this information from the compression libraries?

No need to extract it: S3QL hands data to the compression library in
smaller chunks (IIRC 128 kB), so we just have to keep track of what goes
into and comes out of the compression library.
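
Note that the recorded boundaries are only useful if decompression can
restart there; zlib can be forced to emit such restart points with
Z_FULL_FLUSH, while bzip2 only produces them at its own block
boundaries and LZMA generally not at all within a stream. A sketch of
the bookkeeping, using raw deflate for that reason (hypothetical
helper, not S3QL's actual compressor wrapper):

    import zlib

    CHUNK = 128 * 1024

    def compress_with_offset_map(data):
        """Compress `data` in CHUNK-sized pieces, recording for each piece
        the (uncompressed_offset, compressed_offset) at which it starts."""
        comp = zlib.compressobj(wbits=-15)          # raw deflate, no zlib header
        out, offset_map = [], []
        in_pos = out_pos = 0
        for i in range(0, len(data), CHUNK):
            offset_map.append((in_pos, out_pos))
            piece = comp.compress(data[i:i + CHUNK])
            piece += comp.flush(zlib.Z_FULL_FLUSH)  # restart point after this chunk
            out.append(piece)
            in_pos += len(data[i:i + CHUNK])
            out_pos += len(piece)
        out.append(comp.flush())                    # finish the stream
        return b''.join(out), offset_map

    data = b'abc' * 200_000
    compressed, offset_map = compress_with_offset_map(data)
    start_unc, start_comp = offset_map[2]
    # A fresh decompressor can pick up at any recorded boundary:
    tail = zlib.decompressobj(wbits=-15).decompress(compressed[start_comp:])
    assert tail == data[start_unc:]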


Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«
