Re: Please review & discuss - draft proposal for how to do base64, foldedLines, etc.

Mike Beckerle Tue, 06 Mar 2018 09:03:54 -0800

Received some excellent feedback on the proposal as presented on the Wiki from 
Steve Hanson of IBM.



The feedback was mostly very supportive of the proposal. He suggested one 
change: that we avoid the term "streaming" and stick with "layering" in all 
the terminology, as "streaming" already has strong connotations.


Throughout the long history of DFDL, the term layering was always used for 
these concepts, where a transformation must be applied to the data before 
parsing (or after unparsing).


We do use the term "data stream" or just "stream" as a direction-independent 
way of referring to the data being parsed (input stream) or unparsed (output 
stream), but "streaming" connotes processing in a manner consistent with an 
unbounded stream, using a small/finite memory footprint.


While layering can be done in a streaming manner or not, the point of layering 
is different, as it is about the algorithmic transformations, not the memory 
footprint nor length-boundedness of the data stream.
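
To illustrate the distinction, here is a minimal Python sketch (not Daffodil 
code; the function names decode_layer_whole and decode_layer_chunked are 
hypothetical). It applies the same base64 layer two ways, once over the whole 
buffer and once in bounded chunks. The transformation is identical; only the 
memory behavior differs. The chunked version assumes the encoded data has no 
embedded line breaks and that each read returns a full chunk until the end.

```python
import base64
import io


def decode_layer_whole(data: bytes) -> bytes:
    """Non-streaming layer: decode the entire base64 payload at once."""
    return base64.b64decode(data)


def decode_layer_chunked(stream, chunk_size: int = 4096):
    """Streaming layer: decode base64 in bounded chunks.

    chunk_size must be a multiple of 4 so that every chunk holds a whole
    number of 4-character base64 groups. Assumes no embedded newlines and
    that stream.read() only returns a short chunk at end of data.
    """
    if chunk_size % 4 != 0:
        raise ValueError("chunk_size must be a multiple of 4")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield base64.b64decode(chunk)


original = b"hello layered world" * 100
payload = base64.b64encode(original)

# Same layer, two memory profiles, same result handed to the parser.
whole = decode_layer_whole(payload)
chunked = b"".join(decode_layer_chunked(io.BytesIO(payload), chunk_size=64))
```

Either way, what the parser sees underneath the layer is the same decoded 
byte sequence.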


So I've updated the proposal to use the terms layer, layered, and layering, 
and to use the term stream minimally.


The updated proposal now lives on the Apache Daffodil Wiki here:


https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layer+Annotations+for+Base64+and+other+Layered+Transformations


-Mike Beckerle

Tresys


________________________________
From: Mike Beckerle
Sent: Friday, January 5, 2018 6:28:23 PM
To: [email protected]
Subject: Re: Please review & discuss - draft proposal for how to do base64, 
foldedLines, etc.


Updated proposal attached.

________________________________
From: Steve Lawrence <[email protected]>
Sent: Thursday, January 4, 2018 9:28:51 AM
To: Mike Beckerle
Subject: Re: Please review & discuss - draft proposal for how to do base64, 
foldedLines, etc.

<-- snip -->

> 2) What about options for a transform? For example, you might want to
> specify a gzip stream to do something like --best or --fast to favor
> compression size vs speed. Or what variation of base64 should be used.
> Might also be used to describe how errors should be handled specific to a
> transform. For example, base64 can ignore garbage characters when
> decoding, but that might want to be a processing error in some cases.
>
> I guess this could be a single option with space separated key/value
> pairs, e.g.
>
>    daf:streamTransformOptions="base64_ignore_garbage=yes base64_variant=rfc1421"
>
> That's very extensible, but might not be consistent with the rest of
> DFDL. Maybe we need specific options for each stream transform, e.g.
>
>    daf:streamTransformBase64IgnoreGarbage="yes"
>    daf:streamTransformBase64Variant="rfc1421"
>    ..
>
> MikeB: My suggestion would be to make these parameters part of the algorithm
> name for now. E.g.,
> daf:streamTransform="base64Best" or 
> daf:streamTransform="base64_ignore_garbage".
>
> We're going to need a way to specify many of these stream transforms.
> Specifying gzip with options and naming it something new better not be
> very hard. So perhaps that is good enough for now.
>

My only (minor) concern with this is that if something had multiple
options, the combinations of names could expand quickly. But probably
not worth worrying about until that actually happens--it may not be an
issue in practice.
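
A rough Python sketch of the two styles discussed above, for comparison only. 
None of this is Daffodil API: parse_transform_options and combined_names are 
hypothetical helpers, and the option values other than rfc1421 are made up 
for illustration. The first function parses the space-separated key/value 
form; the second enumerates the names needed if every option combination 
becomes its own algorithm name.

```python
from itertools import product


def parse_transform_options(text: str) -> dict:
    """Parse space-separated key=value pairs into a dict."""
    options = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed option: {token!r}")
        options[key] = value
    return options


def combined_names(base: str, choices: dict) -> list:
    """Build one algorithm name per combination of option values."""
    keys = list(choices)
    return [
        "_".join([base, *combo])
        for combo in product(*(choices[k] for k in keys))
    ]


opts = parse_transform_options(
    "base64_ignore_garbage=yes base64_variant=rfc1421"
)

# Illustrative option values: 2 garbage policies x 3 variants.
names = combined_names(
    "base64",
    {
        "garbage": ["ignore", "strict"],
        "variant": ["rfc1421", "rfc2045", "rfc4648"],
    },
)
```

With two options of two and three values, the name-per-combination approach 
already needs six names, and the count is the product of the value counts as 
more options are added.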

Everything else above sounds good.
