Re: coder evolutions?

Romain Manni-Bucau Sun, 04 Feb 2018 12:15:05 -0800

Works for me. So a jira with target version = 3.

Can someone with the karma check we have a 3.0.0 in jira system please?


Le 4 févr. 2018 20:46, "Reuven Lax" <[email protected]> a écrit :

> Seems fine to me. At some point we might want to do an audit of existing
> Jira issues, because I suspect there are issues that should be targeted to
> 3.0 but are not yet tagged.
>
> On Sun, Feb 4, 2018 at 11:41 AM, Jean-Baptiste Onofré <[email protected]>
> wrote:
>
>> I would prefer to use Jira, with "wish"/"ideas", and adding Beam 3.0.0
>> version.
>>
>> WDYT ?
>>
>> Regards
>> JB
>>
>> On 02/04/2018 07:55 PM, Reuven Lax wrote:
>> > Do we have a good place to track the items for Beam 3.0, or is Jira the
>> best
>> > place? Romain has a good point - if this gets forgotten when we do Beam
>> 3.0,
>> > then we're stuck waiting around till Beam 4.0.
>> >
>> > Reuven
>> >
>> > On Sun, Feb 4, 2018 at 9:27 AM, Jean-Baptiste Onofré <[email protected]
>> > <mailto:[email protected]>> wrote:
>> >
>> >     That's a good point. In the roadmap for Beam 3, I think it makes
>> sense to add a
>> >     point about this.
>> >
>> >     Regards
>> >     JB
>> >
>> >     On 02/04/2018 06:18 PM, Eugene Kirpichov wrote:
>> >     > I think doing a change that would break pipeline update for every
>> single user of
>> >     > Flink and Dataflow needs to be postponed until a next major
>> version. Pipeline
>> >     > update is a very frequently used feature, especially by the
>> largest users. We've
>> >     > had those users get significantly upset even when we accidentally
>> broke update
>> >     > compatibility for some special cases of individual transforms;
>> breaking it
>> >     > intentionally and project-wide is too extreme to be justified by
>> the benefits of
>> >     > the current change.
>> >     >
>> >     > That said, I think concerns about coder APIs are reasonable, and
>> it is
>> >     > unfortunate that we effectively can't make changes to them right
>> now. It would
>> >     > be great if in the next major version we were better prepared for
>> evolution of
>> >     > coders, e.g. by having coders support a version marker or
>> something like that,
>> >     > with an API for detecting the version of data on wire and reading
>> or writing
>> >     > data of an old version. Such a change (introducing versioning)
>> would also, of
>> >     > course, be incompatible and would need to be postponed until a
>> major version -
>> >     > but, at least, subsequent changes wouldn't.
>> >     >
>> >     > ...And as I was typing this email, seems that this is what the
>> thread already
>> >     > came to!
>> >     >
>> >     > On Sun, Feb 4, 2018 at 9:16 AM Romain Manni-Bucau <
>> [email protected] <mailto:[email protected]>
>> >     > <mailto:[email protected] <mailto:[email protected]>>>
>> wrote:
>> >     >
>> >     >     I like this idea of migration support at coder level. It
>> would require to
>> >     >     add a metadata in all outputs which would represent the
>> version then coders
>> >     >     can handle the logic properly depending the version - we can
>> assume a coder
>> >     >     dev upgrade the version when he breaks the representation I
>> hope ;).
>> >     >     With this: no runner impact at all :).
>> >     >
>> >     >
>> >     >     Romain Manni-Bucau
>> >     >     @rmannibucau <https://twitter.com/rmannibucau
>> >     <https://twitter.com/rmannibucau>> |  Blog
>> >     >     <https://rmannibucau.metawerx.net/
>> >     <https://rmannibucau.metawerx.net/>> | Old Blog
>> >     >     <http://rmannibucau.wordpress.com <
>> http://rmannibucau.wordpress.com>>
>> >     | Github
>> >     >     <https://github.com/rmannibucau <
>> https://github.com/rmannibucau>> |
>> >     LinkedIn
>> >     >     <https://www.linkedin.com/in/rmannibucau
>> >     <https://www.linkedin.com/in/rmannibucau>> | Book
>> >     >
>> >      <https://www.packtpub.com/application-development/java-ee-
>> 8-high-performance <https://www.packtpub.com/appl
>> ication-development/java-ee-8-high-performance>>
>> >     >
>> >     >     2018-02-04 18:09 GMT+01:00 Reuven Lax <[email protected]
>> <mailto:[email protected]>
>> >     >     <mailto:[email protected] <mailto:[email protected]>>>:
>> >     >
>> >     >         It would already break quite a number of users at this
>> point.
>> >     >
>> >     >         I think what we should be doing is moving forward on the
>> snapshot/update
>> >     >         proposal. That proposal actually provides a way forward
>> when coders
>> >     >         change (it proposes a way to map an old snapshot to one
>> using the new
>> >     >         coder, so changes to coders in the future will be much
>> easier to make.
>> >     >         However much of the implementation for this will likely
>> be at the runner
>> >     >         level, not the SDK level.
>> >     >
>> >     >         Reuven
>> >     >
>> >     >         On Sun, Feb 4, 2018 at 9:04 AM, Romain Manni-Bucau
>> >     >         <[email protected] <mailto:[email protected]>
>> >     <mailto:[email protected] <mailto:[email protected]>>>
>> wrote:
>> >     >
>> >     >             I fully understand that, and this is one of the
>> reason managing to
>> >     >             solve these issues is very important and ASAP. My
>> conclusion is that
>> >     >             we must break it now to avoid to do it later when
>> usage will be way
>> >     >             more developped - I would be very happy to be wrong
>> on that point -
>> >     >             so I started this PR and this thread. We can postpone
>> it but it
>> >     >             would break later so for probably more users.
>> >     >
>> >     >
>> >     >             Romain Manni-Bucau
>> >     >             @rmannibucau <https://twitter.com/rmannibucau
>> >     <https://twitter.com/rmannibucau>> |  Blog
>> >     >             <https://rmannibucau.metawerx.net/
>> >     <https://rmannibucau.metawerx.net/>> | Old Blog
>> >     >             <http://rmannibucau.wordpress.com
>> >     <http://rmannibucau.wordpress.com>> | Github
>> >     >             <https://github.com/rmannibucau
>> >     <https://github.com/rmannibucau>> | LinkedIn
>> >     >             <https://www.linkedin.com/in/rmannibucau
>> >     <https://www.linkedin.com/in/rmannibucau>> | Book
>> >     >
>> >      <https://www.packtpub.com/application-development/java-ee-
>> 8-high-performance <https://www.packtpub.com/appl
>> ication-development/java-ee-8-high-performance>>
>> >     >
>> >     >             2018-02-04 17:49 GMT+01:00 Reuven Lax <
>> [email protected] <mailto:[email protected]>
>> >     >             <mailto:[email protected] <mailto:[email protected]>>>:
>> >     >
>> >     >                 Unfortunately several runners (at least Flink and
>> Dataflow)
>> >     >                 support in-place update of streaming pipelines as
>> a key feature,
>> >     >                 and changing coder format breaks this. This is a
>> very important
>> >     >                 feature of both runners, and we should endeavor
>> not to break them.
>> >     >
>> >     >                 In-place snapshot and update is also a top-level
>> Beam proposal
>> >     >                 that was received positively, though neither of
>> those runners
>> >     >                 yet implement the proposed interface.
>> >     >
>> >     >                 Reuven
>> >     >
>> >     >                 On Sun, Feb 4, 2018 at 8:44 AM, Romain Manni-Bucau
>> >     >                 <[email protected] <mailto:
>> [email protected]>
>> >     <mailto:[email protected] <mailto:[email protected]>>>
>> wrote:
>> >     >
>> >     >                     Sadly yes, and why the PR is actually WIP. As
>> mentionned it
>> >     >                     modifies it and requires some updates in
>> other languages and
>> >     >                     the standard_coders.yml file (I didn't find
>> how this file
>> >     >                     was generated).
>> >     >                     Since coders must be about volatile data I
>> don't think it is
>> >     >                     a big deal to change it though.
>> >     >
>> >     >
>> >     >                     Romain Manni-Bucau
>> >     >                     @rmannibucau <https://twitter.com/rmannibucau
>> >     <https://twitter.com/rmannibucau>> |  Blog
>> >     >                     <https://rmannibucau.metawerx.net/
>> >     <https://rmannibucau.metawerx.net/>> | Old Blog
>> >     >                     <http://rmannibucau.wordpress.com
>> >     <http://rmannibucau.wordpress.com>> | Github
>> >     >                     <https://github.com/rmannibucau
>> >     <https://github.com/rmannibucau>> | LinkedIn
>> >     >                     <https://www.linkedin.com/in/rmannibucau
>> >     <https://www.linkedin.com/in/rmannibucau>> | Book
>> >     >
>> >      <https://www.packtpub.com/application-development/java-ee-
>> 8-high-performance <https://www.packtpub.com/appl
>> ication-development/java-ee-8-high-performance>>
>> >     >
>> >     >                     2018-02-04 17:34 GMT+01:00 Reuven Lax <
>> [email protected] <mailto:[email protected]>
>> >     >                     <mailto:[email protected] <mailto:
>> [email protected]>>>:
>> >     >
>> >     >                         One question - does this change the
>> actual byte encoding
>> >     >                         of elements? We've tried hard not to do
>> that so far for
>> >     >                         reasons of compatibility.
>> >     >
>> >     >                         Reuven
>> >     >
>> >     >                         On Sun, Feb 4, 2018 at 6:44 AM, Romain
>> Manni-Bucau
>> >     >                         <[email protected]
>> >     <mailto:[email protected]> <mailto:[email protected]
>> >     <mailto:[email protected]>>>
>> >     >                         wrote:
>> >     >
>> >     >                             Hi guys,
>> >     >
>> >     >                             I submitted a PR on coders to enhance
>> 1. the user
>> >     >                             experience 2. the determinism and
>> handling of
>> >     coders.
>> >     >
>> >     >                             1. the user experience is linked to
>> what i
>> >     sent some
>> >     >                             days ago: close handling of the
>> streams from a
>> >     coder
>> >     >                             code. Long story short I add a
>> SkipCloseCoder
>> >     which
>> >     >                             can decorate a coder and just wraps
>> the stream
>> >     >                             (input or output) in flavors skipping
>> close()
>> >     calls.
>> >     >                             This avoids to do it by default
>> (which had my
>> >     >                             preference if you read the related
>> thread but not
>> >     >                             the one of everybody) but also makes
>> the usage
>> >     of a
>> >     >                             coder with this issue easy since the
>> of() of the
>> >     >                             coder just wraps itself in this
>> delagating coder.
>> >     >
>> >     >                             2. this one is more nasty and mainly
>> concerns
>> >     >                             IterableLikeCoders. These ones use
>> this kind of
>> >     >                             algorithm (keep in mind they work on
>> a list):
>> >     >
>> >     >                             writeSize()
>> >     >                             for all element e {
>> >     >                                 elementCoder.write(e)
>> >     >                             }
>> >     >                             writeMagicNumber() // this one
>> depends the size
>> >     >
>> >     >                             The decoding is symmetric so I bypass
>> it here.
>> >     >
>> >     >                             Indeed all these writes (reads) are
>> done on
>> >     the same
>> >     >                             stream. Therefore it assumes you read
>> as much
>> >     bytes
>> >     >                             than you write...which is a huge
>> assumption for a
>> >     >                             coder which should by contract assume
>> it can read
>> >     >                             the stream...as a stream (until -1).
>> >     >
>> >     >                             The idea of the fix is to change this
>> encoding to
>> >     >                             this kind of algorithm:
>> >     >
>> >     >                             writeSize()
>> >     >                             for all element e {
>> >     >                                 writeElementByteCount(e)
>> >     >                                 elementCoder.write(e)
>> >     >                             }
>> >     >                             writeMagicNumber() // still optionally
>> >     >
>> >     >                             This way on the decode size you can
>> wrap the
>> >     stream
>> >     >                             by element to enforce the limitation
>> of the
>> >     byte count.
>> >     >
>> >     >                             Side note: this indeed enforce a
>> limitation due to
>> >     >                             java byte limitation but if you check
>> coder
>> >     code it
>> >     >                             is already here at the higher level
>> so it is not a
>> >     >                             big deal for now.
>> >     >
>> >     >                             In terms of implementation it uses a
>> >     >                             LengthAwareCoder which delegates to
>> another coder
>> >     >                             the encoding and just adds the byte
>> count
>> >     before the
>> >     >                             actual serialization. Not perfect but
>> should
>> >     be more
>> >     >                             than enough in terms of support and
>> perf for
>> >     beam if
>> >     >                             you think real pipelines (we try to
>> avoid
>> >     >                             serializations or it is done on some
>> well known
>> >     >                             points where this algo should be
>> >     enough...worse case
>> >     >                             it is not a huge overhead, mainly
>> just some memory
>> >     >                             overhead).
>> >     >
>> >     >
>> >     >                             The PR is available
>> >     >                             at https://github.com/apache/
>> beam/pull/4594
>> >     <https://github.com/apache/beam/pull/4594>. If you
>> >     >                             check you will see I put it "WIP".
>> The main reason
>> >     >                             is that it changes the encoding
>> format for
>> >     >                             containers (lists, iterable, ...) and
>> therefore
>> >     >                             breaks python/go/... tests and the
>> >     >                             standard_coders.yml definition. Some
>> help on that
>> >     >                             would be very welcomed.
>> >     >
>> >     >                             Technical side note if you
>> >     >                             wonder: UnownedInputStream doesn't
>> even allow to
>> >     >                             mark the stream so there is no real
>> fast way
>> >     to read
>> >     >                             the stream as fast as possible with
>> standard
>> >     >                             buffering strategies and to support
>> this automatic
>> >     >                             IterableCoder wrapping which is
>> implicit. In other
>> >     >                             words, if beam wants to support any
>> coder,
>> >     including
>> >     >                             the ones not requiring to write the
>> size of the
>> >     >                             output - most of the codecs - then we
>> need to
>> >     change
>> >     >                             the way it works to something like
>> that which does
>> >     >                             it for the user which doesn't know
>> its coder got
>> >     >                             wrapped.
>> >     >
>> >     >                             Hope it makes sense, if not, don't
>> hesitate to ask
>> >     >                             questions.
>> >     >
>> >     >                             Happy end of week-end.
>> >     >
>> >     >                             Romain Manni-Bucau
>> >     >                             @rmannibucau <
>> https://twitter.com/rmannibucau
>> >     <https://twitter.com/rmannibucau>> |
>> >     >                              Blog <https://rmannibucau.metawerx.
>> net/
>> >     <https://rmannibucau.metawerx.net/>> | Old Blog
>> >     >                             <http://rmannibucau.wordpress.com
>> >     <http://rmannibucau.wordpress.com>> | Github
>> >     >                             <https://github.com/rmannibucau
>> >     <https://github.com/rmannibucau>> | LinkedIn
>> >     >                             <https://www.linkedin.com/in/
>> rmannibucau
>> >     <https://www.linkedin.com/in/rmannibucau>> | Book
>> >     >
>> >      <https://www.packtpub.com/application-development/java-ee-
>> 8-high-performance <https://www.packtpub.com/appl
>> ication-development/java-ee-8-high-performance>>
>> >     >
>> >     >
>> >     >
>> >     >
>> >     >
>> >     >
>> >     >
>> >
>> >     --
>> >     Jean-Baptiste Onofré
>> >     [email protected] <mailto:[email protected]>
>> >     http://blog.nanthrax.net
>> >     Talend - http://www.talend.com
>> >
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> [email protected]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>

Re: coder evolutions?

Reply via email to