Well, I guess it was a wording issue more than anything else.

That said, it is not true for all runners, so it may still need some more
love later, but I don't have a solution for it yet. I just wondered whether
a better way to solve it already existed.

On Jan 30, 2018 at 22:36, "Reuven Lax" <re...@google.com> wrote:

> The point isn't runner constraints; the point is that runners might make
> decisions to break fusion at unexpected points (e.g. the decision might be
> made because the runner has profile data about previous runs of the
> pipeline, and knows it should break it at that point). The SDK has no good
> way of knowing what those decisions will be, so it needs to conservatively
> assume it could happen anywhere.
>
> On Tue, Jan 30, 2018 at 1:31 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
>> Hmm, this starts to smell like the old question "how to enforce runner
>> constraints without enforcing too much" :(.
>>
>> Anyway, that is enough for me on this topic.
>> Thanks for the clarification and reminders, guys.
>>
>> On Jan 30, 2018 at 22:29, "Reuven Lax" <re...@google.com> wrote:
>>
>>> Where the split points are depends on the runner. Runners are free to
>>> split at any point (and often do to prevent cycles from appearing in the
>>> graph).
>>>
>>> On Tue, Jan 30, 2018 at 1:27 PM, Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>> I mostly agree with all of that, which brings me to the interesting
>>>> point of this topic: why are coders so strictly enforced if they are not
>>>> used most of the time (a flat processor chain, to caricature it)?
>>>>
>>>> Shouldn't this be relaxed a bit and enforced only at split or shuffle
>>>> points?
>>>>
>>>>
>>>> On Jan 30, 2018 at 22:09, "Ben Chambers" <bchamb...@apache.org> wrote:
>>>>
>>>>> It sounds like in your specific case you're saying that the same
>>>>> encoding can be viewed by the Java type system in two different ways. For
>>>>> instance, if you have an object Person that is convertible to JSON using
>>>>> Jackson, then that JSON encoding can be viewed as either a Person or a
>>>>> Map<String, Object> by looking at the JSON fields. In that case, there
>>>>> needs to be some kind of "view change" transform to change the type of
>>>>> the PCollection.
>>>>>
>>>>> I'm not sure an untyped API would be better here. Requiring the "view
>>>>> change" be explicit means we can ensure the types are compatible, and also
>>>>> makes it very clear when this kind of change is desired.
>>>>>
>>>>> Some background on Coders that may be relevant:
>>>>>
>>>>> It might help to think about Coders as the specification of how
>>>>> elements in a PCollection are encoded if/when the runner needs to encode
>>>>> them. If you are trying to read JSON or XML records from a source, that
>>>>> is part of the source transform (reading JSON or XML records) and not
>>>>> part of the collection produced by the transform.
>>>>>
>>>>> Consider further -- even if you read XML records from a source, you
>>>>> likely *wouldn't* want to use an XML Coder for those records within the
>>>>> pipeline, as every time the pipeline needed to serialize them you would
>>>>> produce much larger amounts of data (XML is not an efficient/compact
>>>>> encoding). Instead, you likely want to read XML records from the source
>>>>> and then encode those within the pipeline using something more efficient. Then
>>>>> convert them to something more readable but possibly less-efficient before
>>>>> they exit the pipeline at a sink.
>>>>>
>>>>> On Tue, Jan 30, 2018 at 12:23 PM Kenneth Knowles <k...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Ah, this is a point that Robert brings up quite often: one reason we
>>>>>> put coders on PCollections instead of doing that work in PTransforms is
>>>>>> that the runner (plus SDK harness) can automatically serialize only when
>>>>>> necessary. So the default in Beam is that the thing you want to happen is
>>>>>> already done. There are some corner cases when you get to the portability
>>>>>> framework, but I am pretty sure it already works this way. If you show
>>>>>> what is a PTransform and what is a PCollection in your example, it might
>>>>>> show where we can fix things.
>>>>>>
>>>>>> On Tue, Jan 30, 2018 at 12:17 PM, Romain Manni-Bucau <
>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>
>>>>>>> Indeed,
>>>>>>>
>>>>>>> I'll take a simplistic example to keep it short.
>>>>>>> I have a source emitting Person objects ({name:...,id:...})
>>>>>>> serialized with Jackson as JSON.
>>>>>>> Then my pipeline processes them with a DoFn taking a Map<String,
>>>>>>> String>. Here I set the coder to read the JSON as a map.
>>>>>>>
>>>>>>> However, a Map<String, String> is not a Person, so my pipeline needs
>>>>>>> an intermediate step to convert one into the other and has, by design,
>>>>>>> a useless serialization round trip.
>>>>>>>
>>>>>>> If you check the chain you have: Person -> JSON -> Map<String,
>>>>>>> String> -> JSON -> Map<String, String> whereas Person -> JSON ->
>>>>>>> Map<String, String> is entirely sufficient, because the JSON is
>>>>>>> equivalent in this example.
>>>>>>>
>>>>>>> In other words, if one coder's output is readable as another coder's
>>>>>>> input, Java's strong typing doesn't know about it and can force some
>>>>>>> artificial steps.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Romain Manni-Bucau
>>>>>>> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
>>>>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>>> <http://rmannibucau.wordpress.com> | Github
>>>>>>> <https://github.com/rmannibucau> | LinkedIn
>>>>>>> <https://www.linkedin.com/in/rmannibucau>
>>>>>>>
>>>>>>> 2018-01-30 21:07 GMT+01:00 Kenneth Knowles <k...@google.com>:
>>>>>>>
>>>>>>>> I'm not sure I understand your question. Can you explain more?
>>>>>>>>
>>>>>>>> On Tue, Jan 30, 2018 at 11:50 AM, Romain Manni-Bucau <
>>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> I just encountered an issue with the pipeline API and wondered
>>>>>>>>> whether you had thought about it.
>>>>>>>>>
>>>>>>>>> It can happen that Coders are compatible with each other. A simple
>>>>>>>>> example: a text-based coder such as JSON or XML will be able to read
>>>>>>>>> text. However, with the pipeline API you can't support this directly,
>>>>>>>>> and the user is forced to go through an intermediate typed state.
>>>>>>>>>
>>>>>>>>> Is there already a way to avoid these useless round trips?
>>>>>>>>>
>>>>>>>>> Put another way: how should coder transitivity be handled?
>>>>>>>>>
>>>>>>>>> Romain Manni-Bucau
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>
>
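
The point made in the thread (a coder is declared on the collection, but is
only exercised where the runner decides to break fusion) can be illustrated
with a minimal plain-Java sketch. It does not use the Beam API: FusionSketch,
its nested Coder interface, CountingStringCoder, and the breakFusion flag are
hypothetical names invented for this illustration, and the boolean flag merely
stands in for a decision that a real runner would make on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FusionSketch {

    /** Hypothetical stand-in for a coder: how a value becomes bytes and back. */
    public interface Coder<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    /** UTF-8 string coder that counts calls, so we can see when it is used. */
    public static final class CountingStringCoder implements Coder<String> {
        public int encodes = 0;
        public int decodes = 0;

        @Override
        public byte[] encode(String value) {
            encodes++;
            return value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String decode(byte[] bytes) {
            decodes++;
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /**
     * Applies two steps to every element. The intermediate value goes through
     * the coder only when breakFusion is true (assumption: this flag models
     * the runner's choice, which the pipeline author cannot predict).
     */
    public static <A, B, C> List<C> run(List<A> input,
                                        Function<A, B> first,
                                        Coder<B> intermediateCoder,
                                        Function<B, C> second,
                                        boolean breakFusion) {
        List<C> output = new ArrayList<>();
        for (A element : input) {
            B mid = first.apply(element);
            if (breakFusion) {
                // Fusion broken: the value crosses a boundary as bytes.
                mid = intermediateCoder.decode(intermediateCoder.encode(mid));
            }
            // Fused: the in-memory object is handed straight to the next step.
            output.add(second.apply(mid));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(" a ", "bb");
        CountingStringCoder coder = new CountingStringCoder();

        List<Integer> fused = FusionSketch.<String, String, Integer>run(
                input, String::trim, coder, String::length, false);
        System.out.println(fused + " encodes=" + coder.encodes);

        List<Integer> split = FusionSketch.<String, String, Integer>run(
                input, String::trim, coder, String::length, true);
        System.out.println(split + " encodes=" + coder.encodes);
    }
}
```

In this sketch both runs produce the same result, but the coder is never
called in the fused run and is called once per element when fusion is broken.
The coder still has to be supplied in both cases, because the caller cannot
know in advance which of the two the runner will choose.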
