I don't have enough info to comment on whether Tuples are the right answer - but the user problem here is real.
There's a fundamental question I had as a new Beam user: "how do I get my data from one ParDo to the next?" This is *really key* - without it, even basic pipelines are not possible, so there should be a very simple answer for users.

This is also an area where advanced users (aka, people reading this list) have a lot of knowledge they can draw on to pick exactly the right solution to their problem. Beginning users learning Beam just want to know how to do this seemingly simple task. If the answer is "here, read lots of documentation about coders", we're giving them an intimidating first experience that will likely block them from creating their first pipeline. Having *something* that's a simple answer would be helpful.

What I've seen in the docs doesn't make it clear. The Beam docs don't talk about it at all (yet!), and from what I can see, the old Dataflow docs force the user to make several jumps of understanding and read docs in different areas.

For AutoValue - do we have clear guidance/code labs/examples showing users how to use AutoValue and what coder to use with it? There's a real trade-off there, since AutoValue requires users to learn several concepts, whereas Tuples sound like something most folks doing data processing would already know from other tools.

Like I said - I'm not speaking up for or against Tuples, but Beam should have an answer. If we did have a built-in Tuple, I would think it should already have a robust coder in the coder registry.

Robert - can you speak to what exactly the Tuple tradeoffs are, and why it wouldn't be appropriate for Beam to at least push users towards one? I'd like to hear more about that.
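To make the "Tuple with a coder already available" idea a bit more concrete, here is a rough plain-Java sketch. It is not Beam code: `TupleCoderSketch`, `Tuple2`, `encode` and `decode` are all hypothetical names invented for illustration, and the byte layout is just an assumption. The only point is that an element type handed from one ParDo to the next needs some way to be turned into bytes and back, and that a small untagged tuple could ship with one so a new user does not have to write it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical illustration only: none of these names are Beam APIs.
final class TupleCoderSketch {

  /** An immutable, untagged two-field tuple (hypothetical, not a Beam class). */
  static final class Tuple2<A, B> {
    final A first;
    final B second;

    Tuple2(A first, B second) {
      this.first = first;
      this.second = second;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Tuple2)) {
        return false;
      }
      Tuple2<?, ?> other = (Tuple2<?, ?>) o;
      return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() {
      return Objects.hash(first, second);
    }

    @Override
    public String toString() {
      return "(" + first + ", " + second + ")";
    }
  }

  /**
   * Encodes a (String, Long) tuple as a length-prefixed string followed by an
   * 8-byte long. Assumed layout for this sketch; null fields are not supported.
   */
  static byte[] encode(Tuple2<String, Long> tuple) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(tuple.first);
      out.writeLong(tuple.second);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads back a tuple written by encode(), in the same field order. */
  static Tuple2<String, Long> decode(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      String first = in.readUTF();
      long second = in.readLong();
      return new Tuple2<>(first, second);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Tuple2<String, Long> original = new Tuple2<>("clicks", 42L);
    Tuple2<String, Long> roundTripped = decode(encode(original));
    System.out.println(original + " -> " + roundTripped);
  }
}
```

A real Beam version would have to plug into Beam's own coder machinery and handle arbitrary field types, which is presumably where the trade-offs come in.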
S

On Tue, Dec 13, 2016 at 10:03 AM Robert Bradshaw <rober...@google.com.invalid> wrote:

> On Tue, Dec 13, 2016 at 9:02 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > Hi Robert,
> >
> > Agree, however which one the user would use ? Create his own one ?
>
> Whichever suits their needs best, which could include his or her own.
>
> > Today, I think Beam is heavily flexible in term of data format (which is
> > great), but the trade off is that the end-users have to write lot of
> > boilerplate code (just to convert from one type to another).
> >
> > So, basically, the purpose of a Beam Tuple is to have something provided out
> > of box: if the user wants to use another tuple, that's fine.
> > Generally speaking, the discussion about data format extension is about to
> > simplify the way for users to manipulate popular data formats.
>
> If I understand correctly, the proposal is to pick (or write) a Tuple
> API and bless it by shipping it with the SDK along with beam-specific
> helper code. I'd be helpful to see concretely how large of a savings
> this would be to a user, and whether that's worth the cost.
>
> > On 12/13/2016 05:56 PM, Robert Bradshaw wrote:
> >>
> >> The Java language isn't very amenable to Tuple APIs as there are several
> >> (mutually exclusive?) tradeoffs that must be made, each with their pros and
> >> cons. What advantage is there of Beam providing its own tuple API vs.
> >> letting users pick whatever tuple library they want and using that with
> >> Beam?
> >>
> >> (I suppose we're already using and encouraging AutoValue which covers a lot
> >> of tuple cases.)
> >>
> >> On Tue, Dec 13, 2016 at 8:20 AM, Aparup Banerjee (apbanerj) <apban...@cisco.com> wrote:
> >>
> >>> We have created one. An untagged Tuple. Will be happy to contribute it to
> >>> the community
> >>>
> >>> Aparup
> >>>
> >>>> On Dec 13, 2016, at 5:11 AM, Amit <amitsel...@gmail.com> wrote:
> >>>>
> >>>> I'll add that I know of Beam's PTuple, but my question is about much
> >>>> simpler Tuples, untagged.
> >>>>
> >>>> On Tue, Dec 13, 2016 at 1:56 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>>>
> >>>>> Hi Amit,
> >>>>>
> >>>>> as discussed together, I think a Tuple abstraction would be good in the
> >>>>> SDK (more than in the data format extension).
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>>> On 12/13/2016 11:06 AM, Amit Sela wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I was wondering why Beam doesn't have tuples as part of the SDK ?
> >>>>>> To the best of my knowledge all currently supported (OSS) runners: Spark,
> >>>>>> Flink, Apex provide a Tuple abstraction and I was wondering if Beam should
> >>>>>> too ?
> >>>>>>
> >>>>>> Consider KV for example; it is a special ("*keyed*" by the first field)
> >>>>>> implementation Tuple2.
> >>>>>> While KV's importance is far more than being a Tuple2, I'm wondering if the
> >>>>>> SDK would benefit from a proper TupleX support ?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Amit
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Jean-Baptiste Onofré
> >>>>> jbono...@apache.org
> >>>>> http://blog.nanthrax.net
> >>>>> Talend - http://www.talend.com
> >>>>>
> >>>
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>