Nice. Commented a bit on the doc. +1 to working out the Python, Go, and portability implications.
Kenn

On Thu, Nov 30, 2017 at 1:06 PM, Reuven Lax <[email protected]> wrote:

> Thanks!
>
> On Thu, Nov 30, 2017 at 11:25 AM, Holden Karau <[email protected]> wrote:
>
>> Rocking, I'll start leaving some comments on this. I'm excited to see
>> work being done in this area as well :)
>>
>> On Thu, Nov 30, 2017 at 9:20 AM, Tyler Akidau <[email protected]> wrote:
>>
>>> On Wed, Nov 29, 2017 at 6:38 PM Reuven Lax <[email protected]> wrote:
>>>
>>>> There has been a lot of conversation about schemas on PCollections
>>>> recently. There are a number of reasons for this. Schemas as first-class
>>>> objects in Beam provide a nice base for building BeamSQL. Spark has
>>>> provided schema support via DataFrames for over two years, and it has
>>>> proved to be very popular among Spark users; it turns out that FlumeJava -
>>>> the original inspiration for the Beam API - has had schema support for even
>>>> longer, though this feature was not included in the Beam (at that time
>>>> Dataflow) API. It turns out that most records have structure, and allowing
>>>> the system to understand record structure can both simplify usage of the
>>>> system and allow for new performance optimizations.
>>>>
>>>> After discussion with JB, Eugene, Kenn, Robert, and a number of others
>>>> on the list, I've started a proposal document here
>>>> <https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit?usp=sharing>
>>>> describing how schemas can be added to Beam in a manner that integrates
>>>> with the existing Beam API. The goal is not to blindly copy existing systems
>>>> that have schemas, but rather to ensure that we get the best fit for Beam.
>>>> Please comment on this proposal - as much feedback as possible is valuable.
>>>>
>>>> In addition, you may notice this document is incomplete. While it does
>>>> sketch out how schemas can fit into Beam semantically, many portions of
>>>> this design remain to be fleshed out. In particular, the API signatures are
>>>> only sketched at a high level; exactly what all these APIs will look
>>>> like has not yet been defined. I would welcome help from interested members
>>>> of the community to define these APIs, and to make sure we're covering all
>>>> relevant use cases.
>>>>
>>>
>>> Thanks for sharing this Reuven, I'm excited to see this being discussed.
>>> One global comment: all of the existing examples are in Java. It would be
>>> great if we could design this with Python in mind (and how it could
>>> interact cleanly with Pandas) at the same time. +Robert Bradshaw
>>> <[email protected]>, +Holden Karau <[email protected]>, and +Ahmet
>>> Altay <[email protected]>, all of whom I've spoken with regarding this and
>>> other Python things recently, just to be sure they see it. But of course
>>> it'd be great if anyone working on Python could jump in.
>>>
>>> -Tyler
>>>
>>>>
>>>> Thanks all,
>>>>
>>>> Reuven
>>>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
