Thanks!

On Thu, Nov 30, 2017 at 11:25 AM, Holden Karau <[email protected]> wrote:

> Rocking, I'll start leaving some comments on this. I'm excited to see work
> being done in this area as well :)
>
> On Thu, Nov 30, 2017 at 9:20 AM, Tyler Akidau <[email protected]> wrote:
>
>> On Wed, Nov 29, 2017 at 6:38 PM Reuven Lax <[email protected]> wrote:
>>
>>> There has been a lot of conversation about schemas on PCollections
>>> recently. There are a number of reasons for this. Schemas as first-class
>>> objects in Beam provide a nice base for building BeamSQL. Spark has
>>> provided schema support via DataFrames for over two years, and it has
>>> proved to be very popular among Spark users; it turns out that FlumeJava -
>>> the original inspiration for the Beam API - has had schema support for even
>>> longer, though this feature was not included in the Beam (at that time
>>> Dataflow) API. It turns out that most records have structure, and allowing
>>> the system to understand record structure can both simplify usage of the
>>> system and allow for new performance optimizations.
>>>
>>> After discussion with JB, Eugene, Kenn, Robert, and a number of others
>>> on the list, I've started a proposal document here
>>> <https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit?usp=sharing>
>>> describing how schemas can be added to Beam in a manner that integrates
>>> with the existing Beam API. The goal is not to blindly copy existing systems
>>> that have schemas, but rather to ensure that we get the best fit for Beam.
>>> Please comment on this proposal - as much feedback as possible is valuable.
>>>
>>> In addition, you may notice this document is incomplete. While it does
>>> sketch out how schemas can fit into Beam semantically, many portions of
>>> this design remain to be fleshed out. In particular, the API signatures are
>>> only sketched at a high level; exactly what all these APIs will look
>>> like has not yet been defined. I would welcome help from interested members
>>> of the community to define these APIs, and to make sure we're covering all
>>> relevant use cases.
>>>
>>
>> Thanks for sharing this Reuven, I'm excited to see this being discussed.
>> One global comment: all of the existing examples are in Java. It would be
>> great if we could design this with Python in mind (and how it could
>> interact cleanly with Pandas) at the same time. +Robert Bradshaw
>> <[email protected]> , +Holden Karau <[email protected]> , and +Ahmet
>> Altay <[email protected]> , all of whom I've spoken with regarding this and
>> other Python things recently, just to be sure they see it. But of course
>> it'd be great if anyone working on Python could jump in.
>>
>> -Tyler
>>
>>
>>
>>>
>>> Thanks all,
>>>
>>> Reuven
>>>
>>>
>>>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>
