Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
Why not extend ProcessContext to add the new remapped output? But it looks
good (the part I don't like is that creating a new context each time a new
feature is added hurts users. What happens when Beam adds some reactive
support? A ReactiveOutputReceiver?)

Pipeline sounds like the wrong storage: once distributed, the instances are
serialized, which kind of breaks the lifecycle of the original instance, and
you have no real release/close hook on them anymore, right? Not sure we can
do better than DoFn/source-embedded instances today.





Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
Yeah, all schemas are verified when the pipeline is constructed (before
anything starts running). BTW - under the covers, schemas are implemented as
a special type of coder, and coders are always set on a PCollection.

I'm happy to add explicit conversion transforms as well for Beam users,
though as I mentioned, generic transforms and frameworks like SQL will
probably not find it convenient to use them.
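
To make that concrete, here is a minimal sketch against the schema API
roughly as it later stabilized in the Java SDK (field names are made up;
details may differ from the branch discussed here):

  import org.apache.beam.sdk.schemas.Schema;
  import org.apache.beam.sdk.values.Row;

  // A Schema is attached to a PCollection much like a coder, so mismatches
  // surface at pipeline construction time, before anything runs.
  Schema schema =
      Schema.builder()
          .addStringField("userId")
          .addInt32Field("clicks")
          .build();

  // Building a Row validates the values against the schema immediately.
  Row row = Row.withSchema(schema).addValues("alice", 42).build();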



Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
On Wed, May 23, 2018 at 07:55, Jean-Baptiste Onofré wrote:

> Hi,
>
> IMHO, it would be better to have an explicit transform/IO as converter.
>
> It would be easier for users.
>
> Another option would be to use a "TypeConverter/SchemaConverter" map, as
> we do in Camel: Beam could check the source/destination "type" and look
> in the map for an available converter. This map can be stored as
> part of the pipeline (as we do for filesystem registration).
>


It works in Camel because it is not strongly typed, doesn't it? So it could
require a new Beam pipeline API.

+1 for the explicit transform; if it were added to the pipeline API like a
coder, it wouldn't break the fluent API:

p.apply(io).setOutputType(Foo.class)

Coders can be a workaround since they own the type, but since the
PCollection is the real owner, it is surely saner this way, no?

It also needs to ensure all converters are present before running the
pipeline; starting with no implicit environment converter support is
probably good, to avoid late surprises.
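
For illustration, the explicit variant can already be approximated with
existing primitives; a minimal sketch (jsonObjects, Foo, and convertToFoo
are placeholders, and the check-all-converters-at-construction part is what
would be new):

  // Explicit, construction-time-typed conversion step: the destination type
  // is stated in the pipeline code, so the fluent chain keeps compiling.
  PCollection<Foo> foos =
      jsonObjects.apply(
          MapElements.into(TypeDescriptor.of(Foo.class))
              .via(json -> convertToFoo(json)));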




Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
Sure - we can definitely add explicit conversion transforms. The automatic
transform is useful for generic transforms and frameworks (such as SQL)
that want to be able to take in a PCollection and operate on it. However if
users using Schema directly find it easier to have explicit transforms to
do conversion, there's no reason not to add them.

Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
On Tue, May 22, 2018 at 10:51 PM Romain Manni-Bucau wrote:

> How does it work on the pipeline side?
> Do you generate these "virtual" IOs at build time so the fluent API
> works without erasing generics?
>

Yeah - so I've already added support for injected element parameters (I'm
going to send an email to dev and users to make sure everyone is aware of
it), and that will be in the next Beam release. Basically you can now write:

new DoFn<InputT, OutputT>() {
  @ProcessElement
  public void process(@Element InputT element, OutputReceiver<OutputT> output) {
    ...
  }
}

So there's almost no need for ProcessContext anymore (I would like to
eventually support side inputs as well, at which point the only reason to
keep ProcessContext around is backwards compatibility). Since process() is
not a virtual method, the "type checking" is done at pipeline construction
time instead of compile time.
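
A concrete instance of the new style, for reference (a trivial word-length
fn; @Element and OutputReceiver are the injected parameters described above):

  ParDo.of(
      new DoFn<String, Integer>() {
        // Parameters are matched by type/annotation at pipeline construction
        // time, which is where the "type checking" mentioned above happens.
        @ProcessElement
        public void process(@Element String word, OutputReceiver<Integer> out) {
          out.output(word.length());
        }
      });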


> ex: SQL(row)->BigQuery(native) will not compile so we need a
> SQL(row)->BigQuery(row)
>
> Side note unrelated to Row: if you add another registry, maybe a pre-task is
> to ensure Beam has a kind of singleton/context so the registry is not
> duplicated or left untracked. These kinds of converters generally need a
> global close, not only a per-record one:
> converter.init(); converter.convert(row); converter.destroy();, otherwise
> it easily leaks. This is why it can require some way to not recreate it. A
> quick fix, if you are in ByteBuddy already, can be to add it to
> setup/teardown probably; being more global would be nicer but is more
> challenging.
>

Right now I'm using Pipeline as the container, so the lifetime is the life
of the Pipeline. Do you think this is the wrong lifetime?


Re: Beam SQL Improvements

2018-05-22 Thread Jean-Baptiste Onofré
Hi,

IMHO, it would be better to have an explicit transform/IO as converter.

It would be easier for users.

Another option would be to use a "TypeConverter/SchemaConverter" map, as
we do in Camel: Beam could check the source/destination "type" and look
in the map for an available converter. This map can be stored as
part of the pipeline (as we do for filesystem registration).
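
A quick sketch of what such a map could look like on the Beam side
(illustrative only; SchemaConverterRegistry and the string key are made up
for the example):

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.beam.sdk.transforms.SerializableFunction;

  // Converters keyed by "sourceType->destinationType". A lookup miss should
  // fail at pipeline construction time, as with filesystem registration.
  final class SchemaConverterRegistry {
    private final Map<String, SerializableFunction<?, ?>> converters = new HashMap<>();

    <A, B> void register(Class<A> from, Class<B> to, SerializableFunction<A, B> fn) {
      converters.put(from.getName() + "->" + to.getName(), fn);
    }

    @SuppressWarnings("unchecked")
    <A, B> SerializableFunction<A, B> lookup(Class<A> from, Class<B> to) {
      return (SerializableFunction<A, B>) converters.get(from.getName() + "->" + to.getName());
    }
  }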

My $0.01

Regards
JB


Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
How does it work on the pipeline side?
Do you generate these "virtual" IOs at build time so the fluent API
works without erasing generics?

ex: SQL(row)->BigQuery(native) will not compile, so we need a
SQL(row)->BigQuery(row)

Side note unrelated to Row: if you add another registry, maybe a pre-task is
to ensure Beam has a kind of singleton/context so the registry is not
duplicated or left untracked. These kinds of converters generally need a
global close, not only a per-record one:
converter.init(); converter.convert(row); converter.destroy();, otherwise
it easily leaks. This is why it can require some way to not recreate it. A
quick fix, if you are in ByteBuddy already, can be to add it to
setup/teardown probably; being more global would be nicer but is more
challenging.
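
For reference, the DoFn lifecycle hooks already give roughly that shape today
(a sketch; Converter is a hypothetical converter type with an explicit
lifecycle, not a Beam API - @Setup/@Teardown are the real hooks):

  class ConvertFn extends DoFn<Row, JsonObject> {
    private transient Converter converter;  // hypothetical, expensive to create

    @Setup
    public void setup() {
      converter = Converter.create();  // once per DoFn instance, not per record
    }

    @ProcessElement
    public void process(@Element Row row, OutputReceiver<JsonObject> out) {
      out.output(converter.convert(row));
    }

    @Teardown
    public void teardown() {
      converter.destroy();  // the global close that per-record code cannot provide
    }
  }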


Re: [VOTE] Go SDK

2018-05-22 Thread Andrew Psaltis
+1 (non-binding) Fantastic to see another language being used in this space
and the learnings that will come from bringing another language to the SDK.



Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
No - the only modules we need to add to core are the ones we choose to add.
For example, I will probably add a registration for TableRow/TableSchema
(GCP BigQuery) so these can work seamlessly with schemas. However, I will
add that to the GCP module, so only someone depending on that module needs
to pull in that dependency. The Java ServiceLoader framework can be used by
these modules to register schemas for their types (we already do something
similar for FileSystem and for coders as well).

BTW, right now I'm doing the conversion back and forth between Row objects
in the ByteBuddy-generated bytecode that we generate in order to invoke
DoFns.
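
For reference, the coder side of that ServiceLoader pattern looks roughly
like this today, and a schema registrar could mirror it (a sketch; the
schema-registrar interface itself was still on the branch at this point):

  // Registered via ServiceLoader: @AutoService writes the
  // META-INF/services entry that Beam scans at startup.
  @AutoService(CoderProviderRegistrar.class)
  public class TableRowCoderRegistrar implements CoderProviderRegistrar {
    @Override
    public List<CoderProvider> getCoderProviders() {
      return Collections.singletonList(
          CoderProviders.forCoder(
              TypeDescriptor.of(TableRow.class), TableRowJsonCoder.of()));
    }
  }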

Reuven


Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
Hmm, the pluggability part is close to what I wanted to do with JsonObject
as a main API (to avoid redoing a "row" API and schema API).
Row.as(Class) sounds good, but then does it mean we'll get
beam-sdk-java-row-jsonobject-like modules (I'm not against it, just trying
to understand here)?
If so, how can an IO use as() with the type it expects? Doesn't it lead to
having tons of these modules in the end?
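
To make the question concrete, this is the shape being discussed
(Row.as(Class) is only the API proposed in this thread; none of it exists
in Beam):

  // Hypothetical: the IO asks for the view it expects, and a separate
  // beam-sdk-java-row-jsonobject module would supply the JsonObject binding.
  JsonObject json = row.as(JsonObject.class);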


Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
By the way Romain, if you have specific scenarios in mind I would love to
hear them. I can try and guess what exactly you would like to get out of
schemas, but it would work better if you gave me concrete scenarios that
you would like to see work.

Reuven


Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
Yeah, what I'm working on will help with IO. Basically if you register a
function with SchemaRegistry that converts back and forth between a type
(say JsonObject) and a Beam Row, then it is applied by the framework behind
the scenes as part of DoFn invocation. Concrete example: let's say I have
an IO that reads json objects
  class MyJsonIORead extends PTransform<PBegin, PCollection<JsonObject>> {...}

If you register a schema for this type (or you can also just set the schema
directly on the output PCollection), then Beam knows how to convert back
and forth between JsonObject and Row. So the next ParDo can look like

p.apply(new MyJsonIORead())
 .apply(ParDo.of(new DoFn<JsonObject, OutputT>() {
     @ProcessElement void process(@Element Row row) {
       ...
     }
   }));

And Beam will automatically convert JsonObject to a Row for processing (you
aren't forced to do this of course - you can always ask for it as a
JsonObject).

The same is true for output. If you have a sink that takes in JsonObject
but the transform before it produces Row objects (for instance - because
the transform before it is Beam SQL), Beam can automatically convert Row
back to JsonObject for you.

All of this was detailed in the Schema doc I shared a few months ago. There
was a lot of discussion on that document from various parties, and some of
this API is a result of that discussion. This is also working in the branch
JB and I were working on, though not yet integrated back to master.

I would like to actually go further and make Row an interface and provide a
way to automatically put a Row interface on top of any other object (e.g.
JsonObject, Pojo, etc.). This won't change the way the user writes code, but
instead of Beam having to copy and convert at each stage (e.g. from
JsonObject to Row) it will simply create a Row object that uses the
JsonObject as its underlying storage.
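
A sketch of that view idea (illustrative only: Row is a concrete class in
Beam today, so JsonRowView and its accessors are hypothetical):

  // Reads fields straight out of the backing JsonObject instead of copying
  // them into a separate Row at each stage.
  class JsonRowView /* implements Row */ {
    private final Schema schema;
    private final javax.json.JsonObject json;

    JsonRowView(Schema schema, javax.json.JsonObject json) {
      this.schema = schema;
      this.json = json;
    }

    String getString(String field) { return json.getString(field); }  // no copy
    int getInt32(String field) { return json.getInt(field); }         // no copy
  }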

Reuven

On Tue, May 22, 2018 at 11:37 AM Romain Manni-Bucau wrote:

> Well, Beam can implement a new mapper but it doesn't help for IO. Most of
> the modern backends will take json directly, even the javax one, and it
> must stay generic.
>
> Then, since json-to-pojo mapping is already done a dozen of times, not sure
> it is worth it for now.
>
> On Tue, May 22, 2018 at 20:27, Reuven Lax wrote:
>
>> We can do even better btw. Building a SchemaRegistry where automatic
>> conversions can be registered between schema and Java data types. With this
>> the user won't even need a DoFn to do the conversion.
>>
>> On Tue, May 22, 2018, 10:13 AM Romain Manni-Bucau 
>> wrote:
>>
>>> Hi guys,
>>>
>>> Checked out what has been done on schema model and think it is
>>> acceptable - regarding the json debate -  if
>>> https://issues.apache.org/jira/browse/BEAM-4381 can be fixed.
>>>
>>> High level, it is about providing a mainstream and not too impacting
>>> model OOTB and JSON seems the most valid option for now, at least for IO
>>> and some user transforms.
>>>
>>> Wdyt?
>>>
>>> On Fri, Apr 27, 2018 at 18:36, Romain Manni-Bucau wrote:
>>>
  Can give it a try end of May, sure. (Holidays and work constraints
 will make it hard before.)

 On Apr 27, 2018 at 18:26, "Anton Kedin" wrote:

> Romain,
>
> I don't believe that the JSON approach was investigated very thoroughly. I
> mentioned a few reasons which make it not the best choice in my opinion,
> but I may be wrong. Can you put together a design doc or a prototype?
>
> Thank you,
> Anton
>
>
> On Thu, Apr 26, 2018 at 10:17 PM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>>
>>
>> On Apr 26, 2018 at 23:13, "Anton Kedin" wrote:
>>
>> BeamRecord (Row) has very little in common with JsonObject (I assume
>> you're talking about javax.json), except maybe some similarities of the
>> API. Few reasons why JsonObject doesn't work:
>>
>>    - it is a Java EE API:
>>      - Beam SDK is not limited to Java. There are probably similar
>>        APIs for other languages but they might not necessarily carry
>>        the same semantics / APIs;
>>
>> Not a big deal I think. At least not a technical blocker.
>>
>>    - It can change between Java versions;
>>
>> No, this is javaee ;).
>>
>>    - Current Beam java implementation is an experimental feature to
>>      identify what's needed from such an API; in the end we might end
>>      up with something similar to the JsonObject API, but likely not;
>>
>> I don't get that point as a blocker.
>>
>>    - it represents JSON, which is not an API but an object notation:
>>      - it is defined as a unicode string in a certain format. If you
>>        choose to adhere to ECMA-404, then it doesn't sound like
>>        JsonObject can represent an Avro object, if I'm reading it right;
>>
>> It is in the generator impl, you can impl an avrogenerator.
>>
>>    - doesn't define a type 

Re: Missing copyright notices for shaded packages

2018-05-22 Thread Kenneth Knowles
Did you look through all our jars or is that just a sample?

Kenn



Re: [VOTE] Go SDK

2018-05-22 Thread Willy Lulciuc
+1 (non-binding)

Great work!



Re: [VOTE] Go SDK

2018-05-22 Thread Davor Bonaci
Always happy to help. I'm sure JB is as well, others too!

Please draft/collect any relevant data -- thanks!

On Tue, May 22, 2018 at 3:17 PM, Kenneth Knowles wrote:

> The process has to be done by an officer or member. Can you help us with
> this, Davor?
>
> On Tue, May 22, 2018 at 3:14 PM Robert Bradshaw wrote:
>
>> On Tue, May 22, 2018 at 2:42 PM Davor Bonaci wrote:
>>
>> >> * Robert mentioned that "SGA should have probably already been filed"
>> >> in the previous thread. I got the impression that nothing further was
>> >> needed. I'll follow up.
>>
>> > Please just follow: http://incubator.apache.org/ip-clearance/. Simple.
>> > Quick.
>>
>> +1, let's put this question behind us.
>>
>> > Perhaps relevant: I saw some golang license determinations as Category A
>> > fly by earlier in the week. Reuse/quote anything already available.
>>
>> >> * The standard Go tooling basically always pulls directly from github,
>> >> so there is no real urgency here.
>>
>> > No urgency. That said, we'll probably want a copy of whatever GitHub is
>> > serving, to be served also by dist.apache.org (and considered as the
>> > source of truth).
>>
>> Yes, we should continue mirroring $(wget
>> https://github.com/apache/beam/archive/release-${VERSION}.zip) there.
>>
>


Re: Missing copyright notices for shaded packages

2018-05-22 Thread Davor Bonaci
This analysis looks correct. Great find!

The recommended fix would be different. I'd suggest appending this sentence
to the end of the LICENSE file: "A part of several convenience binary
distributions of this software is licensed as follows", followed by the
full license text (including its copyright, clauses and disclaimer) -- for
each such case separately. Don't edit the NOTICE file.
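
Roughly, each such case would add a section like this to LICENSE
(abbreviated here; the real entry carries the dependency's full license
text):

  A part of several convenience binary distributions of this software
  is licensed as follows:

    protobuf (BSD 3-Clause)
    Copyright 2008 Google Inc.  All rights reserved.
    [full BSD 3-Clause conditions and disclaimer]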

I'd suggest keeping things simple: no per-artifact license/notice, etc.
Just two project-wide files, but I'd suggest including it/attaching it
"everywhere". Opinions on this part may vary, but, for me, "everywhere"
includes every jar file.

Standard disclaimers apply.

Any volunteers? Thanks so much!

On Tue, May 22, 2018 at 4:02 PM, Andrew Pilloud  wrote:

> Here is what I think might be missing:
>
> (1) what artifacts are impacted and where are they distributed
>
> http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-core/2.4.0/beam-sdks-java-core-2.4.0.jar
> http://central.maven.org/maven2/org/apache/beam/beam-runners-direct-java/2.4.0/beam-runners-direct-java-2.4.0.jar
> http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-harness/2.4.0/beam-sdks-java-harness-2.4.0.jar
> http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-extensions-sql/2.4.0/beam-sdks-java-extensions-sql-2.4.0.jar
>
> (2) the external dependency being distributed
>
> beam-sdks-java-core: protobuf
> beam-runners-direct-java: protobuf
> beam-runners-direct-java: jsr-305
> beam-sdks-java-extensions-sql: janino-compiler
>
> (3) license and/or term not adhered to
>
> BSD 3 Clause: Redistributions in binary form must reproduce the above
> copyright notice, this list of conditions and the following disclaimer in
> the documentation and/or other materials provided with the distribution.
>
> (4) any proposed fix
>
> NOTICE file in the jar.
>
> I am not a lawyer, this is not legal advice.
>
> On Tue, May 22, 2018 at 2:55 PM Davor Bonaci  wrote:
>
>> Thanks for the report!
>>
>> Could you please comment more as to: (1) what artifacts are impacted and
>> where are they distributed, (2) the external dependency being distributed,
>> (3) license and/or term not adhered to, and (4) any proposed fix?
>>
>> Any such information would be helpful in triaging the problem -- thanks
>> so much!
>>
>> (If confirmed, this would be release blocking.)
>>
>> On Tue, May 22, 2018 at 2:37 PM, Lukasz Cwik  wrote:
>>
>>> Does it have to be part of the jar or is it good enough to be part of
>>> the sources jar (as 2.4.0 had it part of the
>>> beam-parent-2.4.0-source.zip
>>> 
>>> )?
>>>
>>> On Tue, May 22, 2018 at 11:16 AM Andrew Pilloud 
>>> wrote:
>>>
 I was digging around in the SQL jar trying to debug some packaging
 issues and noticed that we aren't including the copyright notices from the
 packages we are shading. I also looked at our previously released jars and
 they are the same (so this isn't a regression). Should we be including the
 copyright notice from packages we are redistributing?

 Andrew

>>>
>>


Jenkins build is back to normal : beam_SeedJob #1765

2018-05-22 Thread Apache Jenkins Server
See 



Build failed in Jenkins: beam_SeedJob #1764

2018-05-22 Thread Apache Jenkins Server
See 

--
GitHub pull request #5406 of commit b92995635829066196f4ee34783a860d6d20bda7, 
no merge conflicts.
Setting status of b92995635829066196f4ee34783a860d6d20bda7 to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/1764/ and message: 'Build started 
sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/5406/*:refs/remotes/origin/pr/5406/*
 > git rev-parse refs/remotes/origin/pr/5406/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5406/merge^{commit} # timeout=10
Checking out Revision 246fcc32a318ad78ad9583d40f3f39ccfcda6977 
(refs/remotes/origin/pr/5406/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 246fcc32a318ad78ad9583d40f3f39ccfcda6977
Commit message: "Merge b92995635829066196f4ee34783a860d6d20bda7 into 
15600d612e9f7dcc060fd2ad88e7f0ee2167eeb1"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
java.lang.NullPointerException: Cannot invoke method forEach() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:35)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at common_job_properties.generateDependencyReport(common_job_properties.groovy:420)
    at common_job_properties$generateDependencyReport$3.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:117)
    at job_Dependency_Check$_run_closure1.doCall(job_Dependency_Check.groovy:51)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
    at groovy.lang.Closure.call(Closure.java:414)
    at groovy.lang.Closure.call(Closure.java:430)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.with(DefaultGroovyMethods.java:242)
    at org.codehaus.groovy.runtime.dgm$757.invoke(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoMetaMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:251)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:71)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:76)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at javaposse.jobdsl.dsl.JobParent.processItem(JobParent.groovy:104)
    at sun.reflect.GeneratedMethodAccessor9123.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:210)
    at org.codehaus.groovy

Re: Missing copyright notices for shaded packages

2018-05-22 Thread Andrew Pilloud
Here is what I think might be missing:

(1) what artifacts are impacted and where are they distributed

http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-core/2.4.0/beam-sdks-java-core-2.4.0.jar
http://central.maven.org/maven2/org/apache/beam/beam-runners-direct-java/2.4.0/beam-runners-direct-java-2.4.0.jar
http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-harness/2.4.0/beam-sdks-java-harness-2.4.0.jar
http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-extensions-sql/2.4.0/beam-sdks-java-extensions-sql-2.4.0.jar

(2) the external dependency being distributed

beam-sdks-java-core: protobuf
beam-runners-direct-java: protobuf
beam-runners-direct-java: jsr-305
beam-sdks-java-extensions-sql: janino-compiler

(3) license and/or term not adhered to

BSD 3 Clause: Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the distribution.

(4) any proposed fix

NOTICE file in the jar.

I am not a lawyer, this is not legal advice.
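
As a quick sanity check, one can list the LICENSE/NOTICE entries a released
jar actually contains; a minimal Java sketch (the jar path is just an
example):

    import java.util.jar.JarFile;

    public class NoticeCheck {
      public static void main(String[] args) throws Exception {
        // Print every entry whose name mentions LICENSE or NOTICE.
        try (JarFile jar = new JarFile("beam-sdks-java-core-2.4.0.jar")) {
          jar.stream()
              .filter(e -> e.getName().matches("(?i).*(license|notice).*"))
              .forEach(e -> System.out.println(e.getName()));
        }
      }
    }

An empty listing for a jar that shades BSD-licensed code would confirm the
problem described above.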

On Tue, May 22, 2018 at 2:55 PM Davor Bonaci  wrote:

> Thanks for the report!
>
> Could you please comment more as to: (1) what artifacts are impacted and
> where are they distributed, (2) the external dependency being distributed,
> (3) license and/or term not adhered to, and (4) any proposed fix?
>
> Any such information would be helpful in triaging the problem -- thanks so
> much!
>
> (If confirmed, this would be release blocking.)
>
> On Tue, May 22, 2018 at 2:37 PM, Lukasz Cwik  wrote:
>
>> Does it have to be part of the jar or is it good enough to be part of the
>> sources jar (as 2.4.0 had it part of the beam-parent-2.4.0-source.zip
>> 
>> )?
>>
>> On Tue, May 22, 2018 at 11:16 AM Andrew Pilloud 
>> wrote:
>>
>>> I was digging around in the SQL jar trying to debug some packaging
>>> issues and noticed that we aren't including the copyright notices from the
>>> packages we are shading. I also looked at our previously released jars and
>>> they are the same (so this isn't a regression). Should we be including the
>>> copyright notice from packages we are redistributing?
>>>
>>> Andrew
>>>
>>
>


Build failed in Jenkins: beam_SeedJob #1763

2018-05-22 Thread Apache Jenkins Server
See 

--
GitHub pull request #5406 of commit dfa421d2d52bbaac720d5d16f69954d542e58f67, 
no merge conflicts.
Setting status of dfa421d2d52bbaac720d5d16f69954d542e58f67 to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/1763/ and message: 'Build started 
sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/5406/*:refs/remotes/origin/pr/5406/*
 > git rev-parse refs/remotes/origin/pr/5406/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5406/merge^{commit} # timeout=10
Checking out Revision 173d3bf3658aefd082c523aaffe37ebf2d8207ff 
(refs/remotes/origin/pr/5406/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 173d3bf3658aefd082c523aaffe37ebf2d8207ff
Commit message: "Merge dfa421d2d52bbaac720d5d16f69954d542e58f67 into 
15600d612e9f7dcc060fd2ad88e7f0ee2167eeb1"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
ERROR: startup failed:
workspace:/.test-infra/jenkins/common_job_properties.groovy: 23: Invalid 
duplicate class definition of class common_job_properties : The source 
workspace:/.test-infra/jenkins/common_job_properties.groovy contains at least 
two definitions of the class common_job_properties.
One of the classes is an explicit generated class using the class statement, 
the other is a class generated from the script body based on the file name. 
Solutions are to change the file name or to change the class name.
 @ line 23, column 1.
   class common_job_properties {
   ^

1 error




Build failed in Jenkins: beam_SeedJob #1762

2018-05-22 Thread Apache Jenkins Server
See 

--
GitHub pull request #5406 of commit 902d5946d1445a6f9a84248bacd19bec04ba3a56, 
no merge conflicts.
Setting status of 902d5946d1445a6f9a84248bacd19bec04ba3a56 to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/1762/ and message: 'Build started 
sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/5406/*:refs/remotes/origin/pr/5406/*
 > git rev-parse refs/remotes/origin/pr/5406/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5406/merge^{commit} # timeout=10
Checking out Revision 143ff8d0c54be1aa07740d114a683434ffc0f984 
(refs/remotes/origin/pr/5406/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 143ff8d0c54be1aa07740d114a683434ffc0f984
Commit message: "Merge 902d5946d1445a6f9a84248bacd19bec04ba3a56 into 
15600d612e9f7dcc060fd2ad88e7f0ee2167eeb1"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
java.lang.NullPointerException: Cannot invoke method size() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:35)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:117)
    at common_job_properties.generateDependencyReport(common_job_properties.groovy:366)
    at common_job_properties$generateDependencyReport$3.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:117)
    at job_Dependency_Check$_run_closure1.doCall(job_Dependency_Check.groovy:51)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
    at groovy.lang.Closure.call(Closure.java:414)
    at groovy.lang.Closure.call(Closure.java:430)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.with(DefaultGroovyMethods.java:242)
    at org.codehaus.groovy.runtime.dgm$757.invoke(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoMetaMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:251)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:71)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:76)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at javaposse.jobdsl.dsl.JobParent.processItem(JobParent.groovy:104)
    at sun.reflect.GeneratedMethodAccessor9123.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:210)
    at org.codehaus.groovy.ru

Re: [VOTE] Go SDK

2018-05-22 Thread Kenneth Knowles
The process has to be done by an officer or member. Can you help us with
this, Davor?

On Tue, May 22, 2018 at 3:14 PM Robert Bradshaw  wrote:

> On Tue, May 22, 2018 at 2:42 PM Davor Bonaci  wrote:
>
> >>* Robert mentioned that "SGA should have probably already been filed"
> in the previous thread. I got the impression that nothing further was
> needed. I'll follow up.
>
> > Please just follow: http://incubator.apache.org/ip-clearance/. Simple.
> Quick.
>
> +1, let's put this question behind us.
>
> > Perhaps relevant: I saw some golang license determinations as Category A
> fly by earlier in the week. Reuse/quote anything already available.
>
> >>* The standard Go tooling basically always pulls directly from
> github,
> so there is no real urgency here.
>
> > No urgency. That said, we'll probably want a copy of whatever GitHub is
> serving, to be served also by dist.apache.org (and considered as the
> source
> of truth).
>
> Yes, we should continue mirroring $(wget
> https://github.com/apache/beam/archive/release-${VERSION}.zip) there.
>


Re: [VOTE] Go SDK

2018-05-22 Thread Robert Bradshaw
On Tue, May 22, 2018 at 2:42 PM Davor Bonaci  wrote:

>>* Robert mentioned that "SGA should have probably already been filed"
in the previous thread. I got the impression that nothing further was
needed. I'll follow up.

> Please just follow: http://incubator.apache.org/ip-clearance/. Simple.
Quick.

+1, let's put this question behind us.

> Perhaps relevant: I saw some golang license determinations as Category A
fly by earlier in the week. Reuse/quote anything already available.

>>* The standard Go tooling basically always pulls directly from github,
so there is no real urgency here.

> No urgency. That said, we'll probably want a copy of whatever GitHub is
serving, to be served also by dist.apache.org (and considered as the source
of truth).

Yes, we should continue mirroring $(wget
https://github.com/apache/beam/archive/release-${VERSION}.zip) there.


Re: Missing copyright notices for shaded packages

2018-05-22 Thread Davor Bonaci
Thanks for the report!

Could you please comment more as to: (1) what artifacts are impacted and
where are they distributed, (2) the external dependency being distributed,
(3) license and/or term not adhered to, and (4) any proposed fix?

Any such information would be helpful in triaging the problem -- thanks so
much!

(If confirmed, this would be release blocking.)

On Tue, May 22, 2018 at 2:37 PM, Lukasz Cwik  wrote:

> Does it have to be part of the jar or is it good enough to be part of the
> sources jar (as 2.4.0 had it part of the beam-parent-2.4.0-source.zip
> 
> )?
>
> On Tue, May 22, 2018 at 11:16 AM Andrew Pilloud 
> wrote:
>
>> I was digging around in the SQL jar trying to debug some packaging issues
>> and noticed that we aren't including the copyright notices from the
>> packages we are shading. I also looked at our previously released jars and
>> they are the same (so this isn't a regression). Should we be including the
>> copyright notice from packages we are redistributing?
>>
>> Andrew
>>
>


Re: [VOTE] Go SDK

2018-05-22 Thread Davor Bonaci
>
>   * Robert mentioned that "SGA should have probably already been filed"
> in the previous thread. I got the impression that nothing further was
> needed. I'll follow up.
>

Please just follow: http://incubator.apache.org/ip-clearance/. Simple.
Quick.

Perhaps relevant: I saw some golang license determinations as Category A
fly by earlier in the week. Reuse/quote anything already available.

  * The standard Go tooling basically always pulls directly from github, so
> there is no real urgency here.
>

No urgency. That said, we'll probably want a copy of whatever GitHub is
serving, to be served also by dist.apache.org (and considered as the source
of truth).

(Great work, again!)


Re: Missing copyright notices for shaded packages

2018-05-22 Thread Lukasz Cwik
Does it have to be part of the jar or is it good enough to be part of the
sources jar (as 2.4.0 had it part of the beam-parent-2.4.0-source.zip

)?

On Tue, May 22, 2018 at 11:16 AM Andrew Pilloud  wrote:

> I was digging around in the SQL jar trying to debug some packaging issues
> and noticed that we aren't including the copyright notices from the
> packages we are shading. I also looked at our previously released jars and
> they are the same (so this isn't a regression). Should we be including the
> copyright notice from packages we are redistributing?
>
> Andrew
>


Re: [VOTE] Go SDK

2018-05-22 Thread Robert Bradshaw
+1 (enthusiastic and binding)

Really excited to see another data point in the model with a third
language, and thank you for fleshing this out to a full SDK. Good to go
from my perspective.
On Tue, May 22, 2018 at 10:19 AM Ahmet Altay  wrote:

> +1 (binding)

> Congratulations to the team!

> On Tue, May 22, 2018 at 10:13 AM, Alan Myrvold 
wrote:

>> +1 (non-binding)
>> Nice work!

>> On Tue, May 22, 2018 at 9:18 AM Pablo Estrada  wrote:

>>> +1 (binding)
>>> Very excited to see this!

>>> On Tue, May 22, 2018 at 9:09 AM Thomas Weise  wrote:

 +1 and congrats!


 On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez 
wrote:

> +1 !

> On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik  wrote:

>> +1 (binding)

>> On Tue, May 22, 2018 at 6:16 AM Robert Burke 
wrote:

>>> +1 (non-binding)

>>> I'm looking forward to helping gophers solve their big data
problems in their language of choice, and runner of choice!

>>> Next stop, a non-java portability runner?

>>> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles 
wrote:

 +1 (binding)

 This is great. Feels like a phase change in the life of Apache
Beam, having three languages, with multiple portable runners on the horizon.

 Kenn

 On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía 
wrote:

> +1 (binding)

> Go SDK brings new language support for a community not well supported in
> the Big Data world, the Go developers, so this is great. Also the fact
> that this is the first SDK integrated with the portability work makes it
> an interesting project to learn lessons from for future languages.

> Now it is the time to start building a community around the Go SDK; this
> is the most important task now, and the only way to do it is to have the
> SDK as an official part of Beam, so +1.

> Congrats to Henning and all the other contributors for this
important
> milestone.
> On Tue, May 22, 2018 at 10:21 AM Holden Karau <
hol...@pigscanfly.ca> wrote:

> > +1 (non-binding), I've had a chance to work with the SDK and
it's pretty
> neat to see Beam add support for a language before the most of
the big data
> ecosystem.

> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
j...@nanthrax.net>
> wrote:

> >> Hi Henning,

> >> SGA has been filed for the entire project during the
incubation period.

> >> Here, we have to check if SGA/IP donation is clean for the Go
SDK.

> >> We don't have a lot to do, just checked that we are clean on
this front.

> >> Regards
> >> JB

> >> On 22/05/2018 06:42, Henning Rohde wrote:

> >>> Thanks everyone!

> >>> Davor -- regarding your two comments:
> >>> * Robert mentioned that "SGA should have probably already
been
> filed" in the previous thread. I got the impression that nothing
further
> was needed. I'll follow up.
> >>> * The standard Go tooling basically always pulls directly
from
> github, so there is no real urgency here.

> >>> Thanks,
> >>>Henning


> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
j...@nanthrax.net
> > wrote:

> >>>  +1 (binding)

> >>>  I just want to check about SGA/IP/Headers.

> >>>  Thanks !
> >>>  Regards
> >>>  JB

> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
> >>>   > Hi everyone,
> >>>   >
> >>>   > Now that the remaining issues have been resolved as
discussed,
> >>>  I'd like
> >>>   > to propose a formal vote on accepting the Go SDK into
master. The
> >>>  main
> >>>   > practical difference is that the Go SDK would be part
of the
> >>>  Apache Beam
> >>>   > release going forward.
> >>>   >
> >>>   > Highlights of the Go SDK:
> >>>   >   * Go user experience with natively-typed DoFns with
(simulated)
> >>>   > generic types
> >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK,
Flatten,
> >>>  Combine,
> >>>   > Windowing, ..
> >>>   >   * Includes several IO connectors: Datastore,
BigQuery, PubSub,
> >>>   > extensible textio.
> >>>   >   * Supports the portability framework for both batch
and
> streaming,
> >>>   > notably the upcoming portable Flink runner
> >>>   >   * Supports a direct runner for small batch
workloads and
> testing.
> >>>   >   * Includes pre-commit tests and post-commit
integration tests.
> >>>   >
> >>>   > And last but not least
> >>>   >   *  includes contributions from several independent
users and
>>>

Re: Launching a Portable Pipeline

2018-05-22 Thread Ankur Goenka
Thank you guys for the input.

Here is the summary.


Responsibility of Beam on job management:
Beam provides a common interface, called JobService, for basic job
management operations. The supported operations can vary between runners.

What is JobService?
JobService is a runner-specific component which implements Beam's
JobService interface.

What is the life cycle of a JobService?
There are 3 scenarios:
1. With the ULR, JobService is short lived and runs as long as the ULR
runs (JobService lifespan ~= job lifespan).
2. With production runners (Flink, Dataflow, etc.), JobService can either
be short lived or long lived. The choice is up to the runner.
3. With production runners (Flink, Dataflow, etc.) without a long-running
JobService, the SDK will spin up a local JobService.

JobService state management:
The choice of state management is up to the JobService implementation. The
basic requirement is that the JobService should be able to perform all the
operations with the returned job handle. At the very least the handle can
be the job handle for the underlying runner job, and the JobService will
simply proxy actions to the runner using the provided job handle. A
persistent JobService is free to provide a simple string as a job handle;
in this case, the job handle can only be used with the same job service. A
stateless (not persistent) JobService can provide an opaque blob containing
all the relevant information about the job; in this case the job handle can
be used with any instance of JobService running the same code.

JobService code distribution and invocation when JobService is short lived:
We will provide an easy-to-run solution using Docker. Docker helps with
both executable distribution and providing a platform-independent binary.
We will also provide an easy setup script with a supporting document for
users who do not want to use Docker on a local machine.

Should the Flink JobService start a local cluster for testing?
The Flink JobService will be capable of submitting to a remote Flink
cluster if a master URL is provided; otherwise it will execute the pipeline
in an in-process Flink invocation on the same JVM.
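
To make the short-lived JobService scenario concrete, a rough sketch of a
Java submission once these pieces land (the runner and option names are
assumptions based on this thread, not a final API):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class PortableSubmit {
      public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(
            "--runner=PortableRunner",       // assumed portable runner name
            "--jobEndpoint=localhost:8099")  // assumed flag for the JobService address
            .create();
        Pipeline p = Pipeline.create(options);
        // ... build the pipeline here ...
        p.run().waitUntilFinish();
      }
    }

The same shape would work whether the JobService at that endpoint is long
lived or was just spun up locally (e.g. via Docker).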


On Tue, May 22, 2018 at 12:37 PM Eugene Kirpichov 
wrote:

> Thanks Ankur, I think there's consensus, so it's probably ready to share :)
>
> On Fri, May 18, 2018 at 3:00 PM Ankur Goenka  wrote:
>
>> Thanks for all the input.
>> I have summarized the discussions at the bottom of the document ( here
>> 
>> ).
>> Please feel free to provide comments.
>> Once we agree, I will publish the conclusion on the mailing list.
>>
>> On Mon, May 14, 2018 at 1:51 PM Eugene Kirpichov 
>> wrote:
>>
>>> Thanks Ankur, this document clarifies a few points and raises some very
>>> important questions. I encourage everybody with a stake in Portability to
>>> take a look and chime in.
>>>
>>> +Aljoscha Krettek  +Thomas Weise
>>>  +Henning Rohde 
>>>
>>> On Mon, May 14, 2018 at 12:34 PM Ankur Goenka  wrote:
>>>
 Updated link
 
  to
 the document as the previous link was not working for some people.


 On Fri, May 11, 2018 at 7:56 PM Ankur Goenka  wrote:

> Hi,
>
> Recent effort on portability has introduced JobService and
> ArtifactService to the beam stack along with SDK. This has opened up a few
> questions around how we start a pipeline in a portable setup (with
> JobService).
> I am trying to document our approach to launching a portable pipeline
> and take binding decisions based on the discussion.
> Please review the document and provide your feedback.
>
> Thanks,
> Ankur
>



Re: Launching a Portable Pipeline

2018-05-22 Thread Eugene Kirpichov
Thanks Ankur, I think there's consensus, so it's probably ready to share :)

On Fri, May 18, 2018 at 3:00 PM Ankur Goenka  wrote:

> Thanks for all the input.
> I have summarized the discussions at the bottom of the document ( here
> 
> ).
> Please feel free to provide comments.
> Once we agree, I will publish the conclusion on the mailing list.
>
> On Mon, May 14, 2018 at 1:51 PM Eugene Kirpichov 
> wrote:
>
>> Thanks Ankur, this document clarifies a few points and raises some very
>> important questions. I encourage everybody with a stake in Portability to
>> take a look and chime in.
>>
>> +Aljoscha Krettek  +Thomas Weise
>>  +Henning Rohde 
>>
>> On Mon, May 14, 2018 at 12:34 PM Ankur Goenka  wrote:
>>
>>> Updated link
>>> 
>>>  to
>>> the document as the previous link was not working for some people.
>>>
>>>
>>> On Fri, May 11, 2018 at 7:56 PM Ankur Goenka  wrote:
>>>
 Hi,

 Recent effort on portability has introduced JobService and
 ArtifactService to the beam stack along with SDK. This has opened up a few
 questions around how we start a pipeline in a portable setup (with
 JobService).
 I am trying to document our approach to launching a portable pipeline
 and take binding decisions based on the discussion.
 Please review the document and provide your feedback.

 Thanks,
 Ankur

>>>


Re: The full list of proposals / prototype documents

2018-05-22 Thread Eugene Kirpichov
Making it easier to manage indeed would be good. Could someone from PMC
please add the following documents of mine to it?

SDF related documents:
http://s.apache.org/splittable-do-fn
http://s.apache.org/sdf-via-source
http://s.apache.org/textio-sdf 
http://s.apache.org/beam-watch-transform
http://s.apache.org/beam-breaking-fusion


Non SDF related:
http://s.apache.org/context-fn
http://s.apache.org/fileio-write

A suggestion: maybe we can establish a convention to send design document
proposals to dev+desi...@beam.apache.org? Does the Apache mailing list
management software support this kind of stuff? Then they'd be quite easy
to find and filter.

On Tue, May 22, 2018 at 10:57 AM Kenneth Knowles  wrote:

> It is owned by the Beam PMC collectively. Any PMC member can add things to
> it. Ideas for making it easy to manage are welcome.
>
> Probably easier to have a markdown file somewhere with a list of docs so
> we can issue and review PRs. Not sure the web site is the right place for
> it - we have a history of porting docs to markdown but really that is high
> overhead and users/community probably don't gain from it so much. Some have
> suggested a wiki.
>
> Kenn
>
> On Tue, May 22, 2018 at 10:22 AM Scott Wegner  wrote:
>
>> Thanks for the links. Any details on that Google drive folder? Who
>> maintains it? Is it possible for any contributor to add their design doc?
>>
>> On Mon, May 21, 2018 at 8:15 AM Joseph PENG 
>> wrote:
>>
>>> Alexey,
>>>
>>> I do not know where you can find all design docs, but I know a blog that
>>> has collected some of the major design docs. Hope it helps.
>>>
>>> https://wtanaka.com/beam/design-doc
>>>
>>> https://drive.google.com/drive/folders/0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>>
>>> On Mon, May 21, 2018 at 9:28 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 Hi all,

 Is it possible to obtain somewhere a list of all proposals / prototype
 documents that have been published as technical / design documents for
 new features? I have links to only some of them (found in mail list
 discussions by chance) but I’m not aware of others.

 If yes, could someone share it or point me to where it is located, in
 case I missed it?

 If not, don’t you think it would make sense to have such an index of these
 documents? I believe it can be useful for Beam contributors since these
 proposals contain information which is absent or not so detailed on Beam
 web site documentation.

 WBR,
 Alexey
>>>
>>>


Re: Current progress on Portable runners

2018-05-22 Thread Eugene Kirpichov
Thanks all! Yeah, I'll update the Portability page with the status of this
project and other pointers this week or next (mostly out of office this
week).

On Fri, May 18, 2018 at 5:01 PM Thomas Weise  wrote:

> - Flink JobService: in review 
>
> That's TODO (above PR was merged, but it doesn't contain the Flink job
> service).
>
> Discussion about it is here:
> https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit?ts=5afa1238
>
> Thanks,
> Thomas
>
>
>
> On Fri, May 18, 2018 at 7:01 AM, Thomas Weise  wrote:
>
>> Most of it should probably go to
>> https://beam.apache.org/contribute/portability/
>>
>> Also for reference, here is the prototype doc:
>> https://s.apache.org/beam-portability-team-doc
>>
>> Thomas
>>
>> On Fri, May 18, 2018 at 5:35 AM, Kenneth Knowles  wrote:
>>
>>> This is awesome. Would you be up for adding a brief description at
>>> https://beam.apache.org/contribute/#works-in-progress and maybe a
>>> pointer to a gdoc with something like the contents of this email? (my
>>> reasoning is (a) keep the contribution guide concise but (b) all this
>>> detail is helpful yet (c) the detail may be ever-changing so making a
>>> separate web page is not the best format)
>>>
>>> Kenn
>>>
>>> On Thu, May 17, 2018 at 3:13 PM Eugene Kirpichov 
>>> wrote:
>>>
 Hi all,

 A little over a month ago, a large group of Beam community members has
 been working on a prototype of a portable Flink runner - that is, a runner
 that can execute Beam pipelines on Flink via the Portability API
 . The prototype was developed in
 a separate branch
  and was
 successfully demonstrated at Flink Forward, where it ran Python and Go
 pipelines in a limited setting.

 Since then, a smaller group of people (Ankur Goenka, Axel Magnuson, Ben
 Sidhom and myself) have been working on productionizing the prototype to
 address its limitations and do things "the right way", preparing to reuse
 this work for developing other portable runners (e.g. Spark). This involves
 a surprising amount of work, since many important design and implementation
 concerns could be ignored for the purposes of a prototype. I wanted to give
 an update on where we stand now.

 Our immediate milestone in sight is *Run Java and Python batch
 WordCount examples against a distributed remote Flink cluster*. That
 involves a few moving parts, roughly in order of appearance:

 *Job submission:*
 - The SDK is configured to use a "portable runner", whose
 responsibility is to run the pipeline against a given JobService endpoint.
 - The portable runner converts the pipeline to a portable Pipeline proto
 - The runner finds out which artifacts it needs to stage, and stages
 them against an ArtifactStagingService
 - A Flink-specific JobService receives the Pipeline proto, performs
 some optimizations (e.g. fusion) and translates it to Flink datasets and
 functions

 *Job execution:*
 - A Flink function executes a fused chain of Beam transforms (an
 "executable stage") by converting the input and the stage to bundles and
 executing them against an SDK harness
 - The function starts the proper SDK harness, auxiliary services (e.g.
 artifact retrieval, side input handling) and wires them together
 - The function feeds the data to the harness and receives data back.

 *And here is our status of implementation for these parts:* basically,
 almost everything is either done or in review.

 *Job submission:*
 - General-purpose portable runner in the Python SDK: done
 ; Java SDK: also done
 
 - Artifact staging from the Python SDK: in review (PR
 , PR
 ); in Java, it's also done
 - Flink JobService: in review
 
 - Translation from a Pipeline proto to Flink datasets and functions:
 done 
 - ArtifactStagingService implementation that stages artifacts to a
 location on a distributed filesystem: in development (design is clear)

 *Job execution:*
 - Flink function for executing via an SDK harness: done
 
 - APIs for managing lifecycle of an SDK harness: done
 
 - Specific implementation of those APIs using Docker: part done
 , part in review
 
 - ArtifactRetrievalService that retrieves artifacts from the location
 where ArtifactStagingService 

Re: [VOTE] Go SDK

2018-05-22 Thread Huygaa Batsaikhan
+1 (non-binding). Great news!

On Tue, May 22, 2018 at 11:49 AM Chamikara Jayalath 
wrote:

> +1 (non-binding). Great to know that our third SDK will be
> released/supported officially.
>
> On Tue, May 22, 2018 at 11:38 AM Eugene Kirpichov 
> wrote:
>
>> +1!
>>
>> It is particularly exciting to me that the Go support is
>> "portability-first" and does everything in the proper "portability way"
>> from the start, free of legacy non-portable runner support code.
>>
>> On Tue, May 22, 2018 at 11:32 AM Scott Wegner  wrote:
>>
>>> +1 (non-binding)
>>>
>>> Having a third language will really force us to design Beam constructs
>>> in a language-agnostic way, and achieve the goals of portability. Thanks to
>>> all that have helped reach this milestone.
>>>
>>> On Tue, May 22, 2018 at 10:19 AM Ahmet Altay  wrote:
>>>
 +1 (binding)

 Congratulations to the team!

 On Tue, May 22, 2018 at 10:13 AM, Alan Myrvold 
 wrote:

> +1 (non-binding)
> Nice work!
>
> On Tue, May 22, 2018 at 9:18 AM Pablo Estrada 
> wrote:
>
>> +1 (binding)
>> Very excited to see this!
>>
>> On Tue, May 22, 2018 at 9:09 AM Thomas Weise  wrote:
>>
>>> +1 and congrats!
>>>
>>>
>>> On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez <
>>> rfern...@google.com> wrote:
>>>
 +1 !

 On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik 
 wrote:

> +1 (binding)
>
> On Tue, May 22, 2018 at 6:16 AM Robert Burke 
> wrote:
>
>> +1 (non-binding)
>>
>> I'm looking forward to helping gophers solve their big data
>> problems in their language of choice, and runner of choice!
>>
>> Next stop, a non-java portability runner?
>>
>> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> This is great. Feels like a phase change in the life of Apache
>>> Beam, having three languages, with multiple portable runners on the 
>>> horizon.
>>>
>>> Kenn
>>>
>>> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía 
>>> wrote:
>>>
 +1 (binding)

 Go SDK brings new language support for a community not well supported in
 the Big Data world, the Go developers, so this is great. Also the fact
 that this is the first SDK integrated with the portability work makes it
 an interesting project to learn lessons from for future languages.

 Now it is the time to start building a community around the Go SDK; this
 is the most important task now, and the only way to do it is to have the
 SDK as an official part of Beam, so +1.

 Congrats to Henning and all the other contributors for this
 important
 milestone.
 On Tue, May 22, 2018 at 10:21 AM Holden Karau <
 hol...@pigscanfly.ca> wrote:

 > +1 (non-binding), I've had a chance to work with the SDK and
 it's pretty
 neat to see Beam add support for a language before the most of
 the big data
 ecosystem.

 > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
 j...@nanthrax.net>
 wrote:

 >> Hi Henning,

 >> SGA has been filed for the entire project during the
 incubation period.

 >> Here, we have to check if SGA/IP donation is clean for the
 Go SDK.

 >> We don't have a lot to do, just checked that we are clean on
 this front.

 >> Regards
 >> JB

 >> On 22/05/2018 06:42, Henning Rohde wrote:

 >>> Thanks everyone!

 >>> Davor -- regarding your two comments:
 >>> * Robert mentioned that "SGA should have probably
 already been
 filed" in the previous thread. I got the impression that
 nothing further
 was needed. I'll follow up.
 >>> * The standard Go tooling basically always pulls
 directly from
 github, so there is no real urgency here.

 >>> Thanks,
 >>>Henning


 >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
 j...@nanthrax.net
 > wrote:

 >>>  +1 (binding)

 >>>  I just want to check about SGA/IP/Headers.

 >>>  Thanks !
 >>>  Regards
>

Re: [VOTE] Go SDK

2018-05-22 Thread Chamikara Jayalath
+1 (non-binding). Great to know that our third SDK will be
released/supported officially.

On Tue, May 22, 2018 at 11:38 AM Eugene Kirpichov 
wrote:

> +1!
>
> It is particularly exciting to me that the Go support is
> "portability-first" and does everything in the proper "portability way"
> from the start, free of legacy non-portable runner support code.
>
> On Tue, May 22, 2018 at 11:32 AM Scott Wegner  wrote:
>
>> +1 (non-binding)
>>
>> Having a third language will really force us to design Beam constructs in
>> a language-agnostic way, and achieve the goals of portability. Thanks to
>> all that have helped reach this milestone.
>>
>> On Tue, May 22, 2018 at 10:19 AM Ahmet Altay  wrote:
>>
>>> +1 (binding)
>>>
>>> Congratulations to the team!
>>>
>>> On Tue, May 22, 2018 at 10:13 AM, Alan Myrvold 
>>> wrote:
>>>
 +1 (non-binding)
 Nice work!

 On Tue, May 22, 2018 at 9:18 AM Pablo Estrada 
 wrote:

> +1 (binding)
> Very excited to see this!
>
> On Tue, May 22, 2018 at 9:09 AM Thomas Weise  wrote:
>
>> +1 and congrats!
>>
>>
>> On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez <
>> rfern...@google.com> wrote:
>>
>>> +1 !
>>>
>>> On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik 
>>> wrote:
>>>
 +1 (binding)

 On Tue, May 22, 2018 at 6:16 AM Robert Burke 
 wrote:

> +1 (non-binding)
>
> I'm looking forward to helping gophers solve their big data
> problems in their language of choice, and runner of choice!
>
> Next stop, a non-java portability runner?
>
> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles 
> wrote:
>
>> +1 (binding)
>>
>> This is great. Feels like a phase change in the life of Apache
>> Beam, having three languages, with multiple portable runners on the 
>> horizon.
>>
>> Kenn
>>
>> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> Go SDK brings new language support for a community not well supported
>>> in the Big Data world, the Go developers, so this is great. Also the
>>> fact that this is the first SDK integrated with the portability work
>>> makes it an interesting project to learn lessons from for future
>>> languages.
>>>
>>> Now it is the time to start building a community around the Go SDK;
>>> this is the most important task now, and the only way to do it is to
>>> have the SDK as an official part of Beam, so +1.
>>>
>>> Congrats to Henning and all the other contributors for this
>>> important
>>> milestone.
>>> On Tue, May 22, 2018 at 10:21 AM Holden Karau <
>>> hol...@pigscanfly.ca> wrote:
>>>
>>> > +1 (non-binding), I've had a chance to work with the SDK and
>>> it's pretty
>>> neat to see Beam add support for a language before the most of
>>> the big data
>>> ecosystem.
>>>
>>> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
>>> j...@nanthrax.net>
>>> wrote:
>>>
>>> >> Hi Henning,
>>>
>>> >> SGA has been filed for the entire project during the
>>> incubation period.
>>>
>>> >> Here, we have to check if SGA/IP donation is clean for the Go
>>> SDK.
>>>
>>> >> We don't have a lot to do, just checked that we are clean on
>>> this front.
>>>
>>> >> Regards
>>> >> JB
>>>
>>> >> On 22/05/2018 06:42, Henning Rohde wrote:
>>>
>>> >>> Thanks everyone!
>>>
>>> >>> Davor -- regarding your two comments:
>>> >>> * Robert mentioned that "SGA should have probably
>>> already been
>>> filed" in the previous thread. I got the impression that nothing
>>> further
>>> was needed. I'll follow up.
>>> >>> * The standard Go tooling basically always pulls
>>> directly from
>>> github, so there is no real urgency here.
>>>
>>> >>> Thanks,
>>> >>>Henning
>>>
>>>
>>> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
>>> j...@nanthrax.net
>>> > wrote:
>>>
>>> >>>  +1 (binding)
>>>
>>> >>>  I just want to check about SGA/IP/Headers.
>>>
>>> >>>  Thanks !
>>> >>>  Regards
>>> >>>  JB
>>>
>>> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
>>> >>>   > Hi everyone,
>>> >>>   >
>>> >>>   > Now that the remaining issues have been resolved as
>>> discusse

Re: [VOTE] Go SDK

2018-05-22 Thread Eugene Kirpichov
+1!

It is particularly exciting to me that the Go support is
"portability-first" and does everything in the proper "portability way"
from the start, free of legacy non-portable runner support code.

On Tue, May 22, 2018 at 11:32 AM Scott Wegner  wrote:

> +1 (non-binding)
>
> Having a third language will really force us to design Beam constructs in
> a language-agnostic way, and achieve the goals of portability. Thanks to
> all that have helped reach this milestone.
>
> On Tue, May 22, 2018 at 10:19 AM Ahmet Altay  wrote:
>
>> +1 (binding)
>>
>> Congratulations to the team!
>>
>> On Tue, May 22, 2018 at 10:13 AM, Alan Myrvold 
>> wrote:
>>
>>> +1 (non-binding)
>>> Nice work!
>>>
>>> On Tue, May 22, 2018 at 9:18 AM Pablo Estrada 
>>> wrote:
>>>
 +1 (binding)
 Very excited to see this!

 On Tue, May 22, 2018 at 9:09 AM Thomas Weise  wrote:

> +1 and congrats!
>
>
> On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez  > wrote:
>
>> +1 !
>>
>> On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik  wrote:
>>
>>> +1 (binding)
>>>
>>> On Tue, May 22, 2018 at 6:16 AM Robert Burke 
>>> wrote:
>>>
 +1 (non-binding)

 I'm looking forward to helping gophers solve their big data
 problems in their language of choice, and runner of choice!

 Next stop, a non-java portability runner?

 On Tue, May 22, 2018, 6:08 AM Kenneth Knowles 
 wrote:

> +1 (binding)
>
> This is great. Feels like a phase change in the life of Apache
> Beam, having three languages, with multiple portable runners on the 
> horizon.
>
> Kenn
>
> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía 
> wrote:
>
>> +1 (binding)
>>
>> Go SDK brings new language support for a community not well supported in
>> the Big Data world, the Go developers, so this is great. Also the fact
>> that this is the first SDK integrated with the portability work makes it
>> an interesting project to learn lessons from for future languages.
>>
>> Now it is the time to start building a community around the Go SDK; this
>> is the most important task now, and the only way to do it is to have the
>> SDK as an official part of Beam, so +1.
>>
>> Congrats to Henning and all the other contributors for this
>> important
>> milestone.
>> On Tue, May 22, 2018 at 10:21 AM Holden Karau <
>> hol...@pigscanfly.ca> wrote:
>>
>> > +1 (non-binding), I've had a chance to work with the SDK and
>> it's pretty
>> neat to see Beam add support for a language before the most of
>> the big data
>> ecosystem.
>>
>> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
>> j...@nanthrax.net>
>> wrote:
>>
>> >> Hi Henning,
>>
>> >> SGA has been filed for the entire project during the
>> incubation period.
>>
>> >> Here, we have to check if SGA/IP donation is clean for the Go
>> SDK.
>>
>> >> We don't have a lot to do, just checked that we are clean on
>> this front.
>>
>> >> Regards
>> >> JB
>>
>> >> On 22/05/2018 06:42, Henning Rohde wrote:
>>
>> >>> Thanks everyone!
>>
>> >>> Davor -- regarding your two comments:
>> >>> * Robert mentioned that "SGA should have probably already
>> been
>> filed" in the previous thread. I got the impression that nothing
>> further
>> was needed. I'll follow up.
>> >>> * The standard Go tooling basically always pulls directly
>> from
>> github, so there is no real urgency here.
>>
>> >>> Thanks,
>> >>>Henning
>>
>>
>> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
>> j...@nanthrax.net
>> > wrote:
>>
>> >>>  +1 (binding)
>>
>> >>>  I just want to check about SGA/IP/Headers.
>>
>> >>>  Thanks !
>> >>>  Regards
>> >>>  JB
>>
>> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
>> >>>   > Hi everyone,
>> >>>   >
>> >>>   > Now that the remaining issues have been resolved as
>> discussed,
>> >>>  I'd like
>> >>>   > to propose a formal vote on accepting the Go SDK into
>> master. The
>> >>>  main
>> >>>   > practical difference is that the Go SDK would be part
>> of the
>> >>>  Apache Beam
>> >>>   > release going forward.
>>

Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
Well, Beam can implement a new mapper but it doesn't help for IO. Most
modern backends will take JSON directly, even the javax one, and it must
stay generic.

Then, since JSON-to-POJO mapping has already been done a dozen times, I'm
not sure it is worth it for now.
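
For context, the existing JSON-to-POJO mapping referred to here is
essentially the JSON-B round trip sketched below (Person is a placeholder
type; this assumes a JSON-B implementation such as Yasson on the classpath):

    import javax.json.bind.Jsonb;
    import javax.json.bind.JsonbBuilder;

    public class JsonbRoundTrip {
      public static class Person {       // placeholder POJO for illustration
        public String name;
        public int age;
      }

      public static void main(String[] args) throws Exception {
        try (Jsonb jsonb = JsonbBuilder.create()) {
          // JSON -> POJO without any hand-written binding code
          Person p = jsonb.fromJson("{\"name\":\"beam\",\"age\":3}", Person.class);
          // POJO -> JSON
          System.out.println(jsonb.toJson(p));
        }
      }
    }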

On Tue, May 22, 2018 at 20:27, Reuven Lax  wrote:

> We can do even better, btw: building a SchemaRegistry where automatic
> conversions can be registered between schemas and Java data types. With
> this, the user won't even need a DoFn to do the conversion.
>
> On Tue, May 22, 2018, 10:13 AM Romain Manni-Bucau 
> wrote:
>
>> Hi guys,
>>
>> Checked out what has been done on schema model and think it is acceptable
>> - regarding the json debate -  if
>> https://issues.apache.org/jira/browse/BEAM-4381 can be fixed.
>>
>> High level, it is about providing a mainstream and not too impacting
>> model OOTB and JSON seems the most valid option for now, at least for IO
>> and some user transforms.
>>
>> Wdyt?
>>
>> On Fri, Apr 27, 2018 at 18:36, Romain Manni-Bucau  wrote:
>>
>>>  Can give it a try end of may, sure. (holidays and work constraints will
>>> make it hard before).
>>>
>>> On Apr 27, 2018 at 18:26, "Anton Kedin"  wrote:
>>>
 Romain,

 I don't believe that the JSON approach was investigated very thoroughly. I
 mentioned a few reasons which make it not the best choice in my opinion,
 but I may be wrong. Can you put together a design doc or a prototype?

 Thank you,
 Anton


 On Thu, Apr 26, 2018 at 10:17 PM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

>
>
> On Apr 26, 2018 at 23:13, "Anton Kedin"  wrote:
>
> BeamRecord (Row) has very little in common with JsonObject (I assume
> you're talking about javax.json), except maybe some similarities of the
> API. A few reasons why JsonObject doesn't work:
>
>- it is a Java EE API:
>   - Beam SDK is not limited to Java. There are probably similar
>   APIs for other languages but they might not necessarily carry the 
> same
>   semantics / APIs;
>
>
> Not a big deal I think. At least not a technical blocker.
>
>
>- It can change between Java versions;
>
> No, this is javaee ;).
>
>
>
>- Current Beam java implementation is an experimental feature to
>   identify what's needed from such API, in the end we might end up 
> with
>   something similar to JsonObject API, but likely not
>
>
> I don't get that point as a blocker
>
>
>- ;
>   - represents JSON, which is not an API but an object notation:
>   - it is defined as a unicode string in a certain format. If you
>   choose to adhere to ECMA-404, then it doesn't sound like JsonObject 
> can
>   represent an Avro object, if I'm reading it right;
>
>
> It is in the generator impl, you can implement an Avro generator.
>
>
>- doesn't define a type system (JSON does, but it's lacking):
>   - for example, JSON doesn't define semantics for numbers;
>   - doesn't define date/time types;
>   - doesn't allow extending JSON type system at all;
>
>
> That is why you need a metadata object, or simpler, a schema with that
> data. JSON or Beam Record doesn't help here and you end up with the same
> outcome if you think about it.
>
>
>- lacks schemas;
>
> JSON Schema is standard, widespread and well tooled compared to the
> alternatives.
>
> You can definitely try to loosen the requirements and define everything
> in JSON in userland, but the point of Row/Schema is to avoid it and define
> everything in Beam model, which can be extended, mapped to JSON, Avro,
> BigQuery Schemas, custom binary format etc., with same semantics across
> beam SDKs.
>
>
> This is what JSON-P would allow, with the benefit of natural POJO
> support through JSON-B.
>
>
>
> On Thu, Apr 26, 2018 at 12:28 PM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>> Just to let it be clear and let me understand: how is BeamRecord
>> different from a JsonObject which is an API without implementation (not
>> even a json one OOTB)? Advantages of a json *api* are indeed natural 
>> mapping
>> (jsonb is based on jsonp so no new binding to reinvent) and simple
>> serialization (json+gzip for ex, or avro if you want to be geeky).
>>
>> I fail to see the point to rebuild an ecosystem ATM.
>>
>> On Apr 26, 2018 at 19:12, "Reuven Lax"  wrote:
>>
>>> Exactly what JB said. We will write a generic conversion from Avro
>>> (or json) to Beam schemas, which will make them work transparently with
>>> SQL. The plan is also to migrate Anton's work so that POJOs work
>>> generically for any schema.
>>>
>>> Reuven
>>>
>>> On Thu, Apr 26, 2018 at 1:17 AM Jean-Baptis

Re: [VOTE] Go SDK

2018-05-22 Thread Scott Wegner
+1 (non-binding)

Having a third language will really force us to design Beam constructs in a
language-agnostic way, and achieve the goals of portability. Thanks to all
that have helped reach this milestone.

On Tue, May 22, 2018 at 10:19 AM Ahmet Altay  wrote:

> +1 (binding)
>
> Congratulations to the team!
>
> On Tue, May 22, 2018 at 10:13 AM, Alan Myrvold 
> wrote:
>
>> +1 (non-binding)
>> Nice work!
>>
>> On Tue, May 22, 2018 at 9:18 AM Pablo Estrada  wrote:
>>
>>> +1 (binding)
>>> Very excited to see this!
>>>
>>> On Tue, May 22, 2018 at 9:09 AM Thomas Weise  wrote:
>>>
 +1 and congrats!


 On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez 
 wrote:

> +1 !
>
> On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik  wrote:
>
>> +1 (binding)
>>
>> On Tue, May 22, 2018 at 6:16 AM Robert Burke 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> I'm looking forward to helping gophers solve their big data problems
>>> in their language of choice, and runner of choice!
>>>
>>> Next stop, a non-java portability runner?
>>>
>>> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles 
>>> wrote:
>>>
 +1 (binding)

 This is great. Feels like a phase change in the life of Apache
 Beam, having three languages, with multiple portable runners on the 
 horizon.

 Kenn

 On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía 
 wrote:

> +1 (binding)
>
> Go SDK brings new language support for a community not well supported in
> the Big Data world, the Go developers, so this is great. Also the fact
> that this is the first SDK integrated with the portability work makes it
> an interesting project to learn lessons from for future languages.
>
> Now it is the time to start building a community around the Go SDK; this
> is the most important task now, and the only way to do it is to have the
> SDK as an official part of Beam, so +1.
>
> Congrats to Henning and all the other contributors for this
> important
> milestone.
> On Tue, May 22, 2018 at 10:21 AM Holden Karau <
> hol...@pigscanfly.ca> wrote:
>
> > +1 (non-binding), I've had a chance to work with the SDK and
> it's pretty
> neat to see Beam add support for a language before the most of the
> big data
> ecosystem.
>
> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> wrote:
>
> >> Hi Henning,
>
> >> SGA has been filed for the entire project during the incubation
> period.
>
> >> Here, we have to check if SGA/IP donation is clean for the Go
> SDK.
>
> >> We don't have a lot to do, just checked that we are clean on
> this front.
>
> >> Regards
> >> JB
>
> >> On 22/05/2018 06:42, Henning Rohde wrote:
>
> >>> Thanks everyone!
>
> >>> Davor -- regarding your two comments:
> >>> * Robert mentioned that "SGA should have probably already
> been
> filed" in the previous thread. I got the impression that nothing
> further
> was needed. I'll follow up.
> >>> * The standard Go tooling basically always pulls directly
> from
> github, so there is no real urgency here.
>
> >>> Thanks,
> >>>Henning
>
>
> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
> j...@nanthrax.net
> > wrote:
>
> >>>  +1 (binding)
>
> >>>  I just want to check about SGA/IP/Headers.
>
> >>>  Thanks !
> >>>  Regards
> >>>  JB
>
> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
> >>>   > Hi everyone,
> >>>   >
> >>>   > Now that the remaining issues have been resolved as
> discussed,
> >>>  I'd like
> >>>   > to propose a formal vote on accepting the Go SDK into
> master. The
> >>>  main
> >>>   > practical difference is that the Go SDK would be part
> of the
> >>>  Apache Beam
> >>>   > release going forward.
> >>>   >
> >>>   > Highlights of the Go SDK:
> >>>   >   * Go user experience with natively-typed DoFns with
> (simulated)
> >>>   > generic types
> >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK,
> Flatten,
> >>>  Combine,
> >>>   > Windowing, ..
> >>>   >   * Includes several IO connectors:

Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
We can do even better, btw, by building a SchemaRegistry where automatic
conversions between schemas and Java data types can be registered. With
this, the user won't even need a DoFn to do the conversion.
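
For illustration only, a rough sketch of what such a registry could look
like (hypothetical class and method names, not a committed Beam API; it
assumes the Row class from the schema work and the existing
SerializableFunction):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.values.Row;

    // Hypothetical sketch: maps a Java type to conversions to/from Row,
    // so the SDK can apply them automatically instead of requiring a
    // user-written DoFn.
    public class SchemaConversionRegistry {
      private final Map<Class<?>, SerializableFunction<?, Row>> toRow =
          new HashMap<>();
      private final Map<Class<?>, SerializableFunction<Row, ?>> fromRow =
          new HashMap<>();

      // Registers both directions of the conversion for the given type.
      public <T> void register(
          Class<T> type,
          SerializableFunction<T, Row> to,
          SerializableFunction<Row, T> from) {
        toRow.put(type, to);
        fromRow.put(type, from);
      }

      // The casts are safe by construction: register() keys each entry
      // by the exact type it converts.
      @SuppressWarnings("unchecked")
      public <T> SerializableFunction<T, Row> getToRow(Class<T> type) {
        return (SerializableFunction<T, Row>) toRow.get(type);
      }

      @SuppressWarnings("unchecked")
      public <T> SerializableFunction<Row, T> getFromRow(Class<T> type) {
        return (SerializableFunction<Row, T>) fromRow.get(type);
      }
    }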

On Tue, May 22, 2018, 10:13 AM Romain Manni-Bucau 
wrote:

> Hi guys,
>
> Checked out what has been done on schema model and think it is acceptable
> - regarding the json debate -  if
> https://issues.apache.org/jira/browse/BEAM-4381 can be fixed.
>
> High level, it is about providing a mainstream and not-too-impacting model
> OOTB, and JSON seems the most valid option for now, at least for IO and some
> user transforms.
>
> Wdyt?
>
> Le ven. 27 avr. 2018 18:36, Romain Manni-Bucau  a
> écrit :
>
>>  Can give it a try end of may, sure. (holidays and work constraints will
>> make it hard before).
>>
>> Le 27 avr. 2018 18:26, "Anton Kedin"  a écrit :
>>
>>> Romain,
>>>
>>> I don't believe that the JSON approach was investigated very thoroughly. I
>>> mentioned a few reasons which make it not the best choice in my opinion,
>>> but I may be wrong. Can you put together a design doc or a prototype?
>>>
>>> Thank you,
>>> Anton
>>>
>>>
>>> On Thu, Apr 26, 2018 at 10:17 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>


 Le 26 avr. 2018 23:13, "Anton Kedin"  a écrit :

 BeamRecord (Row) has very little in common with JsonObject (I assume
 you're talking about javax.json), except maybe some similarities of the
 API. Few reasons why JsonObject doesn't work:

- it is a Java EE API:
   - Beam SDK is not limited to Java. There are probably similar
   APIs for other languages but they might not necessarily carry the 
 same
   semantics / APIs;


 Not a big deal I think. At least not a technical blocker.


- It can change between Java versions;

 No, this is javaee ;).



- Current Beam java implementation is an experimental feature to
   identify what's needed from such API, in the end we might end up with
   something similar to JsonObject API, but likely not


 I don't get that point as a blocker


- represents JSON, which is not an API but an object notation:
   - it is defined as unicode string in a certain format. If you
   choose to adhere to ECMA-404, then it doesn't sound like JsonObject 
 can
   represent an Avro object, if I'm reading it right;


 It is in the generator impl; you can implement an avro generator.


- doesn't define a type system (JSON does, but it's lacking):
   - for example, JSON doesn't define semantics for numbers;
   - doesn't define date/time types;
   - doesn't allow extending JSON type system at all;


 That is why you need a metadata object or, simpler, a schema with that
 data. Json or beam record doesn't help here and you end up with the same
 outcome if you think about it.


- lacks schemas;

 JSON schemas are standard, widespread, and well tooled compared to the
 alternatives.

 You can definitely try to loosen the requirements and define everything in
 JSON in userland, but the point of Row/Schema is to avoid it and define
 everything in Beam model, which can be extended, mapped to JSON, Avro,
 BigQuery Schemas, custom binary format etc., with same semantics across
 beam SDKs.


 This is what jsonp would allow with the benefit of a natural pojo
 support through jsonb.



 On Thu, Apr 26, 2018 at 12:28 PM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

> Just to let it be clear and let me understand: how is BeamRecord
> different from a JsonObject, which is an API without implementation (not
> even a json one OOTB)? Advantages of a json *api* are indeed natural mapping
> (jsonb is based on jsonp so no new binding to reinvent) and simple
> serialization (json+gzip for ex, or avro if you want to be geeky).
>
> I fail to see the point to rebuild an ecosystem ATM.
>
> Le 26 avr. 2018 19:12, "Reuven Lax"  a écrit :
>
>> Exactly what JB said. We will write a generic conversion from Avro
>> (or json) to Beam schemas, which will make them work transparently with
>> SQL. The plan is also to migrate Anton's work so that POJOs works
>> generically for any schema.
>>
>> Reuven
>>
>> On Thu, Apr 26, 2018 at 1:17 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> For now we have a generic schema interface. Json-b can be an impl,
>>> avro could be another one.
>>>
>>> Regards
>>> JB
>>> Le 26 avr. 2018, à 12:08, Romain Manni-Bucau 
>>> a écrit:

 Hmm,

 avro has still the pitfalls to have an uncontrolled stack which
 brings way too much dependencies to be part of any API,
 this is why I proposed a JSON-P

Missing copyright notices for shaded packages

2018-05-22 Thread Andrew Pilloud
I was digging around in the SQL jar trying to debug some packaging issues
and noticed that we aren't including the copyright notices from the
packages we are shading. I also looked at our previously released jars and
they are the same (so this isn't a regression). Should we be including the
copyright notice from packages we are redistributing?

Andrew


Re: Java PreCommit seems broken

2018-05-22 Thread Scott Wegner
I've logged BEAM-4382 [1] to decouple maven archetype generation from the
rest of the Maven build. Luke, would you mind adding any context you have
about generating archetypes from Gradle? From a quick search I couldn't
find a native Gradle plugin, but perhaps the logic is simple enough to roll
our own.

[1] https://issues.apache.org/jira/browse/BEAM-4382

On Mon, May 21, 2018 at 11:16 AM Lukasz Cwik  wrote:

> The archetype projects are coupled to their parents but don't have any
> meaningful dependencies so a significantly simpler archetype could be used.
> The dependency that exists right now in the archetype is to provide
> specific build ordering which can easily be moved to Gradle directly.
>
> An alternative would be to migrate the Maven archetype build to use
> Gradle. Assembling the Maven archetype jar is easy as no compilation is
> required, the issue was about running/validating that the archetype can be
> built.
>
> On Fri, May 18, 2018 at 12:20 PM Scott Wegner  wrote:
>
>> +1 to Lukasz's proposed solution. Depending on artifacts published
>> from a previous build is fragile and will add flakiness to our test runs.
>> We should make pre-commits as hermetic as possible.
>>
>> Depending on the transitive set of publishToMavenLocal tasks seems
>> cumbersome, but also necessary.
>>
>> On a related note: The archetype projects are shelling out to mvn for the
>> build, which uses the existing pom.xml files. This places a build
>> dependency on the pom.xml files down to the project root due to parent
>> relationships. Has there been any investigation on whether we can decouple
>> archetype generation from our Maven pom.xml files?
>>
>> On Fri, May 18, 2018 at 10:47 AM Lukasz Cwik  wrote:
>>
>>> We would need the archetype task to depend on all the dependencies
>>> publishToMavenLocal tasks transitively and then be configured to use
>>> whatever that maven local is on Jenkins / dev machine. It would be best if
>>> it was an ephemeral folder because it would be annoying to have stuff
>>> installed underneath a devs .m2/ directory that would need cleaning up.
>>>
>>> On Fri, May 18, 2018 at 10:41 AM Kenneth Knowles  wrote:
>>>
 Is this just a build tweak, or are there costly steps that we'd have to
 add that would slow down presubmit? (with mvn I know that `test` and
 `install` did very different amounts of work - because mvn test didn't test
 the right artifacts, but maybe with Gradle not so much?)

 On Fri, May 18, 2018 at 9:14 AM Lukasz Cwik  wrote:

> The problem with the way that the archetypes tests are run (now with
> Gradle and in the past with Maven) is that they run against the nightly
> snapshot and not against artifacts from the current build. To get them to
> work, we would need to publish the dependent Maven modules to a temporary
> repo and instruct the archetype project to use it for building/testing
> purposes.
>
> On Fri, May 18, 2018 at 5:38 AM Kenneth Knowles 
> wrote:
>
>> Maybe something has changed, but the snapshots used to pull from the
>> public snapshot repo. We got failures for a while every time we cut a
>> release branch, but once there was a nightly snapshot they cleared up.
>>
>> Kenn
>>
>> On Thu, May 17, 2018 at 9:50 PM Scott Wegner 
>> wrote:
>>
>>> I noticed that these tests simply run "mvn clean install" on the
>>> archetype project. But I don't see any dependent task which installs 
>>> built
>>> artifacts into the local Maven repo. Is that an oversight?
>>>
>>> If that's the case, perhaps the tests are failing sporadically when
>>> there are no previously installed snapshot artifacts cached on the 
>>> machine.
>>>
>>> On Thu, May 17, 2018, 2:45 PM Pablo Estrada 
>>> wrote:
>>>
 I'm seeing failures on Maven Archetype-related tests.

 Build Scan of a sample run:
 https://scans.gradle.com/s/kr23q43mh6fmk

 And the failure is here specifically:
 https://scans.gradle.com/s/kr23q43mh6fmk/console-log?task=:beam-sdks-java-maven-archetypes-examples:generateAndBuildArchetypeTest#L116


 Does anyone know why this might be happening?
 Best
 -P.
 --
 Got feedback? go/pabloem-feedback
 

>>>


Re: Beam SQL Improvements

2018-05-22 Thread Kenneth Knowles
Yea, I'm sure if you took on BEAM-4381 some folks would find it useful.

Kenn

On Tue, May 22, 2018 at 10:13 AM Romain Manni-Bucau 
wrote:

> Hi guys,
>
> Checked out what has been done on schema model and think it is acceptable
> - regarding the json debate -  if
> https://issues.apache.org/jira/browse/BEAM-4381 can be fixed.
>
> High level, it is about providing a mainstream and not-too-impacting model
> OOTB, and JSON seems the most valid option for now, at least for IO and some
> user transforms.
>
> Wdyt?
>
> Le ven. 27 avr. 2018 18:36, Romain Manni-Bucau  a
> écrit :
>
>>  Can give it a try end of may, sure. (holidays and work constraints will
>> make it hard before).
>>
>> Le 27 avr. 2018 18:26, "Anton Kedin"  a écrit :
>>
>>> Romain,
>>>
>>> I don't believe that the JSON approach was investigated very thoroughly. I
>>> mentioned a few reasons which make it not the best choice in my opinion,
>>> but I may be wrong. Can you put together a design doc or a prototype?
>>>
>>> Thank you,
>>> Anton
>>>
>>>
>>> On Thu, Apr 26, 2018 at 10:17 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>


 Le 26 avr. 2018 23:13, "Anton Kedin"  a écrit :

 BeamRecord (Row) has very little in common with JsonObject (I assume
 you're talking about javax.json), except maybe some similarities of the
 API. Few reasons why JsonObject doesn't work:

- it is a Java EE API:
   - Beam SDK is not limited to Java. There are probably similar
   APIs for other languages but they might not necessarily carry the 
 same
   semantics / APIs;


 Not a big deal I think. At least not a technical blocker.


- It can change between Java versions;

 No, this is javaee ;).



- Current Beam java implementation is an experimental feature to
   identify what's needed from such API, in the end we might end up with
   something similar to JsonObject API, but likely not


 I don't get that point as a blocker


- represents JSON, which is not an API but an object notation:
   - it is defined as unicode string in a certain format. If you
   choose to adhere to ECMA-404, then it doesn't sound like JsonObject 
 can
   represent an Avro object, if I'm reading it right;


 It is in the generator impl; you can implement an avro generator.


- doesn't define a type system (JSON does, but it's lacking):
   - for example, JSON doesn't define semantics for numbers;
   - doesn't define date/time types;
   - doesn't allow extending JSON type system at all;


 That is why you need a metadata object or, simpler, a schema with that
 data. Json or beam record doesn't help here and you end up with the same
 outcome if you think about it.


- lacks schemas;

 JSON schemas are standard, widespread, and well tooled compared to the
 alternatives.

 You can definitely try to loosen the requirements and define everything in
 JSON in userland, but the point of Row/Schema is to avoid it and define
 everything in Beam model, which can be extended, mapped to JSON, Avro,
 BigQuery Schemas, custom binary format etc., with same semantics across
 beam SDKs.


 This is what jsonp would allow with the benefit of a natural pojo
 support through jsonb.



 On Thu, Apr 26, 2018 at 12:28 PM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

> Just to let it be clear and let me understand: how is BeamRecord
> different from a JsonObject, which is an API without implementation (not
> even a json one OOTB)? Advantages of a json *api* are indeed natural mapping
> (jsonb is based on jsonp so no new binding to reinvent) and simple
> serialization (json+gzip for ex, or avro if you want to be geeky).
>
> I fail to see the point to rebuild an ecosystem ATM.
>
> Le 26 avr. 2018 19:12, "Reuven Lax"  a écrit :
>
>> Exactly what JB said. We will write a generic conversion from Avro
>> (or json) to Beam schemas, which will make them work transparently with
>> SQL. The plan is also to migrate Anton's work so that POJOs works
>> generically for any schema.
>>
>> Reuven
>>
>> On Thu, Apr 26, 2018 at 1:17 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> For now we have a generic schema interface. Json-b can be an impl,
>>> avro could be another one.
>>>
>>> Regards
>>> JB
>>> Le 26 avr. 2018, à 12:08, Romain Manni-Bucau 
>>> a écrit:

 Hmm,

 avro has still the pitfalls to have an uncontrolled stack which
 brings way too much dependencies to be part of any API,
 this is why I proposed a JSON-P based API (JsonObject) with a
 custom beam entry for some metadata (headers "à la Camel").


>

Re: The full list of proposals / prototype documents

2018-05-22 Thread Kenneth Knowles
It is owned by the Beam PMC collectively. Any PMC member can add things to
it. Ideas for making it easy to manage are welcome.

Probably easier to have a markdown file somewhere with a list of docs so we
can issue and review PRs. I'm not sure the web site is the right place for
it - we have a history of porting docs to markdown, but that is high
overhead and users/community probably don't gain much from it. Some have
suggested a wiki.

Kenn

On Tue, May 22, 2018 at 10:22 AM Scott Wegner  wrote:

> Thanks for the links. Any details on that Google drive folder? Who
> maintains it? Is it possible for any contributor to add their design doc?
>
> On Mon, May 21, 2018 at 8:15 AM Joseph PENG 
> wrote:
>
>> Alexey,
>>
>> I do not know where you can find all design docs, but I know a blog that
>> has collected some of the major design docs. Hope it helps.
>>
>> https://wtanaka.com/beam/design-doc
>>
>> https://drive.google.com/drive/folders/0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>
>> On Mon, May 21, 2018 at 9:28 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Is it possible to obtain somewhere a list of all proposals / prototype
>>> documents that have been published as a technical / design documents for
>>> new features? I have links to only some of them (found in mail list
>>> discussions by chance) but I’m not aware of others.
>>>
>>> If yes, could someone share it or point me out where it is located in
>>> case if I missed this?
>>>
>>> If not, don’t you think it would make sense to have such index of these
>>> documents? I believe it can be useful for Beam contributors since these
>>> proposals contain information which is absent or not so detailed on Beam
>>> web site documentation.
>>>
>>> WBR,
>>> Alexey
>>
>>


Re: The full list of proposals / prototype documents

2018-05-22 Thread Scott Wegner
Thanks for the links. Any details on that Google drive folder? Who
maintains it? Is it possible for any contributor to add their design doc?

On Mon, May 21, 2018 at 8:15 AM Joseph PENG 
wrote:

> Alexey,
>
> I do not know where you can find all design docs, but I know a blog that
> has collected some of the major design docs. Hope it helps.
>
> https://wtanaka.com/beam/design-doc
>
> https://drive.google.com/drive/folders/0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>
> On Mon, May 21, 2018 at 9:28 AM Alexey Romanenko 
> wrote:
>
>> Hi all,
>>
>> Is it possible to obtain somewhere a list of all proposals / prototype
>> documents that have been published as a technical / design documents for
>> new features? I have links to only some of them (found in mail list
>> discussions by chance) but I’m not aware of others.
>>
>> If yes, could someone share it or point me out where it is located in
>> case if I missed this?
>>
>> If not, don’t you think it would make sense to have such index of these
>> documents? I believe it can be useful for Beam contributors since these
>> proposals contain information which is absent or not so detailed on Beam
>> web site documentation.
>>
>> WBR,
>> Alexey
>
>


Re: [VOTE] Go SDK

2018-05-22 Thread Ahmet Altay
+1 (binding)

Congratulations to the team!

On Tue, May 22, 2018 at 10:13 AM, Alan Myrvold  wrote:

> +1 (non-binding)
> Nice work!
>
> On Tue, May 22, 2018 at 9:18 AM Pablo Estrada  wrote:
>
>> +1 (binding)
>> Very excited to see this!
>>
>> On Tue, May 22, 2018 at 9:09 AM Thomas Weise  wrote:
>>
>>> +1 and congrats!
>>>
>>>
>>> On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez 
>>> wrote:
>>>
 +1 !

 On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik  wrote:

> +1 (binding)
>
> On Tue, May 22, 2018 at 6:16 AM Robert Burke 
> wrote:
>
>> +1 (non-binding)
>>
>> I'm looking forward to helping gophers solve their big data problems
>> in their language of choice, and runner of choice!
>>
>> Next stop, a non-java portability runner?
>>
>> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles  wrote:
>>
>>> +1 (binding)
>>>
>>> This is great. Feels like a phase change in the life of Apache Beam,
>>> having three languages, with multiple portable runners on the horizon.
>>>
>>> Kenn
>>>
>>> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía 
>>> wrote:
>>>
 +1 (binding)

 The Go SDK brings new language support for a community not well supported
 in the Big Data world, the Go developers, so this is great. Also, the fact
 that this is the first SDK integrated with the portability work makes it an
 interesting project to learn lessons from for future languages.

 Now is the time to start building a community around the Go SDK; this is
 the most important task, and the only way to do it is to have the SDK as an
 official part of Beam, so +1.

 Congrats to Henning and all the other contributors for this
 important
 milestone.
 On Tue, May 22, 2018 at 10:21 AM Holden Karau 
 wrote:

 > +1 (non-binding), I've had a chance to work with the SDK and it's
 pretty
 neat to see Beam add support for a language before most of the
 big data
 ecosystem.

 > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
 j...@nanthrax.net>
 wrote:

 >> Hi Henning,

 >> SGA has been filed for the entire project during the incubation
 period.

 >> Here, we have to check if SGA/IP donation is clean for the Go
 SDK.

 >> We don't have a lot to do, just checked that we are clean on
 this front.

 >> Regards
 >> JB

 >> On 22/05/2018 06:42, Henning Rohde wrote:

 >>> Thanks everyone!

 >>> Davor -- regarding your two comments:
 >>> * Robert mentioned that "SGA should have probably already
 been
 filed" in the previous thread. I got the impression that nothing
 further
 was needed. I'll follow up.
 >>> * The standard Go tooling basically always pulls directly
 from
 github, so there is no real urgency here.

 >>> Thanks,
 >>>Henning


 >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
 j...@nanthrax.net
 > wrote:

 >>>  +1 (binding)

 >>>  I just want to check about SGA/IP/Headers.

 >>>  Thanks !
 >>>  Regards
 >>>  JB

 >>>  On 22/05/2018 03:02, Henning Rohde wrote:
 >>>   > Hi everyone,
 >>>   >
 >>>   > Now that the remaining issues have been resolved as
 discussed,
 >>>  I'd like
 >>>   > to propose a formal vote on accepting the Go SDK into
 master. The
 >>>  main
 >>>   > practical difference is that the Go SDK would be part
 of the
 >>>  Apache Beam
 >>>   > release going forward.
 >>>   >
 >>>   > Highlights of the Go SDK:
 >>>   >   * Go user experience with natively-typed DoFns with
 (simulated)
 >>>   > generic types
 >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK,
 Flatten,
 >>>  Combine,
 >>>   > Windowing, ..
 >>>   >   * Includes several IO connectors: Datastore,
 BigQuery, PubSub,
 >>>   > extensible textio.
 >>>   >   * Supports the portability framework for both batch
 and
 streaming,
 >>>   > notably the upcoming portable Flink runner
 >>>   >   * Supports a direct runner for small batch workloads
 and
 testing.
 >>>   >   * Includes pre-commit tests and post-commit
 integration tests.
>

Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
Hi guys,

Checked out what has been done on schema model and think it is acceptable -
regarding the json debate -  if
https://issues.apache.org/jira/browse/BEAM-4381 can be fixed.

High level, it is about providing a mainstream and not-too-impacting model
OOTB, and JSON seems the most valid option for now, at least for IO and some
user transforms.

Wdyt?
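
To make the JSON-P/JSON-B option concrete, a minimal self-contained sketch
(plain javax.json / javax.json.bind, nothing Beam-specific; the User POJO
and field values are made up for the example): the generic JsonObject API
on one side, the natural POJO binding on the other.

    import javax.json.Json;
    import javax.json.JsonObject;
    import javax.json.bind.Jsonb;
    import javax.json.bind.JsonbBuilder;

    public class JsonpJsonbExample {
      public static class User {
        public String name;
        public int age;
      }

      public static void main(String[] args) throws Exception {
        // JSON-P: build and read a generic record through the JsonObject API.
        JsonObject record = Json.createObjectBuilder()
            .add("name", "romain")
            .add("age", 30)
            .build();
        System.out.println(record.getString("name"));

        // JSON-B: bind the same data to/from a POJO, no hand-written mapping.
        try (Jsonb jsonb = JsonbBuilder.create()) {
          User user = jsonb.fromJson(record.toString(), User.class);
          System.out.println(jsonb.toJson(user));
        }
      }
    }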

Le ven. 27 avr. 2018 18:36, Romain Manni-Bucau  a
écrit :

>  Can give it a try end of may, sure. (holidays and work constraints will
> make it hard before).
>
> Le 27 avr. 2018 18:26, "Anton Kedin"  a écrit :
>
>> Romain,
>>
>> I don't believe that the JSON approach was investigated very thoroughly. I
>> mentioned a few reasons which make it not the best choice in my opinion,
>> but I may be wrong. Can you put together a design doc or a prototype?
>>
>> Thank you,
>> Anton
>>
>>
>> On Thu, Apr 26, 2018 at 10:17 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>>
>>>
>>> Le 26 avr. 2018 23:13, "Anton Kedin"  a écrit :
>>>
>>> BeamRecord (Row) has very little in common with JsonObject (I assume
>>> you're talking about javax.json), except maybe some similarities of the
>>> API. Few reasons why JsonObject doesn't work:
>>>
>>>- it is a Java EE API:
>>>   - Beam SDK is not limited to Java. There are probably similar
>>>   APIs for other languages but they might not necessarily carry the same
>>>   semantics / APIs;
>>>
>>>
>>> Not a big deal I think. At least not a technical blocker.
>>>
>>>
>>>- It can change between Java versions;
>>>
>>> No, this is javaee ;).
>>>
>>>
>>>
>>>- Current Beam java implementation is an experimental feature to
>>>   identify what's needed from such API, in the end we might end up with
>>>   something similar to JsonObject API, but likely not
>>>
>>>
>>> I don't get that point as a blocker
>>>
>>>
>>>- represents JSON, which is not an API but an object notation:
>>>   - it is defined as unicode string in a certain format. If you
>>>   choose to adhere to ECMA-404, then it doesn't sound like JsonObject 
>>> can
>>>   represent an Avro object, if I'm reading it right;
>>>
>>>
>>> It is in the generator impl; you can implement an avro generator.
>>>
>>>
>>>- doesn't define a type system (JSON does, but it's lacking):
>>>   - for example, JSON doesn't define semantics for numbers;
>>>   - doesn't define date/time types;
>>>   - doesn't allow extending JSON type system at all;
>>>
>>>
>>> That is why you need a metadata object or, simpler, a schema with that
>>> data. Json or beam record doesn't help here and you end up with the same
>>> outcome if you think about it.
>>>
>>>
>>>- lacks schemas;
>>>
>>> JSON schemas are standard, widespread, and well tooled compared to the
>>> alternatives.
>>>
>>> You can definitely try to loosen the requirements and define everything in
>>> JSON in userland, but the point of Row/Schema is to avoid it and define
>>> everything in Beam model, which can be extended, mapped to JSON, Avro,
>>> BigQuery Schemas, custom binary format etc., with same semantics across
>>> beam SDKs.
>>>
>>>
>>> This is what jsonp would allow with the benefit of a natural pojo
>>> support through jsonb.
>>>
>>>
>>>
>>> On Thu, Apr 26, 2018 at 12:28 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Just to let it be clear and let me understand: how is BeamRecord
 different from a JsonObject, which is an API without implementation (not
 even a json one OOTB)? Advantages of a json *api* are indeed natural mapping
 (jsonb is based on jsonp so no new binding to reinvent) and simple
 serialization (json+gzip for ex, or avro if you want to be geeky).

 I fail to see the point to rebuild an ecosystem ATM.

 Le 26 avr. 2018 19:12, "Reuven Lax"  a écrit :

> Exactly what JB said. We will write a generic conversion from Avro (or
> json) to Beam schemas, which will make them work transparently with SQL.
> The plan is also to migrate Anton's work so that POJOs works generically
> for any schema.
>
> Reuven
>
> On Thu, Apr 26, 2018 at 1:17 AM Jean-Baptiste Onofré 
> wrote:
>
>> For now we have a generic schema interface. Json-b can be an impl,
>> avro could be another one.
>>
>> Regards
>> JB
>> Le 26 avr. 2018, à 12:08, Romain Manni-Bucau 
>> a écrit:
>>>
>>> Hmm,
>>>
>>> avro has still the pitfalls to have an uncontrolled stack which
>>> brings way too much dependencies to be part of any API,
>>> this is why I proposed a JSON-P based API (JsonObject) with a custom
>>> beam entry for some metadata (headers "à la Camel").
>>>
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau  |   Blog
>>>  | Old Blog
>>>  |  Github
>>>  | LinkedIn
>>> 

Re: [VOTE] Go SDK

2018-05-22 Thread Alan Myrvold
+1 (non-binding)
Nice work!

On Tue, May 22, 2018 at 9:18 AM Pablo Estrada  wrote:

> +1 (binding)
> Very excited to see this!
>
> On Tue, May 22, 2018 at 9:09 AM Thomas Weise  wrote:
>
>> +1 and congrats!
>>
>>
>> On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez 
>> wrote:
>>
>>> +1 !
>>>
>>> On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik  wrote:
>>>
 +1 (binding)

 On Tue, May 22, 2018 at 6:16 AM Robert Burke 
 wrote:

> +1 (non-binding)
>
> I'm looking forward to helping gophers solve their big data problems
> in their language of choice, and runner of choice!
>
> Next stop, a non-java portability runner?
>
> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles  wrote:
>
>> +1 (binding)
>>
>> This is great. Feels like a phase change in the life of Apache Beam,
>> having three languages, with multiple portable runners on the horizon.
>>
>> Kenn
>>
>> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> The Go SDK brings new language support for a community not well supported
>>> in the Big Data world, the Go developers, so this is great. Also, the fact
>>> that this is the first SDK integrated with the portability work makes it an
>>> interesting project to learn lessons from for future languages.
>>>
>>> Now is the time to start building a community around the Go SDK; this is
>>> the most important task, and the only way to do it is to have the SDK as an
>>> official part of Beam, so +1.
>>>
>>> Congrats to Henning and all the other contributors for this important
>>> milestone.
>>> On Tue, May 22, 2018 at 10:21 AM Holden Karau 
>>> wrote:
>>>
>>> > +1 (non-binding), I've had a chance to work with the SDK and it's
>>> pretty
>>> neat to see Beam add support for a language before most of the
>>> big data
>>> ecosystem.
>>>
>>> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
>>> j...@nanthrax.net>
>>> wrote:
>>>
>>> >> Hi Henning,
>>>
>>> >> SGA has been filed for the entire project during the incubation
>>> period.
>>>
>>> >> Here, we have to check if SGA/IP donation is clean for the Go SDK.
>>>
>>> >> We don't have a lot to do, just checked that we are clean on this
>>> front.
>>>
>>> >> Regards
>>> >> JB
>>>
>>> >> On 22/05/2018 06:42, Henning Rohde wrote:
>>>
>>> >>> Thanks everyone!
>>>
>>> >>> Davor -- regarding your two comments:
>>> >>> * Robert mentioned that "SGA should have probably already
>>> been
>>> filed" in the previous thread. I got the impression that nothing
>>> further
>>> was needed. I'll follow up.
>>> >>> * The standard Go tooling basically always pulls directly
>>> from
>>> github, so there is no real urgency here.
>>>
>>> >>> Thanks,
>>> >>>Henning
>>>
>>>
>>> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
>>> j...@nanthrax.net
>>> > wrote:
>>>
>>> >>>  +1 (binding)
>>>
>>> >>>  I just want to check about SGA/IP/Headers.
>>>
>>> >>>  Thanks !
>>> >>>  Regards
>>> >>>  JB
>>>
>>> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
>>> >>>   > Hi everyone,
>>> >>>   >
>>> >>>   > Now that the remaining issues have been resolved as
>>> discussed,
>>> >>>  I'd like
>>> >>>   > to propose a formal vote on accepting the Go SDK into
>>> master. The
>>> >>>  main
>>> >>>   > practical difference is that the Go SDK would be part of
>>> the
>>> >>>  Apache Beam
>>> >>>   > release going forward.
>>> >>>   >
>>> >>>   > Highlights of the Go SDK:
>>> >>>   >   * Go user experience with natively-typed DoFns with
>>> (simulated)
>>> >>>   > generic types
>>> >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK,
>>> Flatten,
>>> >>>  Combine,
>>> >>>   > Windowing, ..
>>> >>>   >   * Includes several IO connectors: Datastore, BigQuery,
>>> PubSub,
>>> >>>   > extensible textio.
>>> >>>   >   * Supports the portability framework for both batch and
>>> streaming,
>>> >>>   > notably the upcoming portable Flink runner
>>> >>>   >   * Supports a direct runner for small batch workloads
>>> and
>>> testing.
>>> >>>   >   * Includes pre-commit tests and post-commit
>>> integration tests.
>>> >>>   >
>>> >>>   > And last but not least
>>> >>>   >   *  includes contributions from several independent
>>> users and
>>> >>>   > developers, notably an IO connector for Datastore!
>>> >>>   >
>>> >>>   > Website: https://beam

Re: Proposal: keeping post-commit tests green

2018-05-22 Thread Scott Wegner
Thanks for the thoughtful proposal Mikhail. I've left some comments in the
doc.

I encourage others to take a look: the proposal adds some strong policies
about dealing with post-commit failures (rollback policy, locking master).
Currently our post-commits are frequently red, and we're missing out on a
valuable quality signal. I'm in favor of such policies to help get the test
signals back to a healthy state.

On Mon, May 21, 2018 at 2:48 PM Mikhail Gryzykhin  wrote:

> Hi Everyone,
>
> I've updated design doc according to comments.
>
> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>
> In general, the ideas proposed seem to be appreciated. Still, some of the
> sections require more discussion.
>
> Changes highlight:
> * Added roll-back first policy to best practices. This includes process on
> how to handle roll-back.
> * Marked topics that I'd like to have more input on. [cyan color]
>
> --Mikhail
>
> Have feedback ?
>
>
> On Fri, May 18, 2018 at 10:56 AM Andrew Pilloud 
> wrote:
>
>> Blocking commits to master on test flaps seems critical here. The test
>> flaps won't get the attention they deserve as long as people are just
>> spamming their PRs with 'Run Java Precommit' until they turn green. I'm
>> guilty of this behavior and I know it masks new flaky tests.
>>
>> I added a comment to your doc about detecting flaky tests. This can
>> easily be done by rerunning the postcommits during times when Jenkins would
>> otherwise be idle. You'll easily get a few dozen runs every weekend, you
>> just need a process to triage all the flakes and ensure bugs are filed. I
>> worked on a project that did this along with blocking master on any post
>> commit failure. It was painful for the first few weeks, but things got
>> significantly better once most of the bugs were fixed.
>>
>> Andrew
>>
>> On Fri, May 18, 2018 at 10:39 AM Kenneth Knowles  wrote:
>>
>>> Love it. I would pull out from the doc also the key point: make the
>>> postcommit status constantly visible to everyone.
>>>
>>> Kenn
>>>
>>> On Fri, May 18, 2018 at 10:17 AM Mikhail Gryzykhin 
>>> wrote:
>>>
 Hi everyone,

 I'm Mikhail and started working on Google Dataflow several months ago.
 I'm really excited to work with Beam opensource community.

 I have a proposal to improve contributor experience by keeping
 post-commit tests green.

 I'm looking to get community consensus and approval about the process
 for keeping post-commit tests green and addressing post-commit test
 failures.

 Find full list of ideas brought in for discussion in this document:

 https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME

 Key points are:
 1. Add explicit tracking of failures via JIRA
 2. No-Commit policy when post-commit tests are red

 --Mikhail




Re: Proposal for Beam Python User State and Timer APIs

2018-05-22 Thread Kenneth Knowles
Nice. I know that Java users have found it helpful to have this lower-level
way of writing pipelines when the high-level primitives don't quite have
the tight control they are looking for. I hope it will be a big draw for
Python, too.

(commenting on the doc)

Kenn
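
For readers less familiar with the Java side, a minimal sketch of the
existing Java state/timer API for comparison (a DoFn that sums values per
key in state and emits the total when an event-time timer fires at the end
of the window; the class name and state/timer ids are made up):

    import org.apache.beam.sdk.state.StateSpec;
    import org.apache.beam.sdk.state.StateSpecs;
    import org.apache.beam.sdk.state.TimeDomain;
    import org.apache.beam.sdk.state.Timer;
    import org.apache.beam.sdk.state.TimerSpec;
    import org.apache.beam.sdk.state.TimerSpecs;
    import org.apache.beam.sdk.state.ValueState;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
    import org.apache.beam.sdk.values.KV;

    class SumPerKeyFn extends DoFn<KV<String, Long>, Long> {

      @StateId("sum")
      private final StateSpec<ValueState<Long>> sumSpec = StateSpecs.value();

      @TimerId("flush")
      private final TimerSpec flushSpec =
          TimerSpecs.timer(TimeDomain.EVENT_TIME);

      @ProcessElement
      public void process(
          ProcessContext ctx,
          BoundedWindow window,
          @StateId("sum") ValueState<Long> sum,
          @TimerId("flush") Timer flush) {
        Long current = sum.read();
        sum.write((current == null ? 0L : current) + ctx.element().getValue());
        // Ask to be called back once the watermark passes the window end.
        flush.set(window.maxTimestamp());
      }

      @OnTimer("flush")
      public void onFlush(
          OnTimerContext ctx, @StateId("sum") ValueState<Long> sum) {
        Long total = sum.read();
        if (total != null) {
          ctx.output(total);
        }
      }
    }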

On Mon, May 21, 2018 at 5:15 PM Charles Chen  wrote:

> I want to share a proposal for adding user state and timer support to the
> Beam Python SDK and get the community's thoughts on how such an API should
> look: https://s.apache.org/beam-python-user-state-and-timers
>
> Let me know what you think and please add any comments and suggestions you
> may have.
>
> Best,
> Charles
>


Re: [VOTE] Go SDK

2018-05-22 Thread Pablo Estrada
+1 (binding)
Very excited to see this!

On Tue, May 22, 2018 at 9:09 AM Thomas Weise  wrote:

> +1 and congrats!
>
>
> On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez 
> wrote:
>
>> +1 !
>>
>> On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik  wrote:
>>
>>> +1 (binding)
>>>
>>> On Tue, May 22, 2018 at 6:16 AM Robert Burke  wrote:
>>>
 +1 (non-binding)

 I'm looking forward to helping gophers solve their big data problems in
 their language of choice, and runner of choice!

 Next stop, a non-java portability runner?

 On Tue, May 22, 2018, 6:08 AM Kenneth Knowles  wrote:

> +1 (binding)
>
> This is great. Feels like a phase change in the life of Apache Beam,
> having three languages, with multiple portable runners on the horizon.
>
> Kenn
>
> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía 
> wrote:
>
>> +1 (binding)
>>
>> The Go SDK brings new language support for a community not well supported
>> in the Big Data world, the Go developers, so this is great. Also, the fact
>> that this is the first SDK integrated with the portability work makes it an
>> interesting project to learn lessons from for future languages.
>>
>> Now is the time to start building a community around the Go SDK; this is
>> the most important task, and the only way to do it is to have the SDK as an
>> official part of Beam, so +1.
>>
>> Congrats to Henning and all the other contributors for this important
>> milestone.
>> On Tue, May 22, 2018 at 10:21 AM Holden Karau 
>> wrote:
>>
>> > +1 (non-binding), I've had a chance to work with the SDK and it's
>> pretty
>> neat to see Beam add support for a language before most of the
>> big data
>> ecosystem.
>>
>> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
>> j...@nanthrax.net>
>> wrote:
>>
>> >> Hi Henning,
>>
>> >> SGA has been filed for the entire project during the incubation
>> period.
>>
>> >> Here, we have to check if SGA/IP donation is clean for the Go SDK.
>>
>> >> We don't have a lot to do, just checked that we are clean on this
>> front.
>>
>> >> Regards
>> >> JB
>>
>> >> On 22/05/2018 06:42, Henning Rohde wrote:
>>
>> >>> Thanks everyone!
>>
>> >>> Davor -- regarding your two comments:
>> >>> * Robert mentioned that "SGA should have probably already been
>> filed" in the previous thread. I got the impression that nothing
>> further
>> was needed. I'll follow up.
>> >>> * The standard Go tooling basically always pulls directly from
>> github, so there is no real urgency here.
>>
>> >>> Thanks,
>> >>>Henning
>>
>>
>> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
>> j...@nanthrax.net
>> > wrote:
>>
>> >>>  +1 (binding)
>>
>> >>>  I just want to check about SGA/IP/Headers.
>>
>> >>>  Thanks !
>> >>>  Regards
>> >>>  JB
>>
>> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
>> >>>   > Hi everyone,
>> >>>   >
>> >>>   > Now that the remaining issues have been resolved as
>> discussed,
>> >>>  I'd like
>> >>>   > to propose a formal vote on accepting the Go SDK into
>> master. The
>> >>>  main
>> >>>   > practical difference is that the Go SDK would be part of
>> the
>> >>>  Apache Beam
>> >>>   > release going forward.
>> >>>   >
>> >>>   > Highlights of the Go SDK:
>> >>>   >   * Go user experience with natively-typed DoFns with
>> (simulated)
>> >>>   > generic types
>> >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK,
>> Flatten,
>> >>>  Combine,
>> >>>   > Windowing, ..
>> >>>   >   * Includes several IO connectors: Datastore, BigQuery,
>> PubSub,
>> >>>   > extensible textio.
>> >>>   >   * Supports the portability framework for both batch and
>> streaming,
>> >>>   > notably the upcoming portable Flink runner
>> >>>   >   * Supports a direct runner for small batch workloads and
>> testing.
>> >>>   >   * Includes pre-commit tests and post-commit integration
>> tests.
>> >>>   >
>> >>>   > And last but not least
>> >>>   >   *  includes contributions from several independent
>> users and
>> >>>   > developers, notably an IO connector for Datastore!
>> >>>   >
>> >>>   > Website: https://beam.apache.org/documentation/sdks/go/
>> >>>   > Code: https://github.com/apache/beam/tree/master/sdks/go
>> >>>   > Design: https://s.apache.org/beam-go-sdk-design-rfc
>> >>>   >
>> >>>   > Please vote:
>> >>>   > [ ] +1, Appro

Re: [VOTE] Go SDK

2018-05-22 Thread Thomas Weise
+1 and congrats!


On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez 
wrote:

> +1 !
>
> On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik  wrote:
>
>> +1 (binding)
>>
>> On Tue, May 22, 2018 at 6:16 AM Robert Burke  wrote:
>>
>>> +1 (non-binding)
>>>
>>> I'm looking forward to helping gophers solve their big data problems in
>>> their language of choice, and runner of choice!
>>>
>>> Next stop, a non-java portability runner?
>>>
>>> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles  wrote:
>>>
 +1 (binding)

 This is great. Feels like a phase change in the life of Apache Beam,
 having three languages, with multiple portable runners on the horizon.

 Kenn

 On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía  wrote:

> +1 (binding)
>
> The Go SDK brings new language support for a community not well supported
> in the Big Data world, the Go developers, so this is great. Also, the fact
> that this is the first SDK integrated with the portability work makes it an
> interesting project to learn lessons from for future languages.
>
> Now is the time to start building a community around the Go SDK; this is
> the most important task, and the only way to do it is to have the SDK as an
> official part of Beam, so +1.
>
> Congrats to Henning and all the other contributors for this important
> milestone.
> On Tue, May 22, 2018 at 10:21 AM Holden Karau 
> wrote:
>
> > +1 (non-binding), I've had a chance to work with the SDK and it's
> pretty
> neat to see Beam add support for a language before most of the big
> data
> ecosystem.
>
> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> wrote:
>
> >> Hi Henning,
>
> >> SGA has been filed for the entire project during the incubation
> period.
>
> >> Here, we have to check if SGA/IP donation is clean for the Go SDK.
>
> >> We don't have a lot to do, just checked that we are clean on this
> front.
>
> >> Regards
> >> JB
>
> >> On 22/05/2018 06:42, Henning Rohde wrote:
>
> >>> Thanks everyone!
>
> >>> Davor -- regarding your two comments:
> >>> * Robert mentioned that "SGA should have probably already been
> filed" in the previous thread. I got the impression that nothing
> further
> was needed. I'll follow up.
> >>> * The standard Go tooling basically always pulls directly from
> github, so there is no real urgency here.
>
> >>> Thanks,
> >>>Henning
>
>
> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
> j...@nanthrax.net
> > wrote:
>
> >>>  +1 (binding)
>
> >>>  I just want to check about SGA/IP/Headers.
>
> >>>  Thanks !
> >>>  Regards
> >>>  JB
>
> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
> >>>   > Hi everyone,
> >>>   >
> >>>   > Now that the remaining issues have been resolved as
> discussed,
> >>>  I'd like
> >>>   > to propose a formal vote on accepting the Go SDK into
> master. The
> >>>  main
> >>>   > practical difference is that the Go SDK would be part of
> the
> >>>  Apache Beam
> >>>   > release going forward.
> >>>   >
> >>>   > Highlights of the Go SDK:
> >>>   >   * Go user experience with natively-typed DoFns with
> (simulated)
> >>>   > generic types
> >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK,
> Flatten,
> >>>  Combine,
> >>>   > Windowing, ..
> >>>   >   * Includes several IO connectors: Datastore, BigQuery,
> PubSub,
> >>>   > extensible textio.
> >>>   >   * Supports the portability framework for both batch and
> streaming,
> >>>   > notably the upcoming portable Flink runner
> >>>   >   * Supports a direct runner for small batch workloads and
> testing.
> >>>   >   * Includes pre-commit tests and post-commit integration
> tests.
> >>>   >
> >>>   > And last but not least
> >>>   >   *  includes contributions from several independent users
> and
> >>>   > developers, notably an IO connector for Datastore!
> >>>   >
> >>>   > Website: https://beam.apache.org/documentation/sdks/go/
> >>>   > Code: https://github.com/apache/beam/tree/master/sdks/go
> >>>   > Design: https://s.apache.org/beam-go-sdk-design-rfc
> >>>   >
> >>>   > Please vote:
> >>>   > [ ] +1, Approve that the Go SDK becomes an official part
> of Beam
> >>>   > [ ] -1, Do not approve (please provide specific comments)
> >>>   >
> >>>   > Thanks,
> >>>   >   The Gophers of Apache Beam
> >>>   >
> >>> 

Re: [VOTE] Go SDK

2018-05-22 Thread Rafael Fernandez
+1 !

On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik  wrote:

> +1 (binding)
>
> On Tue, May 22, 2018 at 6:16 AM Robert Burke  wrote:
>
>> +1 (non-binding)
>>
>> I'm looking forward to helping gophers solve their big data problems in
>> their language of choice, and runner of choice!
>>
>> Next stop, a non-java portability runner?
>>
>> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles  wrote:
>>
>>> +1 (binding)
>>>
>>> This is great. Feels like a phase change in the life of Apache Beam,
>>> having three languages, with multiple portable runners on the horizon.
>>>
>>> Kenn
>>>
>>> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía  wrote:
>>>
 +1 (binding)

 The Go SDK brings new language support for a community not well supported
 in the Big Data world, the Go developers, so this is great. Also, the fact
 that this is the first SDK integrated with the portability work makes it an
 interesting project to learn lessons from for future languages.

 Now is the time to start building a community around the Go SDK; this is
 the most important task, and the only way to do it is to have the SDK as an
 official part of Beam, so +1.

 Congrats to Henning and all the other contributors for this important
 milestone.
 On Tue, May 22, 2018 at 10:21 AM Holden Karau 
 wrote:

 > +1 (non-binding), I've had a chance to work with the SDK and it's
 pretty
 neat to see Beam add support for a language before most of the big
 data
 ecosystem.

 > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
 j...@nanthrax.net>
 wrote:

 >> Hi Henning,

 >> SGA has been filed for the entire project during the incubation
 period.

 >> Here, we have to check if SGA/IP donation is clean for the Go SDK.

 >> We don't have a lot to do, just checked that we are clean on this
 front.

 >> Regards
 >> JB

 >> On 22/05/2018 06:42, Henning Rohde wrote:

 >>> Thanks everyone!

 >>> Davor -- regarding your two comments:
 >>> * Robert mentioned that "SGA should have probably already been
 filed" in the previous thread. I got the impression that nothing further
 was needed. I'll follow up.
 >>> * The standard Go tooling basically always pulls directly from
 github, so there is no real urgency here.

 >>> Thanks,
 >>>Henning


 >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
 j...@nanthrax.net
 > wrote:

 >>>  +1 (binding)

 >>>  I just want to check about SGA/IP/Headers.

 >>>  Thanks !
 >>>  Regards
 >>>  JB

 >>>  On 22/05/2018 03:02, Henning Rohde wrote:
 >>>   > Hi everyone,
 >>>   >
 >>>   > Now that the remaining issues have been resolved as
 discussed,
 >>>  I'd like
 >>>   > to propose a formal vote on accepting the Go SDK into
 master. The
 >>>  main
 >>>   > practical difference is that the Go SDK would be part of the
 >>>  Apache Beam
 >>>   > release going forward.
 >>>   >
 >>>   > Highlights of the Go SDK:
 >>>   >   * Go user experience with natively-typed DoFns with
 (simulated)
 >>>   > generic types
 >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK,
 Flatten,
 >>>  Combine,
 >>>   > Windowing, ..
 >>>   >   * Includes several IO connectors: Datastore, BigQuery,
 PubSub,
 >>>   > extensible textio.
 >>>   >   * Supports the portability framework for both batch and
 streaming,
 >>>   > notably the upcoming portable Flink runner
 >>>   >   * Supports a direct runner for small batch workloads and
 testing.
 >>>   >   * Includes pre-commit tests and post-commit integration
 tests.
 >>>   >
 >>>   > And last but not least
 >>>   >   *  includes contributions from several independent users
 and
 >>>   > developers, notably an IO connector for Datastore!
 >>>   >
 >>>   > Website: https://beam.apache.org/documentation/sdks/go/
 >>>   > Code: https://github.com/apache/beam/tree/master/sdks/go
 >>>   > Design: https://s.apache.org/beam-go-sdk-design-rfc
 >>>   >
 >>>   > Please vote:
 >>>   > [ ] +1, Approve that the Go SDK becomes an official part of
 Beam
 >>>   > [ ] -1, Do not approve (please provide specific comments)
 >>>   >
 >>>   > Thanks,
 >>>   >   The Gophers of Apache Beam
 >>>   >
 >>>   >




 > --
 > Twitter: https://twitter.com/holdenkarau

>>>




Re: [VOTE] Go SDK

2018-05-22 Thread Lukasz Cwik
+1 (binding)

On Tue, May 22, 2018 at 6:16 AM Robert Burke  wrote:

> +1 (non-binding)
>
> I'm looking forward to helping gophers solve their big data problems in
> their language of choice, and runner of choice!
>
> Next stop, a non-java portability runner?
>
> On Tue, May 22, 2018, 6:08 AM Kenneth Knowles  wrote:
>
>> +1 (binding)
>>
>> This is great. Feels like a phase change in the life of Apache Beam,
>> having three languages, with multiple portable runners on the horizon.
>>
>> Kenn
>>
>> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía  wrote:
>>
>>> +1 (binding)
>>>
>>> The Go SDK brings new language support for a community not well supported
>>> in the Big Data world, the Go developers, so this is great. Also, the fact
>>> that this is the first SDK integrated with the portability work makes it an
>>> interesting project to learn lessons from for future languages.
>>>
>>> Now is the time to start building a community around the Go SDK; this is
>>> the most important task, and the only way to do it is to have the SDK as an
>>> official part of Beam, so +1.
>>>
>>> Congrats to Henning and all the other contributors for this important
>>> milestone.
>>> On Tue, May 22, 2018 at 10:21 AM Holden Karau 
>>> wrote:
>>>
>>> > +1 (non-binding), I've had a chance to work with the SDK and it's
>>> pretty
>>> neat to see Beam add support for a language before most of the big
>>> data
>>> ecosystem.
>>>
>>> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré <
>>> j...@nanthrax.net>
>>> wrote:
>>>
>>> >> Hi Henning,
>>>
>>> >> SGA has been filed for the entire project during the incubation
>>> period.
>>>
>>> >> Here, we have to check if SGA/IP donation is clean for the Go SDK.
>>>
>>> >> We don't have a lot to do, just checked that we are clean on this
>>> front.
>>>
>>> >> Regards
>>> >> JB
>>>
>>> >> On 22/05/2018 06:42, Henning Rohde wrote:
>>>
>>> >>> Thanks everyone!
>>>
>>> >>> Davor -- regarding your two comments:
>>> >>> * Robert mentioned that "SGA should have probably already been
>>> filed" in the previous thread. I got the impression that nothing further
>>> was needed. I'll follow up.
>>> >>> * The standard Go tooling basically always pulls directly from
>>> github, so there is no real urgency here.
>>>
>>> >>> Thanks,
>>> >>>Henning
>>>
>>>
>>> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <
>>> j...@nanthrax.net
>>> > wrote:
>>>
>>> >>>  +1 (binding)
>>>
>>> >>>  I just want to check about SGA/IP/Headers.
>>>
>>> >>>  Thanks !
>>> >>>  Regards
>>> >>>  JB
>>>
>>> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
>>> >>>   > Hi everyone,
>>> >>>   >
>>> >>>   > Now that the remaining issues have been resolved as
>>> discussed,
>>> >>>  I'd like
>>> >>>   > to propose a formal vote on accepting the Go SDK into
>>> master. The
>>> >>>  main
>>> >>>   > practical difference is that the Go SDK would be part of the
>>> >>>  Apache Beam
>>> >>>   > release going forward.
>>> >>>   >
>>> >>>   > Highlights of the Go SDK:
>>> >>>   >   * Go user experience with natively-typed DoFns with
>>> (simulated)
>>> >>>   > generic types
>>> >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK,
>>> Flatten,
>>> >>>  Combine,
>>> >>>   > Windowing, ..
>>> >>>   >   * Includes several IO connectors: Datastore, BigQuery,
>>> PubSub,
>>> >>>   > extensible textio.
>>> >>>   >   * Supports the portability framework for both batch and
>>> streaming,
>>> >>>   > notably the upcoming portable Flink runner
>>> >>>   >   * Supports a direct runner for small batch workloads and
>>> testing.
>>> >>>   >   * Includes pre-commit tests and post-commit integration
>>> tests.
>>> >>>   >
>>> >>>   > And last but not least
>>> >>>   >   *  includes contributions from several independent users
>>> and
>>> >>>   > developers, notably an IO connector for Datastore!
>>> >>>   >
>>> >>>   > Website: https://beam.apache.org/documentation/sdks/go/
>>> >>>   > Code: https://github.com/apache/beam/tree/master/sdks/go
>>> >>>   > Design: https://s.apache.org/beam-go-sdk-design-rfc
>>> >>>   >
>>> >>>   > Please vote:
>>> >>>   > [ ] +1, Approve that the Go SDK becomes an official part of
>>> Beam
>>> >>>   > [ ] -1, Do not approve (please provide specific comments)
>>> >>>   >
>>> >>>   > Thanks,
>>> >>>   >   The Gophers of Apache Beam
>>> >>>   >
>>> >>>   >
>>>
>>>
>>>
>>>
>>> > --
>>> > Twitter: https://twitter.com/holdenkarau
>>>
>>


Re: [VOTE] Go SDK

2018-05-22 Thread Robert Burke
+1 (non-binding)

I'm looking forward to helping gophers solve their big data problems in
their language of choice, and runner of choice!

Next stop, a non-java portability runner?

On Tue, May 22, 2018, 6:08 AM Kenneth Knowles  wrote:

> +1 (binding)
>
> This is great. Feels like a phase change in the life of Apache Beam,
> having three languages, with multiple portable runners on the horizon.
>
> Kenn
>
> On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía  wrote:
>
>> +1 (binding)
>>
>> The Go SDK brings new language support for a community not well supported
>> in the Big Data world, the Go developers, so this is great. Also, the fact
>> that this is the first SDK integrated with the portability work makes it an
>> interesting project to learn lessons from for future languages.
>>
>> Now is the time to start building a community around the Go SDK; this is
>> the most important task, and the only way to do it is to have the SDK as an
>> official part of Beam, so +1.
>>
>> Congrats to Henning and all the other contributors for this important
>> milestone.
>> On Tue, May 22, 2018 at 10:21 AM Holden Karau 
>> wrote:
>>
>> > +1 (non-binding), I've had a chance to work with the SDK and it's pretty
>> neat to see Beam add support for a language before most of the big
>> data
>> ecosystem.
>>
>> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré > >
>> wrote:
>>
>> >> Hi Henning,
>>
>> >> SGA has been filed for the entire project during the incubation period.
>>
>> >> Here, we have to check if SGA/IP donation is clean for the Go SDK.
>>
>> >> We don't have a lot to do, just checked that we are clean on this
>> front.
>>
>> >> Regards
>> >> JB
>>
>> >> On 22/05/2018 06:42, Henning Rohde wrote:
>>
>> >>> Thanks everyone!
>>
>> >>> Davor -- regarding your two comments:
>> >>> * Robert mentioned that "SGA should have probably already been
>> filed" in the previous thread. I got the impression that nothing further
>> was needed. I'll follow up.
>> >>> * The standard Go tooling basically always pulls directly from
>> github, so there is no real urgency here.
>>
>> >>> Thanks,
>> >>>Henning
>>
>>
>> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré > > wrote:
>>
>> >>>  +1 (binding)
>>
>> >>>  I just want to check about SGA/IP/Headers.
>>
>> >>>  Thanks !
>> >>>  Regards
>> >>>  JB
>>
>> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
>> >>>   > Hi everyone,
>> >>>   >
>> >>>   > Now that the remaining issues have been resolved as discussed,
>> >>>  I'd like
>> >>>   > to propose a formal vote on accepting the Go SDK into master.
>> The
>> >>>  main
>> >>>   > practical difference is that the Go SDK would be part of the
>> >>>  Apache Beam
>> >>>   > release going forward.
>> >>>   >
>> >>>   > Highlights of the Go SDK:
>> >>>   >   * Go user experience with natively-typed DoFns with
>> (simulated)
>> >>>   > generic types
>> >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK, Flatten,
>> >>>  Combine,
>> >>>   > Windowing, ..
>> >>>   >   * Includes several IO connectors: Datastore, BigQuery,
>> PubSub,
>> >>>   > extensible textio.
>> >>>   >   * Supports the portability framework for both batch and
>> streaming,
>> >>>   > notably the upcoming portable Flink runner
>> >>>   >   * Supports a direct runner for small batch workloads and
>> testing.
>> >>>   >   * Includes pre-commit tests and post-commit integration
>> tests.
>> >>>   >
>> >>>   > And last but not least
>> >>>   >   *  includes contributions from several independent users and
>> >>>   > developers, notably an IO connector for Datastore!
>> >>>   >
>> >>>   > Website: https://beam.apache.org/documentation/sdks/go/
>> >>>   > Code: https://github.com/apache/beam/tree/master/sdks/go
>> >>>   > Design: https://s.apache.org/beam-go-sdk-design-rfc
>> >>>   >
>> >>>   > Please vote:
>> >>>   > [ ] +1, Approve that the Go SDK becomes an official part of
>> Beam
>> >>>   > [ ] -1, Do not approve (please provide specific comments)
>> >>>   >
>> >>>   > Thanks,
>> >>>   >   The Gophers of Apache Beam
>> >>>   >
>> >>>   >
>>
>>
>>
>>
>> > --
>> > Twitter: https://twitter.com/holdenkarau
>>
>


Re: [VOTE] Go SDK

2018-05-22 Thread Kenneth Knowles
+1 (binding)

This is great. Feels like a phase change in the life of Apache Beam, having
three languages, with multiple portable runners on the horizon.

Kenn

On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía  wrote:

> +1 (binding)
>
> The Go SDK brings new language support for a community not well supported
> in the Big Data world, the Go developers, so this is great. Also, the fact
> that this is the first SDK integrated with the portability work makes it an
> interesting project to learn lessons from for future languages.
>
> Now is the time to start building a community around the Go SDK; this is
> the most important task, and the only way to do it is to have the SDK as an
> official part of Beam, so +1.
>
> Congrats to Henning and all the other contributors for this important
> milestone.
> On Tue, May 22, 2018 at 10:21 AM Holden Karau 
> wrote:
>
> > +1 (non-binding), I've had a chance to work with the SDK and it's pretty
> neat to see Beam add support for a language before most of the big data
> ecosystem.
>
> > On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré 
> wrote:
>
> >> Hi Henning,
>
> >> SGA has been filed for the entire project during the incubation period.
>
> >> Here, we have to check if SGA/IP donation is clean for the Go SDK.
>
> >> We don't have a lot to do, just checked that we are clean on this front.
>
> >> Regards
> >> JB
>
> >> On 22/05/2018 06:42, Henning Rohde wrote:
>
> >>> Thanks everyone!
>
> >>> Davor -- regarding your two comments:
> >>> * Robert mentioned that "SGA should have probably already been
> filed" in the previous thread. I got the impression that nothing further
> was needed. I'll follow up.
> >>> * The standard Go tooling basically always pulls directly from
> github, so there is no real urgency here.
>
> >>> Thanks,
> >>>Henning
>
>
> >>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré  wrote:
>
> >>>  +1 (binding)
>
> >>>  I just want to check about SGA/IP/Headers.
>
> >>>  Thanks !
> >>>  Regards
> >>>  JB
>
> >>>  On 22/05/2018 03:02, Henning Rohde wrote:
> >>>   > Hi everyone,
> >>>   >
> >>>   > Now that the remaining issues have been resolved as discussed,
> >>>  I'd like
> >>>   > to propose a formal vote on accepting the Go SDK into master.
> The
> >>>  main
> >>>   > practical difference is that the Go SDK would be part of the
> >>>  Apache Beam
> >>>   > release going forward.
> >>>   >
> >>>   > Highlights of the Go SDK:
> >>>   >   * Go user experience with natively-typed DoFns with
> (simulated)
> >>>   > generic types
> >>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK, Flatten,
> >>>  Combine,
> >>>   > Windowing, ..
> >>>   >   * Includes several IO connectors: Datastore, BigQuery,
> PubSub,
> >>>   > extensible textio.
> >>>   >   * Supports the portability framework for both batch and
> streaming,
> >>>   > notably the upcoming portable Flink runner
> >>>   >   * Supports a direct runner for small batch workloads and
> testing.
> >>>   >   * Includes pre-commit tests and post-commit integration
> tests.
> >>>   >
> >>>   > And last but not least
> >>>   >   * Includes contributions from several independent users and
> >>>   > developers, notably an IO connector for Datastore!
> >>>   >
> >>>   > Website: https://beam.apache.org/documentation/sdks/go/
> >>>   > Code: https://github.com/apache/beam/tree/master/sdks/go
> >>>   > Design: https://s.apache.org/beam-go-sdk-design-rfc
> >>>   >
> >>>   > Please vote:
> >>>   > [ ] +1, Approve that the Go SDK becomes an official part of
> Beam
> >>>   > [ ] -1, Do not approve (please provide specific comments)
> >>>   >
> >>>   > Thanks,
> >>>   >   The Gophers of Apache Beam
> >>>   >
> >>>   >
>
>
>
>
> > --
> > Twitter: https://twitter.com/holdenkarau
>


Re: I'm back and ready to help grow our community!

2018-05-22 Thread Matthias Baetens
Same here - shame on me. Congratulations on the graduation, Gris; very
happy to have you back!

On Tue, 22 May 2018 at 09:19 Ismaël Mejía  wrote:

> I somehow missed this email thread.
> Congratulations Gris and welcome back!
>
> On Fri, May 18, 2018 at 5:34 AM Jesse Anderson 
> wrote:
>
> > Congrats!
>
> > On Thu, May 17, 2018, 6:44 PM Robert Burke  wrote:
>
> >> Congrats & welcome back!
>
> >> On Thu, May 17, 2018, 5:44 PM Huygaa Batsaikhan 
> wrote:
>
> >>> Welcome back, Gris! Congratulations!
>
> >>> On Thu, May 17, 2018 at 4:24 PM Robert Bradshaw 
> wrote:
>
>  Congratulations, Gris! And welcome back!
>  On Thu, May 17, 2018 at 3:30 PM Robin Qiu  wrote:
>
>  > Congratulations! Welcome back!
>
>  > On Thu, May 17, 2018 at 3:23 PM Reuven Lax 
> wrote:
>
>  >> Congratulations! Good to see you back!
>
>  >> Reuven
>
>  >> On Thu, May 17, 2018 at 2:24 PM Griselda Cuevas 
> wrote:
>
>  >>> Hi Everyone,
>
>
>  >>> I was absent from the mailing list, slack channel and our Beam
>  community for the past six weeks; the reason was that I took a leave to
>  focus on finishing my Master's degree, which I finally did on May 15th.
>
>
>  >>> I graduated with a Master of Engineering in Operations Research,
>  with a concentration in Data Science, from UC Berkeley. I'm glad to be
>  part of this community and I'd like to share this accomplishment with
>  you, so I'm adding two pictures of that day :)
>
>
>  >>> Given that I've seen so many new folks around, I'd like to use this
>  opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
>  Now that I'm back, I'll continue to work on supporting our community in
>  two main streams: Contribution Experience, and Events, Meetups &
>  Conferences.
>
>
>  >>> It's good to be back and I look forward to collaborating with you.
>
>
>  >>> Cheers,
>
>  >>> Gris
>


Re: [VOTE] Go SDK

2018-05-22 Thread Ismaël Mejía
+1 (binding)

The Go SDK brings new language support for a community not well served in
the Big Data world, the Go developers, so this is great. Also, the fact
that this is the first SDK integrated with the portability work makes it
an interesting project to learn lessons from for future languages.

Now it is time to start building a community around the Go SDK. This is
the most important task ahead, and the only way to do it is to have the
SDK as an official part of Beam, so +1.

Congrats to Henning and all the other contributors for this important
milestone.
On Tue, May 22, 2018 at 10:21 AM Holden Karau  wrote:

> +1 (non-binding), I've had a chance to work with the SDK and it's pretty
neat to see Beam add support for a language before most of the big data
ecosystem.

> On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré 
wrote:

>> Hi Henning,

>> SGA has been filed for the entire project during the incubation period.

>> Here, we have to check if SGA/IP donation is clean for the Go SDK.

>> We don't have a lot to do, just to check that we are clean on this front.

>> Regards
>> JB

>> On 22/05/2018 06:42, Henning Rohde wrote:

>>> Thanks everyone!

>>> Davor -- regarding your two comments:
>>> * Robert mentioned that "SGA should have probably already been
filed" in the previous thread. I got the impression that nothing further
was needed. I'll follow up.
>>> * The standard Go tooling basically always pulls directly from
github, so there is no real urgency here.

>>> Thanks,
>>>Henning


>>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

>>>  +1 (binding)

>>>  I just want to check about SGA/IP/Headers.

>>>  Thanks !
>>>  Regards
>>>  JB

>>>  On 22/05/2018 03:02, Henning Rohde wrote:
>>>   > Hi everyone,
>>>   >
>>>   > Now that the remaining issues have been resolved as discussed,
>>>  I'd like
>>>   > to propose a formal vote on accepting the Go SDK into master. The
>>>  main
>>>   > practical difference is that the Go SDK would be part of the
>>>  Apache Beam
>>>   > release going forward.
>>>   >
>>>   > Highlights of the Go SDK:
>>>   >   * Go user experience with natively-typed DoFns with (simulated)
>>>   > generic types
>>>   >   * Covers most of the Beam model: ParDo, GBK, CoGBK, Flatten,
>>>  Combine,
>>>   > Windowing, ..
>>>   >   * Includes several IO connectors: Datastore, BigQuery, PubSub,
>>>   > extensible textio.
>>>   >   * Supports the portability framework for both batch and
streaming,
>>>   > notably the upcoming portable Flink runner
>>>   >   * Supports a direct runner for small batch workloads and
testing.
>>>   >   * Includes pre-commit tests and post-commit integration tests.
>>>   >
>>>   > And last but not least
>>>   >   * Includes contributions from several independent users and
>>>   > developers, notably an IO connector for Datastore!
>>>   >
>>>   > Website: https://beam.apache.org/documentation/sdks/go/
>>>   > Code: https://github.com/apache/beam/tree/master/sdks/go
>>>   > Design: https://s.apache.org/beam-go-sdk-design-rfc
>>>   >
>>>   > Please vote:
>>>   > [ ] +1, Approve that the Go SDK becomes an official part of Beam
>>>   > [ ] -1, Do not approve (please provide specific comments)
>>>   >
>>>   > Thanks,
>>>   >   The Gophers of Apache Beam
>>>   >
>>>   >




> --
> Twitter: https://twitter.com/holdenkarau


Re: [VOTE] Go SDK

2018-05-22 Thread Holden Karau
+1 (non-binding), I've had a chance to work with the SDK and it's pretty
neat to see Beam add support for a language before most of the big data
ecosystem.

On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré 
wrote:

> Hi Henning,
>
> SGA has been filed for the entire project during the incubation period.
>
> Here, we have to check if SGA/IP donation is clean for the Go SDK.
>
> We don't have a lot to do, just to check that we are clean on this front.
>
> Regards
> JB
>
> On 22/05/2018 06:42, Henning Rohde wrote:
>
>> Thanks everyone!
>>
>> Davor -- regarding your two comments:
>>* Robert mentioned that "SGA should have probably already been filed"
>> in the previous thread. I got the impression that nothing further was
>> needed. I'll follow up.
>>* The standard Go tooling basically always pulls directly from github,
>> so there is no real urgency here.
>>
>> Thanks,
>>   Henning
>>
>>
>> On Mon, May 21, 2018 at 9:30 PM Jean-Baptiste Onofré  wrote:
>> +1 (binding)
>>
>> I just want to check about SGA/IP/Headers.
>>
>> Thanks !
>> Regards
>> JB
>>
>> On 22/05/2018 03:02, Henning Rohde wrote:
>>  > Hi everyone,
>>  >
>>  > Now that the remaining issues have been resolved as discussed,
>> I'd like
>>  > to propose a formal vote on accepting the Go SDK into master. The
>> main
>>  > practical difference is that the Go SDK would be part of the
>> Apache Beam
>>  > release going forward.
>>  >
>>  > Highlights of the Go SDK:
>>  >   * Go user experience with natively-typed DoFns with (simulated)
>>  > generic types
>>  >   * Covers most of the Beam model: ParDo, GBK, CoGBK, Flatten,
>> Combine,
>>  > Windowing, ..
>>  >   * Includes several IO connectors: Datastore, BigQuery, PubSub,
>>  > extensible textio.
>>  >   * Supports the portability framework for both batch and
>> streaming,
>>  > notably the upcoming portable Flink runner
>>  >   * Supports a direct runner for small batch workloads and testing.
>>  >   * Includes pre-commit tests and post-commit integration tests.
>>  >
>>  > And last but not least
>>  >   * Includes contributions from several independent users and
>>  > developers, notably an IO connector for Datastore!
>>  >
>>  > Website: https://beam.apache.org/documentation/sdks/go/
>>  > Code: https://github.com/apache/beam/tree/master/sdks/go
>>  > Design: https://s.apache.org/beam-go-sdk-design-rfc
>>  >
>>  > Please vote:
>>  > [ ] +1, Approve that the Go SDK becomes an official part of Beam
>>  > [ ] -1, Do not approve (please provide specific comments)
>>  >
>>  > Thanks,
>>  >   The Gophers of Apache Beam
>>  >
>>  >
>>
>>
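
For anyone who wants a feel for the "natively-typed DoFns" highlight
before trying the SDK: a DoFn is just an ordinary Go function, and the
SDK reads the element types off its signature. Below is a minimal
word-splitting sketch against the direct runner. The import paths and
the direct.Execute entry point are my reading of the current sdks/go
tree, so treat the exact names as approximate rather than authoritative:

package main

import (
    "context"
    "strings"

    "github.com/apache/beam/sdks/go/pkg/beam"
    "github.com/apache/beam/sdks/go/pkg/beam/runners/direct"
)

// extractWords is a plain Go function used directly as a DoFn: the SDK
// derives the input and output element types from this signature (the
// "simulated generics" mentioned above), so no casts or wrappers are needed.
func extractWords(line string, emit func(string)) {
    for _, w := range strings.Fields(line) {
        emit(w)
    }
}

func main() {
    beam.Init()

    p := beam.NewPipeline()
    s := p.Root()

    // A tiny in-memory input; textio or one of the IO connectors listed
    // above would slot in the same way.
    lines := beam.Create(s, "hello beam", "hello go sdk")
    words := beam.ParDo(s, extractWords, lines)
    _ = words // GBK, Combine, or a sink would attach here

    // The direct runner executes small batch pipelines in-process.
    if err := direct.Execute(context.Background(), p); err != nil {
        panic(err)
    }
}

The nice part is that a signature mismatch (say, emit func(int) against a
string input) fails at pipeline construction time rather than on a worker
at runtime.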


-- 
Twitter: https://twitter.com/holdenkarau


Re: I'm back and ready to help grow our community!

2018-05-22 Thread Ismaël Mejía
I somehow missed this email thread.
Congratulations Gris and welcome back!

On Fri, May 18, 2018 at 5:34 AM Jesse Anderson 
wrote:

> Congrats!

> On Thu, May 17, 2018, 6:44 PM Robert Burke  wrote:

>> Congrats & welcome back!

>> On Thu, May 17, 2018, 5:44 PM Huygaa Batsaikhan 
wrote:

>>> Welcome back, Gris! Congratulations!

>>> On Thu, May 17, 2018 at 4:24 PM Robert Bradshaw 
wrote:

 Congratulations, Gris! And welcome back!
 On Thu, May 17, 2018 at 3:30 PM Robin Qiu  wrote:

 > Congratulations! Welcome back!

 > On Thu, May 17, 2018 at 3:23 PM Reuven Lax  wrote:

 >> Congratulations! Good to see you back!

 >> Reuven

 >> On Thu, May 17, 2018 at 2:24 PM Griselda Cuevas 
wrote:

 >>> Hi Everyone,


 >>> I was absent from the mailing list, slack channel and our Beam
 community for the past six weeks; the reason was that I took a leave to
 focus on finishing my Master's degree, which I finally did on May 15th.


 >>> I graduated with a Master of Engineering in Operations Research,
 with a concentration in Data Science, from UC Berkeley. I'm glad to be
 part of this community and I'd like to share this accomplishment with
 you, so I'm adding two pictures of that day :)


 >>> Given that I've seen so many new folks around, I'd like to use this
 opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
 Now that I'm back, I'll continue to work on supporting our community in
 two main streams: Contribution Experience, and Events, Meetups &
 Conferences.


 >>> It's good to be back and I look forward to collaborating with you.


 >>> Cheers,

 >>> Gris