Re: Doc on running validatesRunner test

2018-12-14 Thread Thomas Weise
Thanks for bringing it here. It would be a good candidate for
https://cwiki.apache.org/confluence/display/BEAM/Gradle+Tips

Thomas

On Fri, Dec 14, 2018 at 8:00 PM Manu Zhang  wrote:

> Hi all,
>
> I was looking for the command to run the validatesRunner tests on Gearpump but
> couldn’t find it in the contributing guide.
>
> After some trials, it seems the following command won’t work
>
>
> ./gradlew :beam-runners-gearpump:validatesRunner --tests
> org.apache.beam.sdk.PipelineTest
>
> but this does
>
>
> ./gradlew :beam-runners-gearpump:validatesRunnerStreaming --tests
> org.apache.beam.sdk.PipelineTest
>
> Has this been documented anywhere?
>
> Thanks,
> Manu Zhang
>


Doc on running validatesRunner test

2018-12-14 Thread Manu Zhang
Hi all,

I was looking for the command to run the validatesRunner tests on Gearpump but
couldn’t find it in the contributing guide.

After some trials, it seems the following command won’t work

./gradlew :beam-runners-gearpump:validatesRunner --tests
org.apache.beam.sdk.PipelineTest

but this does

./gradlew :beam-runners-gearpump:validatesRunnerStreaming --tests
org.apache.beam.sdk.PipelineTest

Has this been documented anywhere?

Thanks,
Manu Zhang


Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Rui Wang
Thanks for making it happen!

-Rui

On Fri, Dec 14, 2018 at 11:00 AM Scott Wegner  wrote:

> Congrats everyone! Thanks Cham for doing the release.
>
> On Thu, Dec 13, 2018 at 7:42 PM Chamikara Jayalath 
> wrote:
>
>> The Apache Beam team is pleased to announce the release of version 2.9.0!
>>
>> Apache Beam is an open source unified programming model to define and
>> execute data processing pipelines, including ETL, batch and stream
>> (continuous) processing. See https://beam.apache.org
>>
>> You can download the release here:
>>
>> https://beam.apache.org/get-started/downloads/
>>
>> This release includes the following major new features & improvements.
>> Please see the blog post for more details:
>> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html
>>
>> Thanks to everyone who contributed to this release, and we hope you enjoy
>> using Beam 2.9.0.
>> -- Chamikara Jayalath, on behalf of The Apache Beam team
>>
>
>
> --
>
> Got feedback? tinyurl.com/swegner-feedback
>


Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Scott Wegner
Congrats everyone! Thanks Cham for doing the release.

On Thu, Dec 13, 2018 at 7:42 PM Chamikara Jayalath 
wrote:

> The Apache Beam team is pleased to announce the release of version 2.9.0!
>
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous) processing. See https://beam.apache.org
>
> You can download the release here:
>
> https://beam.apache.org/get-started/downloads/
>
> This release includes the following major new features & improvements.
> Please see the blog post for more details:
> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html
>
> Thanks to everyone who contributed to this release, and we hope you enjoy
> using Beam 2.9.0.
> -- Chamikara Jayalath, on behalf of The Apache Beam team
>


-- 

Got feedback? tinyurl.com/swegner-feedback


Re: 2019 Beam Events

2018-12-14 Thread Austin Bennett
Since we are starting to think more about in-person meetings, I put
together an 'in-person' page within the community section of the website.

If someone wants to merge: https://github.com/apache/beam/pull/7279

On Thu, Dec 13, 2018 at 2:09 AM Etienne Chauchot 
wrote:

> Great work! Thanks for sharing, Gris!
>
> Etienne
>
> On Wednesday, 05 December 2018 at 07:47 +, Matthias Baetens wrote:
>
> Great stuff, Gris! Looking forward to what 2019 will bring!
>
> The Beam meetup in London will have a new get-together early next year as
> well :-)
> https://www.meetup.com/London-Apache-Beam-Meetup/
>
>
> On Tue, 4 Dec 2018 at 23:50 Austin Bennett 
> wrote:
>
> Already got that process kicked off with the NY and LA meetups; now that
> SF is about to be inaugurated, the goal will be to get these moving as well.
>
> For anyone that is in (or goes to) those areas:
> https://www.meetup.com/New-York-Apache-Beam/
> https://www.meetup.com/Los-Angeles-Apache-Beam/
>
> Please reach out to get involved!
>
>
>
> On Tue, Dec 4, 2018 at 3:13 PM Griselda Cuevas  wrote:
>
> +1 to Pablo's suggestion. If there's interest in founding a Meetup group
> in a particular city, let's create the Meetup page and start getting
> sign-ups. Joana will be reaching out with a comprehensive list of how to get
> started, and we're hoping to compile a high-level calendar of
> launches/announcements to feed into your meetup.
>
> G
>
> On Tue, 4 Dec 2018 at 12:04, Daniel Salerno  wrote:
>
> =)
>
> What good news!
>
> Okay, I'll set up the group and try to get people interested.
>
> Thank you
>
>
> On Tue, Dec 4, 2018 at 17:19, Pablo Estrada
> wrote:
>
> FWIW, for some of these places that have interest (e.g. Brazil, Israel),
> it's possible to create a group on meetup.com and start gauging
> interest and looking for organizers.
> Once a group of interested people exists, it's easier to get interest /
> sponsorship to bring speakers.
> So if you are willing to create the group on Meetup, Daniel, we can
> monitor it and try to plan something as it grows : )
> Best
> -P.
>
> On Tue, Dec 4, 2018 at 10:55 AM Daniel Salerno 
> wrote:
>
>
> It's a shame that there are no events in Brazil ...
>
> =(
>
> On Tue, Dec 4, 2018 at 13:12, OrielResearch Eila Arich-Landkof <
> e...@orielresearch.org> wrote:
>
> agree 
>
> On Tue, Dec 4, 2018 at 5:41 AM Chaim Turkel  wrote:
>
> Israel would be nice to have one
> chaim
> On Tue, Dec 4, 2018 at 12:33 AM Griselda Cuevas  wrote:
> >
> > Hi Beam Community,
> >
> > I started curating industry conferences, meetups and events that are
> relevant for Beam; this is the initial list I came up with. I'd love your help
> adding others that I might have overlooked. Once we're satisfied with the
> list, let's re-share it so we can coordinate proposal submissions, attendance,
> and community meetups there.
> >
> >
> > Cheers,
> >
> > G
> >
> >
> >
>


Re: GSOC - Summer of Code, on Beam?

2018-12-14 Thread Kenneth Knowles
I put together a (currently empty) saved search for Beam's GSoC issues:
https://issues.apache.org/jira/issues/?filter=12345337

Kenn

On Wed, Dec 12, 2018 at 1:10 AM Ismaël Mejía  wrote:

> Oh, I had not seen that the announcement was official, so time to get ready.
>
> https://opensource.googleblog.com/2018/11/google-summer-of-code-15-years-strong.html
>
> Mentors should have proposals ready around January 15, 2019. Remember,
> the timeline matters.
> https://developers.google.com/open-source/gsoc/timeline
>
> On Tue, Dec 11, 2018 at 6:14 PM Ismaël Mejía  wrote:
> >
> > You have to register the concrete proposal; there is no need to register the
> > project, since the organization (ASF) is already part of GSoC.
> > On Tue, Dec 11, 2018 at 12:42 PM Maximilian Michels 
> wrote:
> > >
> > > I think that's a great idea if we can find good candidates. Do we have
> to
> > > register the project to be able to receive applications from students?
> > >
> > > On 07.12.18 16:44, Ismaël Mejía wrote:
> > > > Last year we had two proposals. Kenneth's proposal around SQL was
> > > > accepted and, if I remember correctly, a success.
> > > >
> > > > For students who are interested: you should do the complete process via the
> > > > GSoC website and look for the ‘gsoc’ label.
> > > >
> > > > For committers who want to mentor students: you have to sign up on
> > > > the GSoC website, but also on the mentors@ mailing list, where you
> > > > should fill in the proposal through a document shared there (and yes, this
> > > > is not documented at all; we need to fix that). This document is prepared
> > > > by the ASF because it is the full organization (not the project) that
> > > > requests the participation. Sadly I was not aware of this process and
> > > > I missed the Apache deadline (some days before the GSoC deadline) last
> > > > year, and for that reason my proposal did not get accepted. So pay
> > > > attention to that, Pablo (or the others who want to mentor some
> > > > students).
> > > >
> > > > On Thu, Dec 6, 2018 at 8:21 PM jhzgg2...@gmail.com <
> jhzgg2...@gmail.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 2018/12/05 00:29:48, Pablo Estrada  wrote:
> > > >>> Hi Austin!
> > > >>> Thanks a lot for surfacing this. I participated in GSOC as a
> student a
> > > >>> couple times, and loved it. This being my first time around as a
> committer,
> > > >>> I'm excited to try and help.
> > > >>>
> > > >>> I think, for starters, it may be good to find issues in JIRA to
> label with
> > > >>> "gsoc", so please everyone who knows of good candidate project
> issues,
> > > >>> label them with "gsoc".
> > > >>>
> > > >>> And then we can find mentors for these issues, and start helping
> students
> > > >>> in the application process.
> > > >>>
> > > >>> Best
> > > >>> -P.
> > > >>>
> > > >>> On Tue, Dec 4, 2018 at 3:46 PM Austin Bennett <
> whatwouldausti...@gmail.com>
> > > >>> wrote:
> > > >>>
> > >  Would it make sense to have any GSOC students for next summer
> work on
> > >  Beam?  Do we have some candidate things that would be suitable and
> > >  sufficiently discrete projects?
> > > 
> > >  Initial applications for organizations not even open for about a
> month,
> > >  though thought worth getting a sense from the group.
> > > 
> > >  A bit of info:
> > >  https://summerofcode.withgoogle.com/archive/
> > > 
> > > 
> https://opensource.googleblog.com/2018/11/google-summer-of-code-15-years-strong.html
> > > 
> > > 
> > > 
> > > 
> > > >>> Hi Pablo!
> > > >>> I am a junior majoring in CS and interested in Apache Beam and
> > > >>> data processing. I hope to participate in GSoC and work on Beam next summer.
> > > >>> Could you give me some advice on
> > > >>> how to prepare for it? Thanks a lot.
> > > >>
>


Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Chamikara Jayalath
On Fri, Dec 14, 2018 at 9:28 AM Ruoyun Huang  wrote:

> Great work in making this happen, Chamikara!
>
> PS> I updated new releases in the real wiki
> page as well. :-P
>

Ah missed that :). Thanks everyone.


>
> On Thu, Dec 13, 2018 at 7:42 PM Chamikara Jayalath 
> wrote:
>
>> The Apache Beam team is pleased to announce the release of version 2.9.0!
>>
>> Apache Beam is an open source unified programming model to define and
>> execute data processing pipelines, including ETL, batch and stream
>> (continuous) processing. See https://beam.apache.org
>>
>> You can download the release here:
>>
>> https://beam.apache.org/get-started/downloads/
>>
>> This release includes the following major new features & improvements.
>> Please see the blog post for more details:
>> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html
>>
>> Thanks to everyone who contributed to this release, and we hope you enjoy
>> using Beam 2.9.0.
>> -- Chamikara Jayalath, on behalf of The Apache Beam team
>>
>
>
> --
> 
> Ruoyun  Huang
>
>


Re: [ANNOUNCEMENT] [SQL] [BEAM-6133] Support for user defined table functions (UDTF)

2018-12-14 Thread Kenneth Knowles
OK great. I happen to have recently read that bit of SQL 2016 so that puts
it in context for me very nicely.

Kenn

On Fri, Dec 14, 2018 at 3:58 AM Gleb Kanterov  wrote:

> Kenn, I don't have a copy of a recent SQL standard to confirm what I'm
> saying. To my knowledge, initially, there was a concept of a table
> function. Table functions should have a static type that doesn't depend on
> supplied arguments. In ANSI SQL 2016, there is a concept of polymorphic
> table functions, that can infer types depending on provided arguments. Both
> TableFunction and TableMacro in Calcite are polymorphic table functions,
> and the difference between TableFunction and TableMacro is internal to
> Calcite.
>
> Gleb
>
>
>
> On Fri, Dec 14, 2018 at 4:26 AM Kenneth Knowles  wrote:
>
>> Sorry for the slow reply & review. Having UDTF support in Beam SQL is
>> extremely useful. Are both table functions and table macros part of
>> "standard" SQL or is this a distinction between different Calcite concepts?
>>
>> Kenn
>>
>> On Wed, Nov 28, 2018 at 10:36 AM Gleb Kanterov  wrote:
>>
>>> At the moment we support only ScalarFunction UDFs, i.e. functions that
>>> operate on row fields. In Calcite, there are 3 kinds of UDFs: aggregate
>>> functions (which we already support), table macros, and table functions. The
>>> difference between table functions and macros is that macros expand to
>>> relations, while table functions can refer to anything queryable, e.g.,
>>> enumerables. But in the case of Beam SQL, given that everything translates to
>>> PTransforms, only table macros are relevant.
>>>
>>> UDTFs are in a way similar to external tables but don't require
>>> specifying a schema explicitly. Instead, they can derive a schema based on
>>> arguments. One of the use-cases would be querying ranges of dataset
>>> partitions using a helper function like:
>>>
>>> SELECT COUNT(*) FROM table(readAvro(id => 'dataset', start =>
>>> '2017-01-01', end => '2018-01-01'))
>>>
>>> With BEAM-6133  (
>>> apache/beam/#7141 ) we would
>>> have support for UDTF in Beam SQL.
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-6133
>>> [2] https://github.com/apache/beam/pull/7141
>>>
>>> Gleb
>>>
>>
>
> --
> Cheers,
> Gleb
>
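As an aside for readers of the thread above: the "schema derived from arguments" behaviour Gleb describes can be illustrated with a short, self-contained Java sketch. The Table and TableMacro types below are simplified stand-ins for illustration only, not Calcite's or Beam SQL's actual classes, and readAvro is the hypothetical function from the example query.

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PolymorphicTableFunctionSketch {

  // A "table" here is just a named schema: column name -> SQL type.
  static final class Table {
    final Map<String, String> schema;

    Table(Map<String, String> schema) {
      this.schema = schema;
    }
  }

  // A table macro expands to a table whose schema may depend on its arguments.
  interface TableMacro {
    Table apply(List<String> arguments);
  }

  public static void main(String[] args) {
    // Hypothetical readAvro(id, start, end): the schema is derived from the
    // dataset id at planning time, mimicking
    //   SELECT COUNT(*) FROM table(readAvro(id => 'dataset', start => ..., end => ...))
    TableMacro readAvro = arguments -> {
      String datasetId = arguments.get(0);
      Map<String, String> schema = new LinkedHashMap<>();
      // A real implementation would read the schema from the dataset's Avro
      // metadata; here it is hard-coded per dataset id purely for illustration.
      if (datasetId.equals("dataset")) {
        schema.put("user_id", "BIGINT");
        schema.put("event_time", "TIMESTAMP");
      } else {
        schema.put("value", "VARCHAR");
      }
      return new Table(schema);
    };

    Table t = readAvro.apply(Arrays.asList("dataset", "2017-01-01", "2018-01-01"));
    System.out.println(t.schema); // {user_id=BIGINT, event_time=TIMESTAMP}
  }
}

In Calcite terms, this is why the message calls both TableFunction and TableMacro "polymorphic": the resulting row type is only known once the arguments are bound.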


Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Ruoyun Huang
Great work in making this happen, Chamikara!

PS> I updated new releases in the real wiki
page as well. :-P

On Thu, Dec 13, 2018 at 7:42 PM Chamikara Jayalath 
wrote:

> The Apache Beam team is pleased to announce the release of version 2.9.0!
>
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous) processing. See https://beam.apache.org
>
> You can download the release here:
>
> https://beam.apache.org/get-started/downloads/
>
> This release includes the following major new features & improvements.
> Please see the blog post for more details:
> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html
>
> Thanks to everyone who contributed to this release, and we hope you enjoy
> using Beam 2.9.0.
> -- Chamikara Jayalath, on behalf of The Apache Beam team
>


-- 

Ruoyun  Huang


Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-12-14 Thread Jeff Klukas
Checking in on this thread. Anybody interested in reviewing
https://github.com/apache/beam/pull/7160 ?

There could be some discussion on whether these names are the right names,
but otherwise the only potential objection I see here is the additional
burden on developers to understand the differences between the existing
(SerializableFunction and SimpleFunction) and the new (ProcessFunction and
InferableFunction). I originally planned on marking the existing ones as
deprecated, but decided there are contexts where disallowing checked
exceptions probably makes sense. So we now have 4 objects for developers to
be familiar with rather than 2.
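
To make the distinction concrete, here is a small, self-contained Java sketch of the proposed shape. The interface bodies mirror the proposal quoted below but are illustrative stand-ins with simplified names, not the final Beam API.

import java.io.IOException;
import java.io.Serializable;

public class ProcessFunctionSketch {
  // Assumed shape of the proposed superinterface: apply may throw checked exceptions.
  interface ProcessFunction<InputT, OutputT> extends Serializable {
    OutputT apply(InputT input) throws Exception;
  }

  // The existing-style interface narrows the signature: no checked exceptions allowed.
  interface SerializableFunction<InputT, OutputT> extends ProcessFunction<InputT, OutputT> {
    @Override
    OutputT apply(InputT input);
  }

  public static void main(String[] args) throws Exception {
    // With the superinterface, user code can call throwing APIs without wrapping.
    ProcessFunction<String, Integer> parse = s -> {
      if (s.isEmpty()) {
        throw new IOException("empty input"); // checked exception, no wrapping needed
      }
      return Integer.parseInt(s);
    };

    // The narrower subinterface still fits code that declares no checked exceptions.
    SerializableFunction<String, Integer> length = String::length;

    System.out.println(parse.apply("42"));   // prints 42
    System.out.println(length.apply("42"));  // prints 2
  }
}

The point of this arrangement is that existing SerializableFunction user code keeps compiling and linking unchanged, while code written against the new superinterface gains the throws clause.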

On Fri, Dec 7, 2018 at 6:54 AM Robert Bradshaw  wrote:

> How should we move forward on this? The idea looks good, and even
> comes with a PR to review. Any objections to the names?
> On Wed, Dec 5, 2018 at 12:48 PM Jeff Klukas  wrote:
> >
> > Reminder that I'm looking for review on
> https://github.com/apache/beam/pull/7160
> >
> > On Thu, Nov 29, 2018, 11:48 AM Jeff Klukas  >>
> >> I created a JIRA and a PR for this:
> >>
> >> https://issues.apache.org/jira/browse/BEAM-6150
> >> https://github.com/apache/beam/pull/7160
> >>
> >> On naming, I'm proposing that SerializableFunction extend
> ProcessFunction (since this new superinterface is particularly appropriate
> for user code executed inside a ProcessElement method) and that
> SimpleFunction extend InferableFunction (since type information and coder
> inference are what distinguish this from ProcessFunction).
> >>
> >> We originally discussed deprecating SerializableFunction and
> SimpleFunction in favor of the new types, but there appear to be two fairly
> separate use cases for SerializableFunction. It's either defining user code
> that will be executed in a DoFn, in which case I think we always want to
> prefer the new interface that allows declared exceptions. But it's also
> used where the code is to be executed as part of pipeline construction, in
> which case it may be reasonable to want to restrict checked exceptions. In
> any case, deprecating SerializableFunction and SimpleFunction can be
> discussed further in the future.
> >>
> >>
> >> On Wed, Nov 28, 2018 at 9:53 PM Kenneth Knowles 
> wrote:
> >>>
> >>> Nice! A clean solution and an opportunity to bikeshed on names. This
> has everything I love.
> >>>
> >>> Kenn
> >>>
> >>> On Wed, Nov 28, 2018 at 6:43 PM Jeff Klukas 
> wrote:
> 
>  It looks like we can make the new interface a superinterface for
> the existing SerializableFunction while maintaining binary compatibility
> [0].
> 
>  We'd have:
> 
>  public interface NewSerializableFunction<InputT, OutputT> extends
> Serializable {
>    OutputT apply(InputT input) throws Exception;
>  }
> 
>  and then modify SerializableFunction to inherit from it:
> 
>  public interface SerializableFunction<InputT, OutputT> extends
> NewSerializableFunction<InputT, OutputT>, Serializable {
>    @Override
>    OutputT apply(InputT input);
>  }
> 
> 
>  IIUC, we can then more or less replace all references to
> SerializableFunction with NewSerializableFunction across the beam codebase
> without having to introduce any new overrides. I'm working on a proof of
> concept now.
> 
>  It's not clear what the actual names for NewSerializableFunction and
> NewSimpleFunction should be.
> 
>  [0]
> https://docs.oracle.com/javase/specs/jls/se8/html/jls-13.html#jls-13.4.4
> 
> 
>  On Mon, Nov 26, 2018 at 9:54 PM Thomas Weise  wrote:
> >
> > +1 for introducing the new interface now and deprecating the old
> one. The major version change then provides the opportunity to remove
> deprecated code.
> >
> >
> > On Mon, Nov 26, 2018 at 10:09 AM Lukasz Cwik 
> wrote:
> >>
> >> Before 3.0 we will still want to introduce this, giving time for
> people to migrate; would it make sense to do that now and deprecate the
> alternatives that it replaces?
> >>
> >> On Mon, Nov 26, 2018 at 5:59 AM Jeff Klukas 
> wrote:
> >>>
> >>> Picking up this thread again. Based on the feedback from Kenn,
> Reuven, and Romain, it sounds like there's no objection to the idea of
> SimpleFunction and SerializableFunction declaring that they throw
> Exception. So the discussion at this point is about whether there's an
> acceptable way to introduce that change.
> >>>
> >>> IIUC, Kenn was suggesting that we need to ensure
> backwards compatibility for existing user code both at runtime and
> recompile, which means we can't simply add the declaration to the existing
> interfaces, since that would cause errors at compile time for user code
> directly invoking SerializableFunction instances.
> >>>
> >>> I don't see an obvious way that introducing a new functional
> interface would help without littering the API with more variants (it's
> already a bit confusing that, e.g., MapElements has multiple via() methods to
> support three different function 

Re: [ANNOUNCEMENT] [SQL] [BEAM-6133] Support for user defined table functions (UDTF)

2018-12-14 Thread Gleb Kanterov
Kenn, I don't have a copy of a recent SQL standard to confirm what I'm
saying. To my knowledge, initially, there was a concept of a table
function. Table functions should have a static type that doesn't depend on
supplied arguments. In ANSI SQL 2016, there is a concept of polymorphic
table functions, that can infer types depending on provided arguments. Both
TableFunction and TableMacro in Calcite are polymorphic table functions,
and the difference between TableFunction and TableMacro is internal to
Calcite.

Gleb



On Fri, Dec 14, 2018 at 4:26 AM Kenneth Knowles  wrote:

> Sorry for the slow reply & review. Having UDTF support in Beam SQL is
> extremely useful. Are both table functions and table macros part of
> "standard" SQL or is this a distinction between different Calcite concepts?
>
> Kenn
>
> On Wed, Nov 28, 2018 at 10:36 AM Gleb Kanterov  wrote:
>
>> At the moment we support only ScalarFunction UDFs, i.e. functions that
>> operate on row fields. In Calcite, there are 3 kinds of UDFs: aggregate
>> functions (which we already support), table macros, and table functions. The
>> difference between table functions and macros is that macros expand to
>> relations, while table functions can refer to anything queryable, e.g.,
>> enumerables. But in the case of Beam SQL, given that everything translates to
>> PTransforms, only table macros are relevant.
>>
>> UDTFs are in a way similar to external tables but don't require specifying
>> a schema explicitly. Instead, they can derive a schema based on arguments.
>> One of the use-cases would be querying ranges of dataset partitions using a
>> helper function like:
>>
>> SELECT COUNT(*) FROM table(readAvro(id => 'dataset', start =>
>> '2017-01-01', end => '2018-01-01'))
>>
>> With BEAM-6133  (
>> apache/beam/#7141 ) we would
>> have support for UDTF in Beam SQL.
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-6133
>> [2] https://github.com/apache/beam/pull/7141
>>
>> Gleb
>>
>

-- 
Cheers,
Gleb


Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Maximilian Michels
Thank you, Chamikara, for making the release happen! And thank you to everyone
else involved.



> No Spark Structured Streaming support yet?

Not yet, but I hear that work on structured streaming has started and it might
make it into the next release.


On 14.12.18 07:00, kant kodali wrote:

No Spark Structured Streaming support yet?

On Thu, Dec 13, 2018 at 7:53 PM Connell O'Callaghan  wrote:


Excellent thank you Chamikara and all who contributed to this release!!!

On Thu, Dec 13, 2018 at 7:42 PM Chamikara Jayalath <chamik...@google.com> wrote:

The Apache Beam team is pleased to announce the release of version 
2.9.0!

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org


You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes the following major new features & improvements.
Please see the blog post for more details:
https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html

Thanks to everyone who contributed to this release, and we hope you
enjoy using Beam 2.9.0.
-- Chamikara Jayalath, on behalf of The Apache Beam team



Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Alexey Romanenko
Congrats!
Thank you to all contributors and to Chamikara for driving this release!

> On 14 Dec 2018, at 10:32, Tim  wrote:
> 
> Thank you for running the release Chamikara.
> 
> Tim,
> Sent from my iPhone
> 
> On 14 Dec 2018, at 10:30, Matt Casters  wrote:
> 
>> Great news! Congratulations!
>> My experience venturing into the world of Apache Beam couldn't possibly have 
>> been nicer.  Thank you to all involved!
>> ---
>> Matt
>> 
>> 
>> On Fri, Dec 14, 2018 at 04:42, Chamikara Jayalath wrote:
>> The Apache Beam team is pleased to announce the release of version 2.9.0!
>> 
>> Apache Beam is an open source unified programming model to define and
>> execute data processing pipelines, including ETL, batch and stream
>> (continuous) processing. See https://beam.apache.org 
>> 
>> 
>> You can download the release here:
>> 
>> https://beam.apache.org/get-started/downloads/ 
>> 
>> 
>> This release includes the following major new features & improvements. 
>> Please see the blog post for more details: 
>> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html 
>> 
>> 
>> Thanks to everyone who contributed to this release, and we hope you enjoy 
>> using Beam 2.9.0.
>> -- Chamikara Jayalath, on behalf of The Apache Beam team



Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread kant kodali
No Spark Structured Streaming support yet?

On Thu, Dec 13, 2018 at 7:53 PM Connell O'Callaghan 
wrote:

> Excellent thank you Chamikara and all who contributed to this release!!!
>
> On Thu, Dec 13, 2018 at 7:42 PM Chamikara Jayalath 
> wrote:
>
>> The Apache Beam team is pleased to announce the release of version 2.9.0!
>>
>> Apache Beam is an open source unified programming model to define and
>> execute data processing pipelines, including ETL, batch and stream
>> (continuous) processing. See https://beam.apache.org
>>
>> You can download the release here:
>>
>> https://beam.apache.org/get-started/downloads/
>>
>> This release includes the following major new features & improvements.
>> Please see the blog post for more details:
>> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html
>>
>> Thanks to everyone who contributed to this release, and we hope you enjoy
>> using Beam 2.9.0.
>> -- Chamikara Jayalath, on behalf of The Apache Beam team
>>
>


Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Matt Casters
Great news! Congratulations!
My experience venturing into the world of Apache Beam couldn't possibly
have been nicer.  Thank you to all involved!
---
Matt


On Fri, Dec 14, 2018 at 04:42, Chamikara Jayalath wrote:

> The Apache Beam team is pleased to announce the release of version 2.9.0!
>
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous) processing. See https://beam.apache.org
>
> You can download the release here:
>
> https://beam.apache.org/get-started/downloads/
>
> This release includes the following major new features & improvements.
> Please see the blog post for more details:
> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html
>
> Thanks to everyone who contributed to this release, and we hope you enjoy
> using Beam 2.9.0.
> -- Chamikara Jayalath, on behalf of The Apache Beam team
>


[Fwd: [Apache Beam] Custom DataSourceV2 instanciation: parameters passing and Encoders]

2018-12-14 Thread Etienne Chauchot
Hi guys, I'm currently coding a POC of a new Spark runner based on Structured
Streaming and the new DataSourceV2 API, and I have a question. Having found no
pointers on the internet, I've asked the Spark community with no luck. If any of
you have knowledge about the new Spark DataSourceV2 API, can you share thoughts?
Also, I did not mention it in the email, but I did not find any way to get a
reference to the automatically created DataSourceV2 instance, so I cannot lazily
initialize the source either.
Thanks
Etienne
-------- Forwarded message --------
From: Etienne Chauchot
To: dev@spark.apache.org
Subject: [Apache Beam] Custom DataSourceV2 instantiation: parameters passing and Encoders
Date: Tue, 11 Dec 2018 19:02:23 +0100
Hi Spark guys,
I'm Etienne Chauchot and I'm a committer on the Apache Beam project.
We have what we call runners. They are pieces of software that translate pipelines
written using the Beam API into pipelines that use a native execution engine API.
Currently, the Spark runner uses the old RDD / DStream APIs. I'm writing a new
runner that will use Structured Streaming (but not continuous processing, and also
no schema for now).
I am just starting. I'm currently trying to map our sources to yours, targeting the
new DataSourceV2 API. It maps pretty well onto Beam sources, but I have a problem
with the instantiation of the custom source. I searched for an answer on Stack
Overflow and the user ML with no luck; I guess it is too specific a question.
When visiting the Beam DAG I have access to Beam objects such as Source and Reader
that I need to map to MicroBatchReader and InputPartitionReader. As far as I
understand, a custom DataSourceV2 is instantiated automatically by Spark via
sparkSession.readStream().format(providerClassName) or similar code. The problem is
that I can only pass options of primitive types + String, so I cannot pass the Beam
Source to the DataSourceV2. => Is there a way to do so?

Also, I get a Dataset<Row> as output. The Row contains an instance of Beam
WindowedValue<T>, where T is the type parameter of the Source. I do a map on the
Dataset<Row> to transform it into a Dataset<WindowedValue<T>>. I have a question
related to the Encoder: => how do I properly create an Encoder for the generic type
WindowedValue<T> to use in the map?
Here is the code:
https://github.com/apache/beam/tree/spark-runner_structured-streaming
And more specifically:
https://github.com/apache/beam/blob/spark-runner_structured-streaming/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
https://github.com/apache/beam/blob/spark-runner_structured-streaming/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
Thanks,
Etienne
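
For readers hitting the same two walls, here is a minimal, self-contained Java sketch of two common workarounds, not necessarily what this runner ended up doing: smuggling a Serializable Beam Source through the String-only DataSourceV2 options as Base64-encoded Java serialization, and falling back to a binary Encoder (Encoders.kryo) for a generic payload such as WindowedValue<T>. The class, method, and option names (e.g. SourceOptionSketch, "beamSourceBase64") are hypothetical.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;

public class SourceOptionSketch {

  // Serialize any Serializable object (e.g. a Beam Source) into a String that
  // fits through sparkSession.readStream().format(...).option(key, value).
  static String toBase64(Serializable obj) throws Exception {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  // Inside the custom DataSourceV2, decode the String option back into the object.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T fromBase64(String encoded) throws Exception {
    byte[] raw = Base64.getDecoder().decode(encoded);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
      return (T) in.readObject();
    }
  }

  // For a generic payload with no fixed Spark schema, a binary encoder avoids
  // describing WindowedValue<T> to Catalyst, at the cost of an opaque column.
  static <T> Encoder<T> binaryEncoderFor(Class<T> clazz) {
    return Encoders.kryo(clazz);
  }
}

On the driver side one could then call sparkSession.readStream().format(providerClassName).option("beamSourceBase64", toBase64(source)) and decode the option inside the source; the trade-off of the kryo encoder is that Spark sees only opaque binary data rather than a structured schema.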









Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Tim
Thank you for running the release Chamikara.

Tim,
Sent from my iPhone

> On 14 Dec 2018, at 10:30, Matt Casters  wrote:
> 
> Great news! Congratulations!
> My experience venturing into the world of Apache Beam couldn't possibly have 
> been nicer.  Thank you to all involved!
> ---
> Matt
> 
> 
On Fri, Dec 14, 2018 at 04:42, Chamikara Jayalath wrote:
>> The Apache Beam team is pleased to announce the release of version 2.9.0!
>> 
>> Apache Beam is an open source unified programming model to define and
>> execute data processing pipelines, including ETL, batch and stream
>> (continuous) processing. See https://beam.apache.org
>> 
>> You can download the release here:
>> 
>> https://beam.apache.org/get-started/downloads/
>> 
>> This release includes the following major new features & improvements. 
>> Please see the blog post for more details: 
>> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html
>> 
>> Thanks to everyone who contributed to this release, and we hope you enjoy 
>> using Beam 2.9.0.
>> -- Chamikara Jayalath, on behalf of The Apache Beam team