Re: Strata Conference this March 6-8

2018-01-16 Thread Ismaël Mejía
Maybe a good idea to try to organize a Beam meetup in London on the
same dates in case some of the people around can jump in and talk too.

On Wed, Jan 17, 2018 at 2:51 AM, Ron Gonzalez  wrote:
> Works for me...
>
> On Tuesday, January 16, 2018, 5:45:33 PM PST, Holden Karau
>  wrote:
>
>
> How would folks feel about during the afternoon break (3:20-4:20) on the
> Wednesday (same day as Eugene's talk)? We could do the Philz which is a bit
> of a walk but gets us away from the big crowd and also lets folks not
> attending the conference but in the area join us.
>
> On Tue, Jan 16, 2018 at 5:29 PM, Ron Gonzalez  wrote:
>
> Cool, let me know if you guys finally schedule it. I will definitely try to
> make it to Eugene's talk but having an informal BoF in the area would be
> nice...
>
> Thanks,
> Ron
>
> On Tuesday, January 16, 2018, 5:06:53 PM PST, Boris Lublinsky
>  wrote:
>
>
> All for it
>
> Boris Lublinsky
> FDP Architect
> boris.lublin...@lightbend.com
> https://www.lightbend.com/
>
> On Jan 16, 2018, at 7:01 PM, Ted Yu  wrote:
>
> +1 to BoF
>
> On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk 
> wrote:
>
> Probably won't be attending the conference, but totally down for a BoF.
>
> On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau  wrote:
>
> Do interested folks have any timing constraints around a BoF?
>
> On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson 
> wrote:
>
> +1 to BoF. I don't know if any Beam talks will be on the schedule.
>
>> We could do an informal BoF at the Philz nearby or similar?
>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>
>
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>
>
>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau


Re: Some interesting use case

2018-01-16 Thread Ron Gonzalez
Yes, you're right. I believe this is the use case that I'm after. So if I
understand correctly, transforms that do aggregations just assume that the
batch of data being aggregated is passed as part of a tensor column. Is it
possible to hook up a lookup call to another TensorFlow Serving servable for a
join in batch mode?
Will a saved model, when loaded into TensorFlow Serving, actually have
the definitions of the metadata when retrieved using the TensorFlow Serving
metadata API?

Thanks,
Ron
On Tuesday, January 16, 2018, 6:16:01 PM PST, Charles Chen 
 wrote:  
 
 This sounds similar to the use case for tf.Transform, a library that depends 
on Beam: https://github.com/tensorflow/transform
On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez  wrote:

Hi,
  I was wondering if anyone has encountered or used Beam in the following
manner:

  1. During machine learning training, use Beam to create the event table.
The flow may consist of some joins, aggregations, row-based
transformations, etc...
  2. Once the model is created, deploy the model to some scoring service
via PMML (or some other scoring service).
  3. Enable the SAME transformations used in #1 using a separate engine,
while guaranteeing that it transforms the data identically to the
engine used in #1.

  I think this is a pretty interesting use case where Beam is used to
guarantee portability across engines and deployment (batch to true
streaming, not micro-batch). What's not clear to me is how batch joins
would translate during one-by-one scoring (probably lookups), or how
aggregations would work given that some kind of history would need to be
stored (and how much is kept is configurable too).

  Thoughts?

Thanks,
Ron
  

Re: Some interesting use case

2018-01-16 Thread Charles Chen
This sounds similar to the use case for tf.Transform, a library that
depends on Beam: https://github.com/tensorflow/transform

On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez  wrote:

> Hi,
>   I was wondering if anyone has encountered or used Beam in the following
> manner:
>
>   1. During machine learning training, use Beam to create the event table.
> The flow may consist of some joins, aggregations, row-based
> transformations, etc...
>   2. Once the model is created, deploy the model to some scoring service
> via PMML (or some other scoring service).
>   3. Enable the SAME transformations used in #1 using a separate engine,
> while guaranteeing that it transforms the data identically to the
> engine used in #1.
>
>   I think this is a pretty interesting use case where Beam is used to
> guarantee portability across engines and deployment (batch to true
> streaming, not micro-batch). What's not clear to me is how batch joins
> would translate during one-by-one scoring (probably lookups), or how
> aggregations would work given that some kind of history would need to be
> stored (and how much is kept is configurable too).
>
>   Thoughts?
>
> Thanks,
> Ron
>


Re: Strata Conference this March 6-8

2018-01-16 Thread Ron Gonzalez
 Works for me...
On Tuesday, January 16, 2018, 5:45:33 PM PST, Holden Karau 
 wrote:  
 
 How would folks feel about during the afternoon break (3:20-4:20) on the 
Wednesday (same day as Eugene's talk)? We could do the Philz which is a bit of 
a walk but gets us away from the big crowd and also lets folks not attending 
the conference but in the area join us.
On Tue, Jan 16, 2018 at 5:29 PM, Ron Gonzalez  wrote:

 Cool, let me know if you guys finally schedule it. I will definitely try to 
make it to Eugene's talk but having an informal BoF in the area would be nice...
Thanks,
Ron
On Tuesday, January 16, 2018, 5:06:53 PM PST, Boris Lublinsky 
 wrote:  
 
 All for it
Boris Lublinsky
FDP Architect
boris.lublin...@lightbend.com
https://www.lightbend.com/

On Jan 16, 2018, at 7:01 PM, Ted Yu  wrote:
+1 to BoF
On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk  wrote:

Probably won't be attending the conference, but totally down for a BoF.
On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau  wrote:

Do interested folks have any timing constraints around a BoF?
On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson  
wrote:

+1 to BoF. I don't know if any Beam talks will be on the schedule.

> We could do an informal BoF at the Philz nearby or similar?




-- 
Twitter: https://twitter.com/holdenkarau




-- 
Best regards,
Dmitry Demeshchuk.



  



-- 
Twitter: https://twitter.com/holdenkarau
  

Some interesting use case

2018-01-16 Thread Ron Gonzalez
Hi,
  I was wondering if anyone has encountered or used Beam in the following
manner:

  1. During machine learning training, use Beam to create the event table.
The flow may consist of some joins, aggregations, row-based
transformations, etc...
  2. Once the model is created, deploy the model to some scoring service
via PMML (or some other scoring service).
  3. Enable the SAME transformations used in #1 using a separate engine,
while guaranteeing that it transforms the data identically to the
engine used in #1.

  I think this is a pretty interesting use case where Beam is used to
guarantee portability across engines and deployment (batch to true
streaming, not micro-batch). What's not clear to me is how batch joins
would translate during one-by-one scoring (probably lookups), or how
aggregations would work given that some kind of history would need to be
stored (and how much is kept is configurable too).

  Thoughts?

Thanks,
Ron
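
One way to picture the question about joins and aggregations is the "analyze, then transform" split that libraries such as tf.Transform are built around. The following is only a minimal sketch in plain Python, not Beam or tf.Transform API: the names `analyze`, `transform`, `Stats`, and the sample rows are all hypothetical. It assumes the aggregation can be reduced to a few stored statistics and the join to a key lookup.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, Iterable


@dataclass(frozen=True)
class Stats:
    """Aggregates computed once over the training data (the stored 'history')."""
    mean: float
    std: float


def analyze(amounts: Iterable[float]) -> Stats:
    """Batch phase: a full pass over the data to compute the aggregates."""
    values = list(amounts)
    # Fall back to 1.0 so a constant column does not cause division by zero.
    return Stats(mean=mean(values), std=pstdev(values) or 1.0)


def transform(row: Dict, stats: Stats, region_lookup: Dict[str, str]) -> Dict:
    """Row-level phase: the same code serves batch training and one-by-one scoring.

    The batch join becomes a dictionary lookup, and the aggregation becomes a
    read of the precomputed statistics.
    """
    return {
        "user": row["user"],
        "region": region_lookup.get(row["country"], "unknown"),
        "amount_scaled": (row["amount"] - stats.mean) / stats.std,
    }


# Hypothetical training data and lookup table.
training_rows = [
    {"user": "a", "country": "US", "amount": 10.0},
    {"user": "b", "country": "FR", "amount": 20.0},
    {"user": "c", "country": "US", "amount": 30.0},
]
regions = {"US": "americas", "FR": "emea"}

stats = analyze(r["amount"] for r in training_rows)
batch_output = [transform(r, stats, regions) for r in training_rows]

# At scoring time a single event goes through the very same function.
scored = transform({"user": "d", "country": "FR", "amount": 20.0}, stats, regions)
```

How much history to keep then becomes a question of what `analyze` stores; a windowed or decayed aggregate would need a richer `Stats` object than this sketch assumes.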

Re: Strata Conference this March 6-8

2018-01-16 Thread Holden Karau
How would folks feel about during the afternoon break (3:20-4:20) on the
Wednesday (same day as Eugene's talk)? We could do the Philz which is a bit
of a walk but gets us away from the big crowd and also lets folks not
attending the conference but in the area join us.

On Tue, Jan 16, 2018 at 5:29 PM, Ron Gonzalez  wrote:

> Cool, let me know if you guys finally schedule it. I will definitely try
> to make it to Eugene's talk but having an informal BoF in the area would be
> nice...
>
> Thanks,
> Ron
>
> On Tuesday, January 16, 2018, 5:06:53 PM PST, Boris Lublinsky <
> boris.lublin...@lightbend.com> wrote:
>
>
> All for it
>
> Boris Lublinsky
> FDP Architect
> boris.lublin...@lightbend.com
> https://www.lightbend.com/
>
> On Jan 16, 2018, at 7:01 PM, Ted Yu  wrote:
>
> +1 to BoF
>
> On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk 
> wrote:
>
> Probably won't be attending the conference, but totally down for a BoF.
>
> On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau 
> wrote:
>
> Do interested folks have any timing constraints around a BoF?
>
> On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson wrote:
>
> +1 to BoF. I don't know if any Beam talks will be on the schedule.
>
> > We could do an informal BoF at the Philz nearby or similar?
>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> 
>
>
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>
>
>
>


-- 
Twitter: https://twitter.com/holdenkarau


Re: Strata Conference this March 6-8

2018-01-16 Thread Ron Gonzalez
 Cool, let me know if you guys finally schedule it. I will definitely try to 
make it to Eugene's talk but having an informal BoF in the area would be nice...
Thanks,
Ron
On Tuesday, January 16, 2018, 5:06:53 PM PST, Boris Lublinsky 
 wrote:  
 
 All for it
Boris Lublinsky
FDP Architect
boris.lublin...@lightbend.com
https://www.lightbend.com/

On Jan 16, 2018, at 7:01 PM, Ted Yu  wrote:
+1 to BoF
On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk  wrote:

Probably won't be attending the conference, but totally down for a BoF.
On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau  wrote:

Do interested folks have any timing constraints around a BoF?
On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson  
wrote:

+1 to BoF. I don't know if any Beam talks will be on the schedule.

> We could do an informal BoF at the Philz nearby or similar?




-- 
Twitter: https://twitter.com/holdenkarau




-- 
Best regards,
Dmitry Demeshchuk.



  

Re: Strata Conference this March 6-8

2018-01-16 Thread Ted Yu
+1 to BoF

On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk 
wrote:

> Probably won't be attending the conference, but totally down for a BoF.
>
> On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau 
> wrote:
>
>> Do interested folks have any timing constraints around a BoF?
>>
>> On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson <
>> je...@bigdatainstitute.io> wrote:
>>
>>> +1 to BoF. I don't know if any Beam talks will be on the schedule.
>>>
>>> > We could do an informal BoF at the Philz nearby or similar?
>>>
>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>


Re: Strata Conference this March 6-8

2018-01-16 Thread Dmitry Demeshchuk
Probably won't be attending the conference, but totally down for a BoF.

On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau  wrote:

> Do interested folks have any timing constraints around a BoF?
>
> On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson wrote:
>
>> +1 to BoF. I don't know if any Beam talks will be on the schedule.
>>
>> > We could do an informal BoF at the Philz nearby or similar?
>>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>



-- 
Best regards,
Dmitry Demeshchuk.


Re: Strata Conference this March 6-8

2018-01-16 Thread Holden Karau
Do interested folks have any timing constraints around a BoF?

On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson 
wrote:

> +1 to BoF. I don't know if any Beam talks will be on the schedule.
>
> > We could do an informal BoF at the Philz nearby or similar?
>



-- 
Twitter: https://twitter.com/holdenkarau


Strata Conference this March 6-8

2018-01-16 Thread Jesse Anderson
+1 to BoF. I don't know if any Beam talks will be on the schedule.

> We could do an informal BoF at the Philz nearby or similar?


Re: Strata Conference this March 6-8

2018-01-16 Thread Holden Karau
We could do an informal BoF at the Philz nearby or similar?

On Wed, Jan 17, 2018 at 11:23 AM Eugene Kirpichov 
wrote:

> I'm giving a talk about splittable DoFn's
> https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696?locale=zh
>
> There are no other talks with the word "Beam" in the title, unless I
> missed something.
>
> On Tue, Jan 16, 2018 at 4:11 PM Ron Gonzalez  wrote:
>
>> Hi,
>>   Will there be some talks or representation of Apache Beam at the coming
>> Strata Conference this March 6-8?
>>   Would be great to hear someone talk about how Beam's been used at their
>> company as their core data integration platform.
>>
>> Thanks,
>> Ron
>>
>>
> --
Twitter: https://twitter.com/holdenkarau


Re: Strata Conference this March 6-8

2018-01-16 Thread Eugene Kirpichov
I'm giving a talk about splittable DoFn's
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696?locale=zh

There are no other talks with the word "Beam" in the title, unless I missed
something.

On Tue, Jan 16, 2018 at 4:11 PM Ron Gonzalez  wrote:

> Hi,
>   Will there be some talks or representation of Apache Beam at the coming
> Strata Conference this March 6-8?
>   Would be great to hear someone talk about how Beam's been used at their
> company as their core data integration platform.
>
> Thanks,
> Ron
>
>


Strata Conference this March 6-8

2018-01-16 Thread Ron Gonzalez
Hi,
  Will there be some talks or representation of Apache Beam at the coming
Strata Conference this March 6-8?
  Would be great to hear someone talk about how Beam's been used at their
company as their core data integration platform.

Thanks,
Ron

Re: KafkaIO reading from latest offset when pipeline fails on FlinkRunner

2018-01-16 Thread Kenneth Knowles
Is there a JIRA filed for this? I think this discussion should live in a
ticket.

Kenn

On Wed, Jan 10, 2018 at 11:00 AM, Mingmin Xu  wrote:

> @Sushil, I have several jobs running on KafkaIO+FlinkRunner, hope my
> experience can help you a bit.
>
> In short, `ENABLE_AUTO_COMMIT_CONFIG` doesn't meet your requirement; you
> need to leverage exactly-once checkpoint/savepoint in Flink. The reason
> is that with `ENABLE_AUTO_COMMIT_CONFIG` KafkaIO commits the offset after
> data is read, and once the job is restarted KafkaIO reads from
> last_committed_offset.
>
> In my jobs, I enable external (external should be optional, I think?)
> checkpoints in exactly-once mode in the Flink cluster. When the job
> auto-restarts on failures it doesn't lose data. When manually redeploying
> the job, I use a savepoint to cancel and relaunch the job.
>
> Mingmin
>
> On Wed, Jan 10, 2018 at 10:34 AM, Raghu Angadi  wrote:
>
>> How often does your pipeline checkpoint/snapshot? If the failure happens
>> before the first checkpoint, the pipeline could restart without any state,
>> in which case KafkaIO would read from latest offset. There is probably some
>> way to verify if pipeline is restarting from a checkpoint.
>>
>> On Sun, Jan 7, 2018 at 10:57 PM, Sushil Ks  wrote:
>>
>>> Hi Aljoscha,
>>> The issue is this: let's say I consumed 100 elements in a 5-minute
>>> fixed window with *GroupByKey* and later applied a *ParDo* to all
>>> those elements. If there is an issue while processing element 70 in the
>>> *ParDo* and the pipeline restarts with a *UserCodeException*, it
>>> skips the remaining 30 elements. Is this expected? If you still
>>> have doubts, let me know and I will share a code snippet.
>>>
>>> Regards,
>>> Sushil Ks
>>>
>>
>>
>
>
> --
> 
> Mingmin
>


Re: Switching to Java 8

2018-01-16 Thread Eugene Kirpichov
Thanks. I think the one changing build is higher priority because it
enables people to start modernizing the code (e.g. FileIO) and it'd be good
to do that before 2.3 cut. I wasn't able to find the PR you mentioned in
https://github.com/apache/beam/pulls , which one is it?

On Mon, Jan 15, 2018 at 10:41 PM Jean-Baptiste Onofré 
wrote:

> Hi
>
> I created the PR about build during the weekend. I'm working on the
> examples merge PR and also polishing the first one. I will add you as
> reviewer.
>
> Regards
> JB
> Le 16 janv. 2018, à 07:35, Eugene Kirpichov  a
> écrit:
>>
>> Hi JB - any updates here?
>>
>> On Tue, Jan 9, 2018, 2:51 AM Jean-Baptiste Onofré < j...@nanthrax.net>
>> wrote:
>>
>>> Actually, it's part of the build and I will "expand" the java version in
>>> the
>>> enforcer.
>>>
>>> Regards
>>> JB
>>>
>>> On 01/09/2018 11:46 AM, Etienne Chauchot wrote:
>>> > Hi,
>>> >
>>> > +1 as well, excellent news !
>>> >
>>> > I would add also: remove (AFAIK in some IOs) the enforcer
>>> configuration (like
>>> > [1]) that were put when java 8 was needed in a java 7 build.
>>> >
>>> > [1]
>>> >
>>> > 
>>> > [1.8,)
>>> > 
>>> >
>>> >
>>> > Etienne
>>> >
>>> >
>>> > Le 08/01/2018 à 14:02, Jean-Baptiste Onofré a écrit :
>>> >> I created https://issues.apache.org/jira/browse/BEAM-3426 as
>>> umbrella Jira and
>>> >> created the sub-tasks related to build and examples.
>>> >>
>>> >> Feel free to add the relevant sub-tasks there.
>>> >>
>>> >> Regards
>>> >> JB
>>> >>
>>> >> On 01/08/2018 11:33 AM, Ismaël Mejía wrote:
>>> >>> Excellent news! Probably a good idea to file JIRAs for all of those. I
>>> >>> would add:
>>> >>>
>>> >>> - Remove the references in the website to Java 7
>>> >>> - Remove Java 7 and any related task from the CI
>>> >>> - Update the docker dev build images (I will take this one since
>>> >>> reproducible build is my pet project)
>>> >>> - Upgrade the IOs that were still on older versions because of client
>>> >>> compatibility. I remember SolrIO was one case but probably there are
>>> >>> others.
>>> >>>
>>> >>>
>>> >>> On Mon, Jan 8, 2018 at 7:49 AM, Jean-Baptiste Onofré <
>>> j...@nanthrax.net> wrote:
>>>  Yes, that's the plan: build first, example "merge" after.
>>> 
>>>  Regards
>>>  JB
>>> 
>>>  On 01/08/2018 07:43 AM, Eugene Kirpichov wrote:
>>> >
>>> > Sounds great, thanks! Probably best done as 2 separate steps,
>>> because
>>> > after updating the build scripts, everything else can begin in
>>> parallel?
>>> >
>>> > On Sun, Jan 7, 2018 at 10:38 PM Jean-Baptiste Onofré <
>>> j...@nanthrax.net
>>> > > wrote:
>>> >
>>> >  Hi Eugene,
>>> >
>>> >  I'm taking the build update: Maven/Gradle with enforcer +
>>> merge of the
>>> > examples
>>> >  all together.
>>> >
>>> >  Regards
>>> >  JB
>>> >
>>> >  On 01/08/2018 07:34 AM, Eugene Kirpichov wrote:
>>> >   > The vote on user@ about switching to Java 8 has
>>> concluded,
>>> > affirmatively.
>>> >   >
>>> >   > What needs to be done to complete the switch? I can see at
>>> least
>>> > the
>>> >  following:
>>> >   > - Change maven and gradle scripts to use 1.8 source and
>>> target
>>> > version
>>> >   > - Fix resulting compilation/test errors (Java8 has
>>> slightly
>>> > different type
>>> >   > checking, more minor issues may arise)
>>> >   > - Remove all special-casing of java8 in build scripts
>>> >   > - Merge all modules like "java8 examples" and "java8
>>> tests" into
>>> > respective
>>> >   > non-"java8" modules
>>> >   > - Organize an effort to modernize code to use Java 8
>>> constructs
>>> > where
>>> >   > appropriate. Especially important to modernize examples.
>>> To a large
>>> >  extent this
>>> >   > can probably be automated with an IDE.
>>> >   >
>>> >   > Anything else?
>>> >   >
>>> >
>>> >  --
>>> >  Jean-Baptiste Onofré
>>> >  jbono...@apache.org 
>>> >  http://blog.nanthrax.net
>>> >  Talend - http://www.talend.com
>>> >
>>> 
>>>  --
>>>  Jean-Baptiste Onofré
>>>  jbono...@apache.org
>>>  http://blog.nanthrax.net
>>>  Talend - http://www.talend.com
>>> >>
>>> >
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>


Re: [SQL] Windowing and triggering changes proposal

2018-01-16 Thread Mingmin Xu
Thanks @Anton for the proposal. Window (w/ trigger) support in SQL is
limited now, you're very welcome to join the improvement.

There was a balance to strike between injected DSL mode and CLI mode when we
were implementing BeamSQL overall, not only windowing. Many default behaviors
were introduced to make it workable in a pure SQL CLI scenario. If that limits
the potential of DSL mode, we should absolutely adjust it.

Mingmin

On Tue, Jan 16, 2018 at 9:56 AM, Kenneth Knowles  wrote:

> I've commented on the doc. This is a really nice analysis and I think the
> proposal is good for making SQL work with Beam windowing and triggering in
> a way that will make sense to users.
>
> Kenn
>
> On Thu, Jan 11, 2018 at 4:05 PM, Anton Kedin  wrote:
>
>> Hi,
>>
>> Wanted to gather feedback on changes I propose to the behavior of some
>> aspects of windowing and triggering in Beam SQL.
>>
>> In short:
>>
>> Beam SQL currently overrides input PCollections' windowing/triggering
>> configuration in few cases. For example if a query has a simple GROUP BY
>> clause, we would apply GlobalWindows. And it's not configurable by the
>> user, it happens under the hood of SQL.
>>
>> Proposal is to update the Beam SQL implementation in these cases to avoid
>> changing the input PCollections' configuration as much as possible.
>>
>> More details here:
>> https://docs.google.com/document/d/1RmyV9e1Qab-axsLI1WWpw5oGAJDv0X7y9OSnPnrZWJk
>>
>> Regards,
>> Anton
>>
>
>


-- 

Mingmin
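
To make the behavior under discussion concrete, here is a toy comparison of a GROUP BY that re-windows its input into a single global window versus one that keeps the input's fixed windows. It is a plain-Python sketch, not Beam SQL: the function names, the event tuples, and the 60-second window size are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int, int]  # (key, value, event timestamp in seconds)


def sum_in_global_window(events: List[Event]) -> Dict[str, int]:
    """GROUP BY key after the input is re-windowed into one global window."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value, _ in events:
        totals[key] += value
    return dict(totals)


def sum_in_fixed_windows(events: List[Event], size: int) -> Dict[Tuple[str, int], int]:
    """GROUP BY key while preserving the input's fixed windows of `size` seconds."""
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, value, timestamp in events:
        window_start = timestamp - timestamp % size  # start of the enclosing window
        totals[(key, window_start)] += value
    return dict(totals)


events = [("a", 1, 5), ("a", 2, 50), ("a", 4, 65), ("b", 8, 70)]
global_totals = sum_in_global_window(events)
windowed_totals = sum_in_fixed_windows(events, 60)
print(global_totals)
print(windowed_totals)
```

With the global window, key "a" collapses to a single total of 7; with the input's windowing preserved, the same query yields 3 for the window starting at 0 and 4 for the window starting at 60.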


Re: bigquery issue

2018-01-16 Thread Lukasz Cwik
Look at the worker logs. This page shows how to log information and how to
find what was logged:
https://cloud.google.com/dataflow/pipelines/logging#cloud-dataflow-worker-log-example
The worker logs contain a lot of information written by Dataflow and also
by your code. Note that you may need to change log levels to get enough
information:
https://cloud.google.com/dataflow/pipelines/logging#SettingLevels

Also good to take a look at this generic troubleshooting information:
https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline

On Mon, Jan 15, 2018 at 12:18 AM, Chaim Turkel  wrote:

> Hi,
>   I have a fairly simple pipeline that creates daily snapshots of my
> data, and it sometimes fails, but the reason is not obvious:
>
>
> (863777e448a29a5c): Workflow failed. Causes: (863777e448a298ff):
> S41:Account_audit/BigQueryIO.Write/BatchLoads/SinglePartitionsReshuffle/
> GroupByKey/Read+Account_audit/BigQueryIO.Write/BatchLoads/
> SinglePartitionsReshuffle/GroupByKey/GroupByWindow+
> Account_audit/BigQueryIO.Write/BatchLoads/SinglePartitionsReshuffle/
> ExpandIterable+Account_audit/BigQueryIO.Write/BatchLoads/
> SinglePartitionWriteTables/ParMultiDo(WriteTables)+
> Account_audit/BigQueryIO.Write/BatchLoads/SinglePartitionWriteTables/
> ParMultiDo(WriteTables).WriteTablesMainOutput/extract
> table name +Account_audit/BigQueryIO.Write/BatchLoads/
> SinglePartitionWriteTables/ParMultiDo(WriteTables).
> WriteTablesMainOutput/count/GroupByKey+Account_audit/
> BigQueryIO.Write/BatchLoads/SinglePartitionWriteTables/
> ParMultiDo(WriteTables).WriteTablesMainOutput/count/
> Combine.GroupedValues/Partial+Account_audit/BigQueryIO.Write/BatchLoads/
> SinglePartitionWriteTables/ParMultiDo(WriteTables).
> WriteTablesMainOutput/count/GroupByKey/Reify+Account_
> audit/BigQueryIO.Write/BatchLoads/SinglePartitionWriteTables/
> ParMultiDo(WriteTables).WriteTablesMainOutput/count/
> GroupByKey/Write+Account_audit/BigQueryIO.Write/BatchLoads/
> SinglePartitionWriteTables/WithKeys/AddKeys/Map+Account_
> audit/BigQueryIO.Write/BatchLoads/SinglePartitionWriteTables/
> Window.Into()/Window.Assign+Account_audit/BigQueryIO.Write/BatchLoads/
> SinglePartitionWriteTables/GroupByKey/Reify+Account_
> audit/BigQueryIO.Write/BatchLoads/SinglePartitionWriteTables/
> GroupByKey/Write
> failed., (a412b0e93a586d57): A work item was attempted 4 times without
> success. Each time the worker eventually lost contact with the
> service. The work item was attempted on:
> dailysnapshotoptions-chai-01142358-ef6c-harness-qw6s,
> dailysnapshotoptions-chai-01142358-ef6c-harness-j09b,
> dailysnapshotoptions-chai-01142358-ef6c-harness-3t9m,
> dailysnapshotoptions-chai-01142358-ef6c-harness-372t
>
> is there any way to get more information?
>
> the job is:
>
>
> https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-01-14_23_58_54-6125672650598375925?project=ordinal-ember-163410&organizationId=782381653268
>
>
> chaim
>
>


Re: [SQL] Windowing and triggering changes proposal

2018-01-16 Thread Kenneth Knowles
I've commented on the doc. This is a really nice analysis and I think the
proposal is good for making SQL work with Beam windowing and triggering in
a way that will make sense to users.

Kenn

On Thu, Jan 11, 2018 at 4:05 PM, Anton Kedin  wrote:

> Hi,
>
> Wanted to gather feedback on changes I propose to the behavior of some
> aspects of windowing and triggering in Beam SQL.
>
> In short:
>
> Beam SQL currently overrides input PCollections' windowing/triggering
> configuration in few cases. For example if a query has a simple GROUP BY
> clause, we would apply GlobalWindows. And it's not configurable by the
> user, it happens under the hood of SQL.
>
> Proposal is to update the Beam SQL implementation in these cases to avoid
> changing the input PCollections' configuration as much as possible.
>
> More details here:
> https://docs.google.com/document/d/1RmyV9e1Qab-axsLI1WWpw5oGAJDv0X7y9OSnPnrZWJk
>
> Regards,
> Anton
>


Re: Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #5661

2018-01-16 Thread Jean-Baptiste Onofré

Thanks for the quick merge Ismaël.

Build should be fine again now ;)

Regards
JB

On 01/16/2018 06:27 PM, Ismaël Mejía wrote:

We just merged a fix for this, it should be OK now. Sorry for the
inconvenience and thanks for the fix JB.

On Tue, Jan 16, 2018 at 5:07 AM, Kenneth Knowles  wrote:

In case anyone is following, this is
https://issues.apache.org/jira/browse/BEAM-3438

Kenn

On Mon, Jan 15, 2018 at 5:08 PM, Apache Jenkins Server
 wrote:


See






--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #5661

2018-01-16 Thread Ismaël Mejía
We just merged a fix for this, it should be OK now. Sorry for the
inconvenience and thanks for the fix JB.

On Tue, Jan 16, 2018 at 5:07 AM, Kenneth Knowles  wrote:
> In case anyone is following, this is
> https://issues.apache.org/jira/browse/BEAM-3438
>
> Kenn
>
> On Mon, Jan 15, 2018 at 5:08 PM, Apache Jenkins Server
>  wrote:
>>
>> See
>> 
>>
>


Re: [DISCUSS] State of the project

2018-01-16 Thread Ismaël Mejía
Thanks Davor for opening this discussion and HUGE +1 to do this every
year or in cycles. I will fork this thread into a new one for the
Culture / Project management part issues as suggested.

On the subject of the diversity of users across runners, I think this
requires more attention to unification and implies work in at least
these areas:

* Automated validation and consistent semantics among runners

Users should be confident that moving their code from one runner to
another just works, and the only way to ensure this is to have a
runner pass the ValidatesRunner/TCK tests and with this 'graduate' such
support, as Romain suggested. The capability matrix is really nice, but
it is not a programmatic way to do this. Also, individual features
usually do work, but feature combinations produce issues, so we need
more exact semantics to avoid these.

Some parts of Beam's semantics are loose (e.g. bundle partitioning,
pipeline termination, etc.). I suppose this has been a design decision
to allow flexibility in runner implementations, but it becomes
inconvenient when users move among runners and get different results.
I am not sure the current tradeoff is worth the usability sacrifice
for the end user.

* Make user experience across runners a priority

Today runners not only behave in different ways, but the way users
package and publish their applications also differs. Of course this is not
a trivial problem, because deployment is normally an end-user problem, but
we can improve in this area, e.g. by guaranteeing a consistent deployment
mechanism across runners and by making IO integration easier. For example,
when using multiple IOs and switching runners it is easy to run into
conflicts; we should try to minimize this for end users.

* Simplify operational tasks among runners

We need to add a minimum degree of consistent observability across
runners. Of course Beam has metrics for this, but they are not enough:
an end user who starts on one runner and moves to another has to deal
with a totally different set of logs and operational issues. We can
try to improve this too, without trying to cover the full
spectrum but at least providing some minimum set of observability. I
hope that the current work on portability will bring some improvements
in this area. This is crucial for users, who probably spend more
time running their jobs (and dealing with issues in them) than writing
them.

We need to have some integration tests that simulate common user
scenarios and some distribution use cases. For example, probably the most
common data store used for streaming is Kafka (at least in open
source). We should have an IT that tests some common issues that can
arise when you use Kafka: what happens if a Kafka broker goes down,
does Beam continue to read without issue? What about a new leader
election, do we continue to work as expected? Few projects have
something like this, and it would send a clear message that Beam cares
about reliability too.

Apart from these, I think we also need to work on:

* Simpler APIs + User friendly libraries.

I want to add a big thanks to Jesse for his list of criteria that
people look for when they choose a framework for data processing. The
first point, 'Will this dramatically improve the problems I'm trying to
solve?', is super important. Of course Beam has portability and a rich
model as its biggest assets, but I have been consistently asked at
conferences whether Beam has libraries for graph processing, CEP, Machine
Learning or a Scala API.

Of course we have made some progress with the recent addition of
SQL, and hopefully schema-aware PCollections will help there too,
but there is still some way to go. This may not be
crucial considering the portability goals of Beam, but these libraries
are sometimes what makes users decide whether to use a tool or not, so
it is better to have them than not.

These are the most important issues from my point of view. My apologies
for the long email, but this was the perfect moment to discuss these.

One extra point: I think we should write and agree on a concise roadmap
and take a look at our progress on it at the middle and the end of the
year, as other communities do.

Regards,
Ismaël

On Mon, Jan 15, 2018 at 7:49 PM, Jesse Anderson
 wrote:
> I think a focus on the runners is what's key to Beam's adoption. The runners
> are the foundation on which Beam sits. If the runners don't work properly,
> Beam won't work.
>
> A focus on improved unit tests is a good start, but isn't what's needed.
> Compatibility matrices will help see how your runner of choice stacks up,
> but that requires too much knowledge of Beam's internals to be
> interpretable.
>
> Imagine you're an (enterprise) architect looking at adopting Beam. What do
> you look at or what do you look for before going deeper? What would make you
> stick your neck out to adopt Beam? In my experience, there are several
> pass/fail checks along the way.
>
> Here are a few of the com

Re: [DISCUSS] State of the project: Community growth

2018-01-16 Thread Ismaël Mejía
Some ideas I have around the issues mentioned before:

* Annual User Survey

One important thing we should do is some sort of annual survey of
users to get feedback on the state of Beam from the users'
point of view. We can take as a template, for example, the survey done
by the Rust community or others, and maybe run it like we did for the
Java 8 poll on Twitter.

https://blog.rust-lang.org/2017/09/05/Rust-2017-Survey-Results.html

* An event webpage / calendar

So everyone who organizes an event can add their events there and we
can also share them in advance in the mailing lists.

Regards,
Ismaël

On Tue, Jan 16, 2018 at 4:17 AM, Austin Bennett
 wrote:
> Think that Beam's great, so happy to grow community -- by my individual
> involvement and to encourage others.
>
> 1). Users:  I think that this can follow from the others being done well
> 2).  Contributors: some other projects mark very simple contributions for
> newcomers to find as onramping.  Hadn't noticed that here in the little I
> had explored.
> 3-5). Community/Event/Brand:  Beam is inherently collaborative given its
> model and what I understand of purpose (different runners, batch/stream,
> etc).  An easy place to start will be places that play very well and/or
> integrate with.
>
>
>
> On Mon, Jan 15, 2018 at 3:27 PM, Griselda Cuevas  wrote:
>>
>> Hi Everyone,
>>
>>
>> Thanks Davor for starting the discussion around the state of the Apache
>> Beam in 2017[1]. In this fork of that conversation, I’d like to continue the
>> dialogue around how should we grow our community in alignment to the
>> project's vision, goals & values.
>>
>>
>> Here are some areas and prompts to guide the conversation. If there are
>> other ideas related to community growth missing here, please add them in
>> this thread as well.
>>
>> New users: How could the community help potential and new users learn
>> about Beam? How can we support questions from users better?
>>
>> New contributors: What are our contributions Best Practices and how can we
>> share them with new members? How could the community help new contributors
>> ramp up faster in our Project’s dev environment? How could we empower new
>> developers to contribute to Beam’s core features?
>>
>> Community engagement: What type of activities and efforts would you like
>> to see to increase our community engagement?
>>
>> Events: What events should we attend? What events should we sponsor? How
>> could we support community led events better?
>>
>> Brand building: What efforts do you think we should do to build our brand?
>> What use cases, ideas, talks, etc. should we collaborate on to give the
>> project more visibility?
>>
>>
>> If there’s interest, I’d also be happy to host a virtual meeting for
>> anyone who’d prefer that avenue of discussion and will make sure that any
>> new ideas or details are brought back to the discussion threads after that.
>> Express interest in a virtual meeting in this thread so I can coordinate.
>>
>>
>> Thanks, and let’s make this an exciting community!
>>
>> Gris Cuevas
>>
>>
>>
>> [1]https://lists.apache.org/thread.html/f750f288af8dab3f468b869bf5a3f473094f4764db419567f33805d0@%3Cdev.beam.apache.org%3E
>
>


Jenkins build is unstable: beam_Release_NightlySnapshot #654

2018-01-16 Thread Apache Jenkins Server
See