Re: Capability Matrix

2016-03-18 Thread Kostas Kloudas
Great to have an overview of the available 
runners and a comprehensible visualization of 
the features each one supports!

Kostas

> On Mar 18, 2016, at 11:32 AM, Maximilian Michels  wrote:
> 
> Well done. The matrix provides a good basis for improving the existing
> runners. Moreover, new backends can use it to evaluate capabilities
> for creating a runner.
> 
> On Fri, Mar 18, 2016 at 1:15 AM, Jean-Baptiste Onofré  
> wrote:
>> Gotcha, thanks !
>> 
>> Regards
>> JB
>> 
>> 
>> On 03/18/2016 12:51 AM, Frances Perry wrote:
>>> 
>>> That's "partially". Check out the full matrix for complete details:
>>> http://beam.incubator.apache.org/capability-matrix/
>>> 
>>> On Thu, Mar 17, 2016 at 4:50 PM, Jean-Baptiste Onofré 
>>> wrote:
>>> 
 Great job !
 
 By the way, when you use ~ in the matrix, does it mean that it works only
 in some cases (depending on the pipeline or transform), or that it doesn't
 work as expected? Just curious for the Aggregators and the meaning in the
 Beam Model.
 
 Thanks,
 Regards
 JB
 
 
 On 03/18/2016 12:45 AM, Tyler Akidau wrote:
 
> Just pushed the capability matrix and an attendant blog post to the
> site:
> 
> - Blog post:
> 
> 
> http://beam.incubator.apache.org/beam/capability/2016/03/17/capability-matrix.html
> - Matrix: http://beam.incubator.apache.org/capability-matrix/
> 
> For those of you that want to keep the matrix up to date as your runner
> evolves, you'll want to make updates in the _data/capability-matrix.yml
> file:
> 
> 
> https://github.com/apache/incubator-beam-site/blob/asf-site/_data/capability-matrix.yml
> 
> Thanks to everyone for helping fill out the initial set of capabilities!
> Looking forward to updates as things progress. :-)
> 
> And thanks also to Max for moving all the website stuff to git!
> 
> -Tyler
> 
> 
> On Sat, Mar 12, 2016 at 9:37 AM Tyler Akidau  wrote:
> 
>> Thanks all! At this point, it looks like almost all of the fields have
>> been filled out. I'm in the process of migrating the spreadsheet contents
>> to YAML within the website source, so I've revoked edit access from the
>> doc to keep things from changing while I'm doing that. If you have
>> further edits to make, feel free to leave a comment, and I'll incorporate
>> them into the YAML.
>> 
>> -Tyler
>> 
>> 
>> On Thu, Mar 10, 2016 at 12:43 AM Jean-Baptiste Onofré 
>> wrote:
>> 
>> Hi Tyler,
>>> 
>>> 
>>> good idea !
>>> 
>>> I like it !
>>> 
>>> Regards
>>> JB
>>> 
>>> On 03/09/2016 11:14 PM, Tyler Akidau wrote:
>>> 
 I just filed BEAM-104 regarding publishing a capability matrix on the
 Beam website. We've seeded the spreadsheet linked there (
 https://docs.google.com/spreadsheets/d/1OM077lZBARrtUi6g0X0O0PHaIbFKCD6v0djRefQRE1I/edit
 ) with an initial proposed set of capabilities, as well as descriptions
 for the model and Cloud Dataflow. If folks for other runners (currently
 Flink and Spark) could please make sure their columns are filled out as
 well, it'd be much appreciated. Also let us know if there are
 capabilities you think we've missed.
 
 Our hope is to get this up and published soon, since we've been getting
 a lot of questions regarding runner capabilities, portability, etc.
 
 -Tyler
 
 
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>> 
>>> 
>> 
> 



Re: Jenkins build became unstable: beam_MavenVerify #14

2016-03-18 Thread Jason Kuster
We've applied the change (local to the workspace) to both Jenkins projects
- hopefully this will fix it. We'll keep an eye out for breakages related
to this.

On Thu, Mar 17, 2016 at 10:29 AM, Davor Bonaci  wrote:

> Adding Jason explicitly. This is somewhat expected.
>
> Thanks Andreas.
>
> On Thu, Mar 17, 2016 at 8:08 AM, Andreas Veithen <
> andreas.veit...@gmail.com> wrote:
>
>> It looks like nowadays, when enabling "Use private Maven repository",
>> you also need to select a strategy. If you want to shield the build against
>> pollution of the local Maven repository by other builds (which I highly
>> recommend, speaking from experience), you should use "Local to the
>> workspace". That's currently not the case for that build: it's set to
>> "Default (~/.m2/repository)", which doesn't sound very private...
>>
>> Andreas
>>
>> On Thu, Mar 17, 2016 at 2:57 PM, Amit Sela  wrote:
>>
>>> Looks like the cached MapR Hadoop dependencies issue is back..
>>> I didn't push anything lately, and the only commit was unrelated.
>>>
>>> On Thu, Mar 17, 2016 at 4:54 PM Apache Jenkins Server <
>>> jenk...@builds.apache.org> wrote:
>>>
>>> > See 
>>> >
>>> >
>>>
>>
>>
>


-- 
---
Jason Kuster
SETI - Cloud Dataflow


Re: Draft Contribution Guide

2016-03-18 Thread Siva Kalagarla
Thanks, Frances. This document is helpful for newbies like myself. I will
follow these steps over this weekend.

On Thu, Mar 17, 2016 at 2:19 PM, Frances Perry 
wrote:

> Hi Beamers!
>
> We've started a draft
> <
> https://docs.google.com/document/d/1syFyfqIsGOYDE_Hn3ZkRd8a6ylcc64Kud9YtrGHgU0E/comment
> >
> for the Beam contribution guide. Please take a look and provide feedback.
> Once things settle, we'll get this moved over onto the Beam website.
>
> Frances
>



-- 


Regards,
Siva Kalagarla
@SivaKalagarla 


Draft Contribution Guide

2016-03-18 Thread Frances Perry
Hi Beamers!

We've started a draft

for the Beam contribution guide. Please take a look and provide feedback.
Once things settle, we'll get this moved over onto the Beam website.

Frances


Re: Capability Matrix

2016-03-18 Thread Amit Sela
Looks great!
I think it's the best way to give a clear picture of capabilities for users
and runner developers.
And as always, Love the colours ;)


On Fri, Mar 18, 2016 at 3:33 PM Kostas Kloudas 
wrote:

> Great to have an overview of the available
> runners and a comprehensible visualization of
> the features each one supports!
>
> Kostas
>


Re: [PROPOSAL] MultiLineIO

2016-03-18 Thread Dan Halperin
Hi Peter,

Echoing Eugene's and JB's thoughts -- we'd love a PR!

I also wanted to say: we've hit you with a lot of recommendations in this
email thread. If you have any questions, you can ask us here -- but we'll
of course be happy to answer them during code review as well. Do not feel
like meeting all these criteria is a prerequisite for opening a Pull
Request -- we just may give you feedback and ask for changes before
merging :).

Thanks!
Dan

On Mon, Mar 14, 2016 at 12:27 PM, Jean-Baptiste Onofré 
wrote:

> Yes, you already use the "new style" as you use BoundedSource.
>
> Regards
> JB
>
>
> On 03/14/2016 08:08 PM, Giesin, Peter wrote:
>
>> The MultiLineIO is a BoundedSource and an extension of FileBasedSource.
>> Where FileBasedSource reads a single line at a time, MultiLineIO lets the
>> user define an arbitrary “message” delimiter. It then reads through the
>> file, removing newlines, until the separator is read, finally returning
>> the character sequence that has been built.
>>
>>
>>
>> I believe it is already built using the new style but I will compare it
>> to the BigTableIO to confirm that.
>>
>> Peter
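The delimiter-based reading Peter describes can be sketched roughly as below. This is an illustrative sketch only, not the actual MultiLineIO code (the class and method names are made up, and a real FileBasedSource subclass must also handle split points and offsets):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class DelimitedReader {

    /** Reads delimiter-separated "messages", dropping the newlines inside each one. */
    public static List<String> readMessages(BufferedReader reader, String delimiter) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String line;
        try {
            while ((line = reader.readLine()) != null) { // readLine() strips the newline
                current.append(line);
                int idx;
                // A single line may contain the delimiter more than once.
                while ((idx = current.indexOf(delimiter)) >= 0) {
                    messages.add(current.substring(0, idx));
                    current.delete(0, idx + delimiter.length());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            messages.add(current.toString()); // trailing message with no final delimiter
        }
        return messages;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(
                new StringReader("first half\nof message one|message two|tail"));
        // Newlines are removed, so the two physical lines join into one message.
        System.out.println(readMessages(in, "|"));
        // -> [first halfof message one, message two, tail]
    }
}
```

Note that stripping newlines joins the adjacent characters directly, which is the literal behavior described above.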
>>
>> On 3/14/16, 1:50 PM, "Jean-Baptiste Onofré"  wrote:
>>
>> I second Eugene here.
>>>
>>> In the past, I developed some IOs using the "old style" (as done in
>>> PubSubIO). I'm now refactoring them to use the "new style".
>>>
>>> Regards
>>> JB
>>>
>>> On 03/14/2016 06:47 PM, Eugene Kirpichov wrote:
>>>
 Hi Peter,
 Looking forward to your PR. Please note that source classes are
 relatively
 tricky to develop, so would you mind briefly explaining what your source
 will do here over email, so that we can hash out some possible issues early
 rather than in PR comments?
 Also note that we now recommend packaging IO connectors as PTransforms,
 making the PTransform class itself be a builder - while the Source/Sink
 classes should be kept package-private (rather than exposed to the
 user).
 For an example of a connector packaged in this style, see BigtableIO (

 https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/io/bigtable/BigtableIO.java
 ).
 The advantage is that this style allows you to restructure the
 connector or
 add additional transforms into its implementation if necessary, without
 changing the call sites. It might seem less important in case of a
 simple
 connector like reading lines from file, but it will become much more
 important with things like SplittableDoFn
 <
 https://issues.apache.org/jira/browse/BEAM-65
 >.
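The packaging style Eugene recommends can be sketched in miniature. All types below are simplified stand-ins, NOT the real Beam SDK classes: the public surface is a builder-style `Read`, while the source class stays package-private so it can be restructured without touching call sites:

```java
import java.util.Objects;

/**
 * Sketch of the recommended packaging: a public, builder-style entry point
 * (here "Read") with the underlying source kept package-private.
 */
public class MultiLineIOSketch {

    // The only public surface: call sites never see the source class.
    public static Read read() {
        return new Read(null, "\n\n"); // "\n\n" is an arbitrary default delimiter
    }

    public static final class Read {
        private final String filepattern;
        private final String delimiter;

        private Read(String filepattern, String delimiter) {
            this.filepattern = filepattern;
            this.delimiter = delimiter;
        }

        public Read from(String filepattern) {
            return new Read(filepattern, delimiter);
        }

        public Read withDelimiter(String delimiter) {
            return new Read(filepattern, delimiter);
        }

        // In real Beam this wiring would live in the transform's expansion;
        // here it just shows the hidden source being constructed internally.
        MultiLineSource toSource() {
            return new MultiLineSource(Objects.requireNonNull(filepattern), delimiter);
        }
    }

    // Package-private: free to be restructured without changing call sites.
    static final class MultiLineSource {
        final String filepattern;
        final String delimiter;

        MultiLineSource(String filepattern, String delimiter) {
            this.filepattern = filepattern;
            this.delimiter = delimiter;
        }
    }
}
```

Call sites would then write `MultiLineIOSketch.read().from("...").withDelimiter("...")` and never name the source class directly, which is what makes later restructuring safe.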

 On Mon, Mar 14, 2016 at 10:29 AM Jean-Baptiste Onofré 
 wrote:

 Hi Peter,
>
> awesome !
>
> Yes, you can create the PR using the github mirror.
>
> Does your MultiLineIO use Bounded/Unbounded "new" classes ?
>
> Regards
> JB
>
> On 03/14/2016 06:23 PM, Giesin, Peter wrote:
>
>> Hi all!
>>
>> I am looking to get involved in the project. I have a MultiLineIO
>> file-based source that I think would be useful. I know the project is
>> just spinning up, but can I simply clone the repo and create a PR for
>> the new IO? Also, I looked over JIRA and there are some tickets I can
>> help out with.
>
>>
>> Best regards,
>> Peter Giesin
>> peter.gie...@fisglobal.com
>>
>>
>
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
>
> http://blog.nanthrax.net
> Talend - http://www.talend.com

Re: Unable to serialize exception running KafkaWindowedWordCountExample

2016-03-18 Thread 刘见康
@Max:
Thanks for your quick fix; the serialization exception has been resolved.
However, it now reports another one:
16/03/17 20:14:23 INFO flink.FlinkPipelineRunner:
PipelineOptions.filesToStage was not specified. Defaulting to files from
the classpath: will stage 158 files. Enable logging at DEBUG level to see
which files will be staged.
16/03/17 20:14:23 INFO flink.FlinkPipelineExecutionEnvironment: Creating
the required Streaming Environment.
16/03/17 20:14:23 INFO kafka.FlinkKafkaConsumer08: Trying to get topic
metadata from broker localhost:9092 in try 0/3
16/03/17 20:14:23 INFO kafka.FlinkKafkaConsumerBase: Consumer is going to
read the following topics (with number of partitions):
Exception in thread "main" java.lang.RuntimeException: Flink Sources are
supported only when running with the FlinkPipelineRunner.
at
org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedFlinkSource.getDefaultOutputCoder(UnboundedFlinkSource.java:71)
at
com.google.cloud.dataflow.sdk.io.Read$Unbounded.getDefaultOutputCoder(Read.java:230)
at
com.google.cloud.dataflow.sdk.transforms.PTransform.getDefaultOutputCoder(PTransform.java:294)
at
com.google.cloud.dataflow.sdk.transforms.PTransform.getDefaultOutputCoder(PTransform.java:309)
at
com.google.cloud.dataflow.sdk.values.TypedPValue.inferCoderOrFail(TypedPValue.java:167)
at
com.google.cloud.dataflow.sdk.values.TypedPValue.getCoder(TypedPValue.java:48)
at
com.google.cloud.dataflow.sdk.values.PCollection.getCoder(PCollection.java:137)
at
com.google.cloud.dataflow.sdk.values.TypedPValue.finishSpecifying(TypedPValue.java:88)
at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:331)
at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:274)
at
com.google.cloud.dataflow.sdk.values.PCollection.apply(PCollection.java:161)
at
org.apache.beam.runners.flink.examples.streaming.KafkaWindowedWordCountExample.main(KafkaWindowedWordCountExample.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Diving into the UnboundedFlinkSource class, it looks like a simple class
implementing the UnboundedSource interface that just throws
RuntimeException.
I just wonder: is this Kafka streaming example actually runnable?

Thanks
Jiankang


On Thu, Mar 17, 2016 at 7:35 PM, Maximilian Michels  wrote:

> @Dan: You're right that the PipelineOptions shouldn't be cached like
> this. In this particular wrapper, it was not even necessary.
>
> @Jiankang: I've pushed a fix to the repository with a few
> improvements. Could you please try again? You will have to recompile.
>
> Thanks,
> Max
>
> On Thu, Mar 17, 2016 at 8:44 AM, Dan Halperin  wrote:
> > +Max for the Flink Runner, and +Luke who wrote most of the initial code
> > around PipelineOptions.
> >
> > The UnboundedFlinkSource is caching the `PipelineOptions` object, here:
> >
> https://github.com/apache/incubator-beam/blob/071e4dd67021346b0cab2aafa0900ec7e34c4ef8/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedFlinkSource.java#L36
> >
> > I think this is a mismatch with how we intended them to be used. For
> > example, the PipelineOptions may be changed by a Runner between graph
> > construction time (when the UnboundedFlinkSource is created) and actual
> > pipeline execution time. This is partially why, for example,
> PipelineOptions
> > are provided by the Runner as an argument to functions like
> > DoFn.startBundle, .processElement, and .finishBundle.
> >
> > PipelineOptions itself does not extend Serializable, and per the
> > PipelineOptions documentation it looks like we intend for it to be
> > serialized through Jackson rather than through Java serialization. I bet
> the
> > Flink runner does this, and we probably just need to remove this cached
> > PipelineOptions from the unbounded source.
> >
> > I'll let Luke and Max correct me on any or all of the above :)
> >
> > Thanks,
> > Dan
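> >
> > The failure mode described above can be reproduced with plain JDK
> > classes. In this sketch, `FakeOptions` and `CachingSource` are
> > illustrative stand-ins for PipelineOptions and the source wrapper, not
> > the real Beam/Flink types:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/**
 * A Serializable class that caches a non-serializable object fails
 * Java serialization, even though the outer class implements Serializable.
 */
public class SerializationSketch {

    static class FakeOptions { } // stands in for PipelineOptions: not Serializable

    static class CachingSource implements Serializable {
        final FakeOptions options; // cached at graph-construction time -- the bug
        CachingSource(FakeOptions options) { this.options = options; }
    }

    static boolean javaSerializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false; // the "Unable to serialize ..." error surfaces here
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(javaSerializes(new CachingSource(new FakeOptions())));
        // prints false
    }
}
```

Since PipelineOptions is meant to go through Jackson rather than Java serialization, the remedy consistent with the discussion above is to not cache the options in a Java-serialized field and instead take them at execution time.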
> >
> > On Wed, Mar 16, 2016 at 10:57 PM, 刘见康  wrote:
> >>
> >> Hi guys,
> >>
> >> Failed to run KafkaWindowedWordCountExample with Unable to serialize
> >> exception, the stack exception as below:
> >>
> >> 16/03/17 13:49:09 INFO flink.FlinkPipelineRunner:
> >> PipelineOptions.filesToStage was not specified. Defaulting to files from
> >> the classpath: will stage 160 files. Enable logging at DEBUG level to
> see
> >> which files will be staged.
> >> 16/03/17 13:49:09 INFO flink.FlinkPipelineExecutionEnvironment: Creating
> >> the required Streaming Environment.
> >> 16/03/17 13:49:09 INFO kafka.FlinkKafkaConsumer08: Trying to get topic
> >> metadata from broker localhost:9092 in try 0/3
> >> 16/03/17 13:49:09 INFO kafka.FlinkKafkaConsumerBase: Consumer is going
> to
> >> read