Re: Possibility for GSoC 2016

2016-03-20 Thread Davor Bonaci
A downside of starting an effort like this is that the Runner API is not
stable yet, and we expect a lot of changes in the next several months. In
that sense, there may be better tasks to start with.

If you choose to proceed, JB would probably be your best resource for this.

On Tue, Mar 15, 2016 at 1:54 AM, Milindu Sanoj Kumarage <
agentmili...@gmail.com> wrote:

> Hi Davor,
>
>  I went through the JIRA and found "Create OSGi/Karaf runner" [1]
> interesting. I would like to know more about it. I also went through the
> Apache Karaf project.
>
> Is it possible for me to work on this task for GSoC?  I'm seeking the
> guidance of the community to find a good task.
>
> [1] https://issues.apache.org/jira/browse/BEAM-10
>
> Regards,
> Milindu Sanoj Kumarage
> LinkedIn | GitHub | agentmilindu.com
>
> On Mon, Mar 14, 2016 at 10:22 PM, Davor Bonaci 
> wrote:
>
> > Hi Milindu,
> > You are welcome to look at our JIRA project [1]. If you find something of
> > interest, consider contributing it. That said, please start small!
> >
> > We highly encourage coordinating work on JIRA to avoid frustration later
> > on. Also, we are in the process of creating and publishing a contributor /
> > developer guide for the project -- please bear with us until we clarify
> > this.
> >
> > [1] https://issues.apache.org/jira/browse/BEAM/
> >
> > Thanks,
> > Davor
> >
> > On Mon, Mar 14, 2016 at 5:48 AM, Milindu Sanoj Kumarage <
> > agentmili...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I'm Milindu Sanoj Kumarage, an undergraduate at the University of Colombo
> > > School of Computing, in the 4th year of my Computer Science major. I have
> > > just started my 4th year and I am working on research based on distributed
> > > computing (on Pub/Sub), therefore I'm working with tools and technologies
> > > related to distributed computing. I came across "The Dataflow Model: A
> > > Practical Approach to Balancing Correctness, Latency, and Cost in
> > > Massive-Scale, Unbounded, Out-of-Order Data Processing" [1] and it caught
> > > my interest. That's how I came to the Apache Beam project. I'm asking
> > > about the possibility of working on Apache Beam for GSoC 2016.
> > >
> > > For last year's GSoC I built a CLI for Apache Stratos using Python, and
> > > the year before that I built a GIS module for Sahana Vesuvius with the
> > > Sahana Software Foundation. I have experience working with many
> > > cloud-related tools such as Google Compute Engine, AWS, etc.
> > >
> > > I'm really interested in working with the Apache Beam project if possible.
> > > Please help me with this.
> > >
> > > [1] http://research.google.com/pubs/pub43864.html
> > >
> > >
> > > Regards,
> > > Milindu Sanoj Kumarage
> > > LinkedIn | GitHub | agentmilindu.com
> > >
> >
>


Re: Renaming process: first step Maven coordinates

2016-03-20 Thread Davor Bonaci
I left a few comments on PR #46.

Thanks JB for doing this; a clear improvement.

On Mon, Mar 14, 2016 at 6:04 PM, Jean-Baptiste Onofré 
wrote:

> Hi all,
>
> I started the renaming process from Dataflow to Beam.
>
> I submitted a first PR about the Maven coordinates:
>
> https://github.com/apache/incubator-beam/pull/46
>
> I will start the package renaming (updating the same PR). For the
> directory structure, I would like to talk with Frances, Dan, Tyler, and
> Davor first.
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [HEADS UP] Renaming/polishing

2016-03-20 Thread Davor Bonaci
I believe we'll put ourselves into a corner with "0.1-incubating-SNAPSHOT".

The format has to be: <major>.<minor>.<incremental>-<qualifier>, as per
[1], i.e., no two dashes. If it is not, Maven resolution will get things
wrong by comparing strings instead of numbers: 10 becomes less than 2, etc.
Maven handles the "-SNAPSHOT" qualifier specially; the qualifier
"-incubating-SNAPSHOT" will not get that benefit.

Here's a very specific example from [1]:

Take the version release numbers “1.2.3-alpha-2” and “1.2.3-alpha-10,”
where the “alpha-2” build corresponds to the 2nd alpha build, and the
“alpha-10” build corresponds to the 10th alpha build. Even though
“alpha-10” should be considered more recent than “alpha-2,” Maven is going
to sort “alpha-10” before “alpha-2”.
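
To make the sorting problem in the quoted example concrete, here is a minimal Python sketch. It is an illustration only, not Maven's actual resolution code; the helper names (split_version, string_key, numeric_key) are hypothetical. It contrasts comparing the qualifier as a plain string with comparing its numeric chunks as numbers.

```python
import re


def split_version(version):
    """Split '1.2.3-alpha-10' into ((1, 2, 3), 'alpha-10')."""
    base, _, qualifier = version.partition("-")
    numbers = tuple(int(part) for part in base.split("."))
    return numbers, qualifier


def string_key(version):
    """Sort key that compares the qualifier as a plain string."""
    return split_version(version)


def numeric_key(version):
    """Sort key that compares digit runs inside the qualifier as integers.

    Assumes the qualifiers being compared share the same text/number layout,
    as in 'alpha-2' vs 'alpha-10'.
    """
    numbers, qualifier = split_version(version)
    chunks = tuple(
        int(chunk) if chunk.isdigit() else chunk
        for chunk in re.split(r"(\d+)", qualifier)
        if chunk
    )
    return numbers, chunks


versions = ["1.2.3-alpha-2", "1.2.3-alpha-10"]
by_string = sorted(versions, key=string_key)
by_number = sorted(versions, key=numeric_key)
print(by_string)  # alpha-10 sorts first when compared as a string
print(by_number)  # alpha-2 sorts first when compared as a number
```

With string comparison, "alpha-10" sorts before "alpha-2" because the character "1" is less than "2", which is the behavior the quote describes.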


There are several orthogonal decisions here:

1. How many version numbers do we need for now? I argue we don't need the
incremental part before the first stable release -- two numbers should be
sufficient. So, the format, before the first stable release, can be
<major>.<minor>-<qualifier>.

2. I don't think we need "incubating-SNAPSHOT" ever. For the most part,
both qualifiers communicate the same thing -- that this is not really ready
for primetime yet. For example, we can use -SNAPSHOT for the nightly build,
and "-incubating" for the actual releases while we are in the incubation
phase. Snapshots will not get released anywhere -- no reason for them to
carry "incubating" too; we'll just mess up resolution handling.

3. I found many projects in the Incubator that don't actually have
"incubating" in the version part. Some put it in the artifact id; others
put it in the name only; a few don't have it at all. I dislike the artifact
approach, and I'm neutral between name & version. Name is easier, however.

4. When we release the first stable version, I propose that it is marked as
2.0.0. Before that, we'll likely push several pre-release versions. We have
released 1.5.0 in Dataflow recently, and might release a few more. It might
be smarter to leave a few numbers for any such versions of Dataflow. So, we
could start with something like 1.9.0. I think 0.1 communicates more
clearly that this is a pre-release version.

To summarize, I think a good proposal is as follows:

Start with 0.1-SNAPSHOT. This goes into Beam's parent pom.xml. When we
release 0.1, we override it to 0.1-incubating. At that time, the pom goes
to 0.2-SNAPSHOT, and we release it as 0.2-incubating. Sometime before the
first stable release post incubation, we change it to 2.0.0-SNAPSHOT, and
release as 2.0.0.
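
The proposed progression can be sketched in a few lines of Python. This is only an illustration of the naming scheme above, not a build tool or Maven plugin; the function names release_version and next_snapshot are hypothetical.

```python
SNAPSHOT = "-SNAPSHOT"


def release_version(snapshot_version, incubating=True):
    """Turn the pom's snapshot version into the version to release.

    '0.1-SNAPSHOT' -> '0.1-incubating' while in incubation,
    '2.0.0-SNAPSHOT' -> '2.0.0' after incubation.
    """
    if not snapshot_version.endswith(SNAPSHOT):
        raise ValueError("expected a -SNAPSHOT version: " + snapshot_version)
    base = snapshot_version[: -len(SNAPSHOT)]
    return base + "-incubating" if incubating else base


def next_snapshot(snapshot_version):
    """Bump the last numeric component: '0.1-SNAPSHOT' -> '0.2-SNAPSHOT'."""
    if not snapshot_version.endswith(SNAPSHOT):
        raise ValueError("expected a -SNAPSHOT version: " + snapshot_version)
    parts = [int(p) for p in snapshot_version[: -len(SNAPSHOT)].split(".")]
    parts[-1] += 1
    return ".".join(str(p) for p in parts) + SNAPSHOT


pom_version = "0.1-SNAPSHOT"
print(release_version(pom_version))   # version used for the first release
print(next_snapshot(pom_version))     # version the pom moves to afterwards
print(release_version("2.0.0-SNAPSHOT", incubating=False))
```

Note that each version carries exactly one qualifier at a time, either -SNAPSHOT or -incubating, never both.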

[1]
https://books.sonatype.com/mvnref-book/reference/pom-relationships-sect-pom-syntax.html

On Sun, Mar 20, 2016 at 12:31 PM, Jean-Baptiste Onofré 
wrote:

> Hi beamers,
>
> as the project is more and more visible, and we begin to see incoming
> contributions, I think we really have to move forward on the code cleanup
> and polishing.
>
> So, I'm updating PR #46 about renaming the packages and re-organizing the
> folders. I will update the PR by tomorrow.
>
> In the meantime, I sent an e-mail about the version. So far, I have
> proposed 1.5.0-incubating-SNAPSHOT. Some suggested starting with
> 0.1-incubating-SNAPSHOT.
>
> I think 0.1-incubating-SNAPSHOT makes sense. Please, if you disagree, let
> me know, else I will update the version in PR #46.
>
> Thanks
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


[HEADS UP] Renaming/polishing

2016-03-20 Thread Jean-Baptiste Onofré

Hi beamers,

as the project is more and more visible, and we begin to see incoming 
contributions, I think we really have to move forward on the code 
cleanup and polishing.


So, I'm updating PR #46 about renaming the packages and re-organizing 
the folders. I will update the PR by tomorrow.


In the meantime, I sent an e-mail about the version. So far, I have
proposed 1.5.0-incubating-SNAPSHOT. Some suggested starting with
0.1-incubating-SNAPSHOT.


I think 0.1-incubating-SNAPSHOT makes sense. Please, if you disagree, 
let me know, else I will update the version in PR #46.


Thanks
Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Committer workflow

2016-03-20 Thread Jean-Baptiste Onofré

Hi all,

As a reminder, we agreed that everyone in the project should use the 
same workflow: prepare a PR, submit the PR, give some time to review, 
apply the PR.


Recently, we have had some pushes made directly without a PR.

It would be great if we *all* used the same workflow.

Thanks !

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Capability Matrix

2016-03-20 Thread Tyler Akidau
Just pushed the capability matrix and an attendant blog post to the site:

   - Blog post:
     http://beam.incubator.apache.org/beam/capability/2016/03/17/capability-matrix.html
   - Matrix: http://beam.incubator.apache.org/capability-matrix/

For those of you that want to keep the matrix up to date as your runner
evolves, you'll want to make updates in the _data/capability-matrix.yml
file:
https://github.com/apache/incubator-beam-site/blob/asf-site/_data/capability-matrix.yml

Thanks to everyone for helping fill out the initial set of capabilities!
Looking forward to updates as things progress. :-)

And thanks also to Max for moving all the website stuff to git!

-Tyler


On Sat, Mar 12, 2016 at 9:37 AM Tyler Akidau  wrote:

> Thanks all! At this point, it looks like most all of the fields have been
> filled out. I'm in the process of migrating the spreadsheet contents to
> YAML within the website source, so I've revoked edit access from the doc to
> keep things from changing while I'm doing that. If you have further edits
> to make, feel free to leave a comment, and I'll incorporate it into the
> YAML.
>
> -Tyler
>
>
> On Thu, Mar 10, 2016 at 12:43 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Tyler,
>>
>> good idea !
>>
>> I like it !
>>
>> Regards
>> JB
>>
>> On 03/09/2016 11:14 PM, Tyler Akidau wrote:
>> > I just filed BEAM-104
>> > regarding publishing a capability matrix on the Beam website. We've
>> > seeded the spreadsheet linked there (
>> > https://docs.google.com/spreadsheets/d/1OM077lZBARrtUi6g0X0O0PHaIbFKCD6v0djRefQRE1I/edit
>> > )
>> > with an initial proposed set of capabilities, as well as descriptions
>> > for the model and Cloud Dataflow. If folks for other runners (currently
>> > Flink and Spark) could please make sure their columns are filled out as
>> > well, it'd be much appreciated. Also let us know if there are
>> > capabilities you think we've missed.
>> >
>> > Our hope is to get this up and published soon, since we've been getting
>> > a lot of questions regarding runner capabilities, portability, etc.
>> >
>> > -Tyler
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>