Re: Podling Report Reminder - November 2016

2016-10-27 Thread Jean-Baptiste Onofré
Perfect.

Thanks James !

Regards
JB
On Oct 27, 2016, at 01:05, James Malone wrote:
>Hello everyone!
>
>Unless anyone disagrees or wants to do it, I am happy to volunteer to
>draft
>this podling report for review before we submit it. I can get it done
>for a
>review this Friday (US-Pacific) if that works.
>
>Cheers!
>
>James
>
>On Wed, Oct 26, 2016 at 4:01 PM,  wrote:
>
>> Dear podling,
>>
>> This email was sent by an automated system on behalf of the Apache
>> Incubator PMC. It is an initial reminder to give you plenty of time
>to
>> prepare your quarterly board report.
>>
>> The board meeting is scheduled for Wed, 16 November 2016, 10:30 am
>PDT.
>> The report for your podling will form a part of the Incubator PMC
>> report. The Incubator PMC requires your report to be submitted 2
>weeks
>> before the board meeting, to allow sufficient time for review and
>> submission (Wed, November 02).
>>
>> Please submit your report with sufficient time to allow the Incubator
>> PMC, and subsequently board members to review and digest. Again, the
>> very latest you should submit your report is 2 weeks prior to the
>board
>> meeting.
>>
>> Thanks,
>>
>> The Apache Incubator PMC
>>
>> Submitting your Report
>>
>> --
>>
>> Your report should contain the following:
>>
>> *   Your project name
>> *   A brief description of your project, which assumes no knowledge
>of
>> the project or necessarily of its field
>> *   A list of the three most important issues to address in the move
>> towards graduation.
>> *   Any issues that the Incubator PMC or ASF Board might wish/need to
>be
>> aware of
>> *   How has the community developed since the last report
>> *   How has the project developed since the last report.
>>
>> This should be appended to the Incubator Wiki page at:
>>
>> http://wiki.apache.org/incubator/November2016
>>
>> Note: This is manually populated. You may need to wait a little
>before
>> this page is created from a template.
>>
>> Mentors
>> ---
>>
>> Mentors should review reports for their project(s) and sign them off
>on
>> the Incubator wiki page. Signing off reports shows that you are
>> following the project - projects that are not signed may raise alarms
>> for the Incubator PMC.
>>
>> Incubator PMC
>>


Re: [DISCUSS] Merging master -> feature branch

2016-10-27 Thread Frances Perry
Great, let's document that in the feature branch section of the
contribution guide:
http://beam.incubator.apache.org/contribute/contribution-guide/#feature-branches

Anyone want to take that?

On Thu, Oct 27, 2016 at 1:01 PM, Kenneth Knowles 
wrote:

> In the spirit of explicitly summarizing and concluding threads on list: I
> think we have affirmative consensus to go for it when a downstream
> integration is completely conflict-free and fixup-free.
>
> On Thu, Oct 27, 2016 at 12:43 PM Robert Bradshaw
>  wrote:
>
> > My concern was mostly about what to do in the face of conflicts, but
> > it sounds like the consensus is that for a clean merge, with no
> > conflicts or test breakage (or other concerns) a committer is free to
> > push without any oversight which is fine by me.
> >
> > [If/when the Mergebot comes into action, and runs more extensive tests
> > than standard precommit, it might make sense to still go through that
> > rather than debug bad merges discovered in postcommit tests.]
> >
> > On Wed, Oct 26, 2016 at 9:07 PM, Davor Bonaci 
> > wrote:
> > > +1
> > >
> > > I concur it is fine to proceed with a downstream integration (master ->
> > > feature branch -> sub-feature branch) without waiting for review for a
> > > completely clean merge. Exactly as proposed -- I think there should
> still
> > > be a pull request and comment saying it is a clean merge. (In some
> ideal
> > > world, this would happen nightly by a tool automatically, but I think
> > > that's not feasible in the short term.)
> > >
> > > I think other cases (upstream integration, merge conflict, any manual
> > > action, etc.) should still wait for a normal review.
> > >
> > > On Wed, Oct 26, 2016 at 10:34 AM, Thomas Weise  wrote:
> > >
> > >> +1
> > >>
> > >> For a merge from master to the feature branch that does not require
> > extra
> > >> changes, RTC does not add value. It actually delays and burns reviewer
> > time
> > >> (even mechanics need some) that "real" PRs could benefit from. If
> > >> adjustments are needed, then the regular process kicks in.
> > >>
> > >> Thanks,
> > >> Thomas
> > >>
> > >>
> > >> On Wed, Oct 26, 2016 at 1:33 AM, Amit Sela 
> > wrote:
> > >>
> > >> > I generally agree with Kenneth.
> > >> >
> > >> > While working on the SparkRunnerV2 branch, it was a pain - I avoided
> > >> > frequent merges to avoid trivial PRs, but it cost me with very large
> > and
> > >> > non-trivial merges later.
> > >> > I think that frequent merges for feature-branches should most of the
> > time
> > >> > be trivial (no conflicts) and a committer should be allowed to
> > self-merge
> > >> > once tests pass.
> > >> > As for conflicts, even for the smallest ones I'd go with review just
> > so
> > >> > it's very clear when self-merging is OK - we can always revisit this
> > >> later
> > >> > and further discuss if we think we can improve this process.
> > >> >
> > >> > I guess +1 from me.
> > >> >
> > >> > Thanks,
> > >> > Amit.
> > >> >
> > >> > On Wed, Oct 26, 2016 at 8:10 AM Frances Perry
>  > >
> > >> > wrote:
> > >> >
> > >> > > On Tue, Oct 25, 2016 at 9:44 PM, Jean-Baptiste Onofré <
> > j...@nanthrax.net
> > >> >
> > >> > > wrote:
> > >> > >
> > >> > > > Agree. When possible it would be great to have the branch merged
> > on
> > >> > > master
> > >> > > > quickly, even when it's not fully ready. It would give more
> > >> visibility
> > >> > to
> > >> > > > potential contributors.
> > >> > > >
> > >> > >
> > >> > > This thread is about the opposite, I think -- merging master into
> > >> feature
> > >> > > branches regularly to prevent them from getting out of sync.
> > >> > >
> > >> > > As for increasing the visibility of feature branches, we have
> these
> > new
> > >> > > webpages:
> > >> > > http://beam.incubator.apache.org/contribute/work-in-progress/
> > >> > > http://beam.incubator.apache.org/contribute/contribution-
> > >> > > guide/#feature-branches
> > >> > > with more changes coming in the basic SDK/Runner landing pages
> too.
> > >> > >
> > >> >
> > >>
> >
>


Re: Placement of temporary files by FileBasedSink

2016-10-27 Thread Chamikara Jayalath
Yeah, you are right. I was testing using 'gsutil' which behaves differently.

Thanks,
Cham

On Thu, Oct 27, 2016 at 2:06 PM Eugene Kirpichov
 wrote:

> Indeed IOChannelFactory uses GcsUtil for GCS, and GcsUtil in fact does not
> recurse into subdirectories inside a "*" pattern (see
>
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java#L598
> )
> , and it does not support "**" patterns. However, this is not unit-tested
> by GcsUtilTest, which is a separate issue.
>
> On Thu, Oct 27, 2016 at 1:56 PM Eugene Kirpichov 
> wrote:
>
> > I don't think your assessment of the behavior of glob patterns is correct, per
> >
> https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames#directory-by-directory-vs-recursive-wildcards
> >  .
> > I believe (and hope) that behavior of IOChannelFactory.match() matches
> the
> > behavior of gsutil.
> >
> > On Thu, Oct 27, 2016 at 1:48 PM Chamikara Jayalath  >
> > wrote:
> >
> > BTW I'm in favor of using a sub-directory and possibly asking users to
> > update their glob pattern while also allowing users to optionally
> specify a
> > temporary path in the future, as you propose.
> >
> > Thanks,
> > Cham
> >
> > On Thu, Oct 27, 2016 at 1:45 PM Chamikara Jayalath  >
> > wrote:
> >
> > > On Thu, Oct 27, 2016 at 1:27 PM Eugene Kirpichov
> > >  wrote:
> > >
> > > Getting back to this. I noticed that the original user's job mentioned
> in
> > >
> > >
> >
> http://stackoverflow.com/questions/39822859/temp-files-remain-in-gcs-after-a-dataflow-job-succeeded
> > > is
> > > configured to write to /path/to/$date/foo-x-of-y and another
> job
> > > then reads from /path/to/$date/*, so sibling files won't work - it's
> > > necessary to put temp files either into a subdirectory, or in a
> location
> > > completely outside /path/to/$date/.
> > >
> > >
> > > I think, at least for GCS, glob pattern '/path/to/$date/*' will include
> > > files that are within any immediate sub-directory
> '/path/to/$date/uuid/'.
> > > So unless users use the pattern '/path/to/$date/foo*' they could run
> into
> > > the same issue.
> > >
> > > Thanks,
> > > Cham
> > >
> > >
> > >
> > > By the way, if we ever support recursive globs (e.g.
> /path/to/foo/**/*),
> > > then a subdirectory won't help; and if a user has another job that
> reads
> > > from, say, /path/to/**/* (without the "foo" component - e.g. if foo is
> a
> > > date, and they have a job that reads all data for all dates), then a
> > > sibling directory won't help either.
> > >
> > > I think these two cases are good motivation for allowing the user to
> > > provide a specific temp directory, as a last resort.
> > >
> > > To sum up:
> > > - in order to solve the user's problem, we need to use a directory
> > > - in the future we'll need to allow users to configure the temp
> directory
> > > on FileBasedSink.
> > >
> > > The current PR takes the "directory sibling to the write path"
> approach,
> > > and I don't see a better option that would address the needs of most
> > users
> > > automatically.
> > >
> > > Dan - you mentioned on the PR that you would prefer a subdirectory to a
> > > sibling directory, but this *is* a subdirectory (specified write path
> is
> > > /path/to/$date/foo-x-of-y and the suggested temp path is
> > > /path/to/$date/temp-beam-foo-$uid/ which is a subdirectory of the
> > directory
> > > to which the sink is writing).
> > >
> > > Any alternatives / objections to proceeding with the approach in the PR
> > > as-is?
> > >
> > > On Thu, Oct 20, 2016 at 6:26 PM Kenneth Knowles  >
> > > wrote:
> > >
> > > > @Eugene, we can make breaking changes. But if we really don't want
> to,
> > we
> > > > can add it under a new name easily. That particular inheritance
> > hierarchy
> > > > is not precious IMO.
> > > >
> > > > On Thu, Oct 20, 2016, 14:03 Eugene Kirpichov
> > >  > > > >
> > > > wrote:
> > > >
> > > > > @Cham - this addresses temporary files that were written by
> > successful
> > > > > bundles, but not by failed bundles (and not the case when the
> entire
> > > > > pipeline fails), so it's not sufficient.
> > > > >
> > > > > @Dan - yes, there are situations when it's impossible to create a
> > > > sibling.
> > > > > In that case, we'd need a fallback - either something the user
> needs
> > to
> > > > > explicitly specify ("your path is such that we don't know where to
> > > place
> > > > > temporary files, please specify withTempLocation or something"),
> or I
> > > > like
> > > > > Robert's option of using sibling but differently-named files in
> this
> > > > case.
> > > > >
> > > > > @Kenn - yeah, a directory-based format would be great
> > > > > (/path/to/foo/x-of-y), but this would be a breaking change
> to
> > > the
> > > > > expected behavior.
> > > > >
> 

Re: Placement of temporary files by FileBasedSink

2016-10-27 Thread Eugene Kirpichov
I don't think your assessment of the behavior of glob patterns is correct, per
https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames#directory-by-directory-vs-recursive-wildcards
 .
I believe (and hope) that behavior of IOChannelFactory.match() matches the
behavior of gsutil.
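
The directory-by-directory wildcard semantics referred to above can be sketched as follows. This is only an illustration in Python, not the code in GcsUtil or IOChannelFactory; the helper names glob_to_regex and matches and the sample paths are made up for the example. It assumes "*" stops at "/" and "**" may cross it.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex (hypothetical helper, not a Beam API).

    '*'  matches within a single path component (never crosses '/').
    '**' matches across any number of path components.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # recursive wildcard: may cross '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # single-level wildcard: stops at '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(pattern, path):
    """Return True if the whole path matches the glob pattern."""
    return glob_to_regex(pattern).match(path) is not None


# Sample paths, invented for illustration.
output_file = "/path/to/2016-10-27/foo-00000-of-00010"
temp_file = "/path/to/2016-10-27/temp-beam-foo-1234/shard-0"

print(matches("/path/to/2016-10-27/*", output_file))
print(matches("/path/to/2016-10-27/*", temp_file))
print(matches("/path/to/**/*", temp_file))
```

Under these assumed semantics a reader globbing /path/to/$date/* would not pick up files inside a temp subdirectory, while a recursive glob such as /path/to/**/* still would, which is the case for a user-configurable temp location.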

On Thu, Oct 27, 2016 at 1:48 PM Chamikara Jayalath 
wrote:

> BTW I'm in favor of using a sub-directory and possibly asking users to
> update their glob pattern while also allowing users to optionally specify a
> temporary path in the future, as you propose.
>
> Thanks,
> Cham
>
> On Thu, Oct 27, 2016 at 1:45 PM Chamikara Jayalath 
> wrote:
>
> > On Thu, Oct 27, 2016 at 1:27 PM Eugene Kirpichov
> >  wrote:
> >
> > Getting back to this. I noticed that the original user's job mentioned in
> >
> >
> http://stackoverflow.com/questions/39822859/temp-files-remain-in-gcs-after-a-dataflow-job-succeeded
> > is
> > configured to write to /path/to/$date/foo-x-of-y and another job
> > then reads from /path/to/$date/*, so sibling files won't work - it's
> > necessary to put temp files either into a subdirectory, or in a location
> > completely outside /path/to/$date/.
> >
> >
> > I think, at least for GCS, glob pattern '/path/to/$date/*' will include
> > files that are within any immediate sub-directory '/path/to/$date/uuid/'.
> > So unless users use the pattern '/path/to/$date/foo*' they could run into
> > the same issue.
> >
> > Thanks,
> > Cham
> >
> >
> >
> > By the way, if we ever support recursive globs (e.g. /path/to/foo/**/*),
> > then a subdirectory won't help; and if a user has another job that reads
> > from, say, /path/to/**/* (without the "foo" component - e.g. if foo is a
> > date, and they have a job that reads all data for all dates), then a
> > sibling directory won't help either.
> >
> > I think these two cases are good motivation for allowing the user to
> > provide a specific temp directory, as a last resort.
> >
> > To sum up:
> > - in order to solve the user's problem, we need to use a directory
> > - in the future we'll need to allow users to configure the temp directory
> > on FileBasedSink.
> >
> > The current PR takes the "directory sibling to the write path" approach,
> > and I don't see a better option that would address the needs of most
> users
> > automatically.
> >
> > Dan - you mentioned on the PR that you would prefer a subdirectory to a
> > sibling directory, but this *is* a subdirectory (specified write path is
> > /path/to/$date/foo-x-of-y and the suggested temp path is
> > /path/to/$date/temp-beam-foo-$uid/ which is a subdirectory of the
> directory
> > to which the sink is writing).
> >
> > Any alternatives / objections to proceeding with the approach in the PR
> > as-is?
> >
> > On Thu, Oct 20, 2016 at 6:26 PM Kenneth Knowles 
> > wrote:
> >
> > > @Eugene, we can make breaking changes. But if we really don't want to,
> we
> > > can add it under a new name easily. That particular inheritance
> hierarchy
> > > is not precious IMO.
> > >
> > > On Thu, Oct 20, 2016, 14:03 Eugene Kirpichov
> >  > > >
> > > wrote:
> > >
> > > > @Cham - this addresses temporary files that were written by
> successful
> > > > bundles, but not by failed bundles (and not the case when the entire
> > > > pipeline fails), so it's not sufficient.
> > > >
> > > > @Dan - yes, there are situations when it's impossible to create a
> > > sibling.
> > > > In that case, we'd need a fallback - either something the user needs
> to
> > > > explicitly specify ("your path is such that we don't know where to
> > place
> > > > temporary files, please specify withTempLocation or something"), or I
> > > like
> > > > Robert's option of using sibling but differently-named files in this
> > > case.
> > > >
> > > > @Kenn - yeah, a directory-based format would be great
> > > > (/path/to/foo/x-of-y), but this would be a breaking change to
> > the
> > > > expected behavior.
> > > >
> > > > I actually really like the option of sibling-but-differently-named
> > files
> > > > (/path/to/temp-beam-foo-$uid) which would be a very non-invasive
> change
> > > to
> > > > the current (/path/to/foo-temp-$uid) and indeed would not involve
> > > creating
> > > > new directories or needing new IOChannelFactory APIs. It will still
> > > match a
> > > > glob like /path/to/* though (which a user could conceivably specify
> in
> > a
> > > > situation like gs://my-logs-bucket/*), but it might be good enough.
> > > >
> > > >
> > > > On Thu, Oct 20, 2016 at 10:14 AM Robert Bradshaw
> > > >  wrote:
> > > >
> > > > > On Thu, Oct 20, 2016 at 9:58 AM, Kenneth Knowles
> > >  > > > >
> > > > > wrote:
> > > > > > I like the spirit of proposal #1 for addressing the critical
> > > > duplication
> > > > > > problem, though as Dan points out the logic to choose a related
> but

Re: Placement of temporary files by FileBasedSink

2016-10-27 Thread Chamikara Jayalath
BTW I'm in favor of using a sub-directory and possibly asking users to
update their glob pattern while also allowing users to optionally specify a
temporary path in the future, as you propose.

Thanks,
Cham
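
A rough sketch of the naming scheme being discussed, with an optional user-supplied override. It is written in Python for brevity and is not the FileBasedSink code; the function name temp_directory_for and its parameters are hypothetical. Only the temp-beam-foo-$uid pattern comes from this thread.

```python
import posixpath
import uuid


def temp_directory_for(output_prefix, temp_location=None, uid=None):
    """Pick a directory for temporary files (hypothetical helper).

    output_prefix: the write path without the shard suffix,
                   e.g. /path/to/$date/foo
    temp_location: optional explicit temp directory chosen by the user
    uid:           optional unique id; a random one is generated if omitted
    """
    if temp_location is not None:
        # The user-specified location wins, as the "last resort" option.
        return temp_location.rstrip("/") + "/"
    parent, base = posixpath.split(output_prefix)
    uid = uid or uuid.uuid4().hex
    # Default: a subdirectory of the directory the sink writes into.
    return posixpath.join(parent, "temp-beam-{}-{}".format(base, uid)) + "/"


print(temp_directory_for("/path/to/2016-10-27/foo", uid="1234"))
print(temp_directory_for("/path/to/2016-10-27/foo",
                         temp_location="/scratch/beam-temp"))
```

The override is what lets a user move temp files completely outside a tree that some other job reads with a recursive glob.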

On Thu, Oct 27, 2016 at 1:45 PM Chamikara Jayalath 
wrote:

> On Thu, Oct 27, 2016 at 1:27 PM Eugene Kirpichov
>  wrote:
>
> Getting back to this. I noticed that the original user's job mentioned in
>
> http://stackoverflow.com/questions/39822859/temp-files-remain-in-gcs-after-a-dataflow-job-succeeded
> is
> configured to write to /path/to/$date/foo-x-of-y and another job
> then reads from /path/to/$date/*, so sibling files won't work - it's
> necessary to put temp files either into a subdirectory, or in a location
> completely outside /path/to/$date/.
>
>
> I think, at least for GCS, glob pattern '/path/to/$date/*' will include
> files that are within any immediate sub-directory '/path/to/$date/uuid/'.
> So unless users use the pattern '/path/to/$date/foo*' they could run into
> the same issue.
>
> Thanks,
> Cham
>
>
>
> By the way, if we ever support recursive globs (e.g. /path/to/foo/**/*),
> then a subdirectory won't help; and if a user has another job that reads
> from, say, /path/to/**/* (without the "foo" component - e.g. if foo is a
> date, and they have a job that reads all data for all dates), then a
> sibling directory won't help either.
>
> I think these two cases are good motivation for allowing the user to
> provide a specific temp directory, as a last resort.
>
> To sum up:
> - in order to solve the user's problem, we need to use a directory
> - in the future we'll need to allow users to configure the temp directory
> on FileBasedSink.
>
> The current PR takes the "directory sibling to the write path" approach,
> and I don't see a better option that would address the needs of most users
> automatically.
>
> Dan - you mentioned on the PR that you would prefer a subdirectory to a
> sibling directory, but this *is* a subdirectory (specified write path is
> /path/to/$date/foo-x-of-y and the suggested temp path is
> /path/to/$date/temp-beam-foo-$uid/ which is a subdirectory of the directory
> to which the sink is writing).
>
> Any alternatives / objections to proceeding with the approach in the PR
> as-is?
>
> On Thu, Oct 20, 2016 at 6:26 PM Kenneth Knowles 
> wrote:
>
> > @Eugene, we can make breaking changes. But if we really don't want to, we
> > can add it under a new name easily. That particular inheritance hierarchy
> > is not precious IMO.
> >
> > On Thu, Oct 20, 2016, 14:03 Eugene Kirpichov
>  > >
> > wrote:
> >
> > > @Cham - this addresses temporary files that were written by successful
> > > bundles, but not by failed bundles (and not the case when the entire
> > > pipeline fails), so it's not sufficient.
> > >
> > > @Dan - yes, there are situations when it's impossible to create a
> > sibling.
> > > In that case, we'd need a fallback - either something the user needs to
> > > explicitly specify ("your path is such that we don't know where to
> place
> > > temporary files, please specify withTempLocation or something"), or I
> > like
> > > Robert's option of using sibling but differently-named files in this
> > case.
> > >
> > > @Kenn - yeah, a directory-based format would be great
> > > (/path/to/foo/x-of-y), but this would be a breaking change to
> the
> > > expected behavior.
> > >
> > > I actually really like the option of sibling-but-differently-named
> files
> > > (/path/to/temp-beam-foo-$uid) which would be a very non-invasive change
> > to
> > > the current (/path/to/foo-temp-$uid) and indeed would not involve
> > creating
> > > new directories or needing new IOChannelFactory APIs. It will still
> > match a
> > > glob like /path/to/* though (which a user could conceivably specify in
> a
> > > situation like gs://my-logs-bucket/*), but it might be good enough.
> > >
> > >
> > > On Thu, Oct 20, 2016 at 10:14 AM Robert Bradshaw
> > >  wrote:
> > >
> > > > On Thu, Oct 20, 2016 at 9:58 AM, Kenneth Knowles
> >  > > >
> > > > wrote:
> > > > > I like the spirit of proposal #1 for addressing the critical
> > > duplication
> > > > > problem, though as Dan points out the logic to choose a related but
> > > > > collision-free name might be slightly more complex.
> > > > >
> > > > > It is a nice bonus that it addresses the less critical issues and
> > > > improves
> > > > > usability for manual inspections and interventions.
> > > > >
> > > > > The term "sibling" is being slightly misused here. I'd say #1 as
> > > proposed
> > > > > is a "sibling of the parent" while today's behavior is "sibling".
> I'd
> > > > say a
> > > > > root cause of multiple problems is that our sharded file format is
> "a
> > > > bunch
> > > > > of files next to each other" and the sibling is "other files in the
> > > same
> > 

Re: Placement of temporary files by FileBasedSink

2016-10-27 Thread Chamikara Jayalath
On Thu, Oct 27, 2016 at 1:27 PM Eugene Kirpichov
 wrote:

> Getting back to this. I noticed that the original user's job mentioned in
>
> http://stackoverflow.com/questions/39822859/temp-files-remain-in-gcs-after-a-dataflow-job-succeeded
> is
> configured to write to /path/to/$date/foo-x-of-y and another job
> then reads from /path/to/$date/*, so sibling files won't work - it's
> necessary to put temp files either into a subdirectory, or in a location
> completely outside /path/to/$date/.
>

I think, at least for GCS, glob pattern '/path/to/$date/*' will include
files that are within any immediate sub-directory '/path/to/$date/uuid/'.
So unless users use the pattern '/path/to/$date/foo*' they could run into
the same issue.

Thanks,
Cham


>
> By the way, if we ever support recursive globs (e.g. /path/to/foo/**/*),
> then a subdirectory won't help; and if a user has another job that reads
> from, say, /path/to/**/* (without the "foo" component - e.g. if foo is a
> date, and they have a job that reads all data for all dates), then a
> sibling directory won't help either.
>
> I think these two cases are good motivation for allowing the user to
> provide a specific temp directory, as a last resort.
>
> To sum up:
> - in order to solve the user's problem, we need to use a directory
> - in the future we'll need to allow users to configure the temp directory
> on FileBasedSink.
>
> The current PR takes the "directory sibling to the write path" approach,
> and I don't see a better option that would address the needs of most users
> automatically.
>
> Dan - you mentioned on the PR that you would prefer a subdirectory to a
> sibling directory, but this *is* a subdirectory (specified write path is
> /path/to/$date/foo-x-of-y and the suggested temp path is
> /path/to/$date/temp-beam-foo-$uid/ which is a subdirectory of the directory
> to which the sink is writing).
>
> Any alternatives / objections to proceeding with the approach in the PR
> as-is?
>
> On Thu, Oct 20, 2016 at 6:26 PM Kenneth Knowles 
> wrote:
>
> > @Eugene, we can make breaking changes. But if we really don't want to, we
> > can add it under a new name easily. That particular inheritance hierarchy
> > is not precious IMO.
> >
> > On Thu, Oct 20, 2016, 14:03 Eugene Kirpichov
>  > >
> > wrote:
> >
> > > @Cham - this addresses temporary files that were written by successful
> > > bundles, but not by failed bundles (and not the case when the entire
> > > pipeline fails), so it's not sufficient.
> > >
> > > @Dan - yes, there are situations when it's impossible to create a
> > sibling.
> > > In that case, we'd need a fallback - either something the user needs to
> > > explicitly specify ("your path is such that we don't know where to
> place
> > > temporary files, please specify withTempLocation or something"), or I
> > like
> > > Robert's option of using sibling but differently-named files in this
> > case.
> > >
> > > @Kenn - yeah, a directory-based format would be great
> > > (/path/to/foo/x-of-y), but this would be a breaking change to
> the
> > > expected behavior.
> > >
> > > I actually really like the option of sibling-but-differently-named
> files
> > > (/path/to/temp-beam-foo-$uid) which would be a very non-invasive change
> > to
> > > the current (/path/to/foo-temp-$uid) and indeed would not involve
> > creating
> > > new directories or needing new IOChannelFactory APIs. It will still
> > match a
> > > glob like /path/to/* though (which a user could conceivably specify in
> a
> > > situation like gs://my-logs-bucket/*), but it might be good enough.
> > >
> > >
> > > On Thu, Oct 20, 2016 at 10:14 AM Robert Bradshaw
> > >  wrote:
> > >
> > > > On Thu, Oct 20, 2016 at 9:58 AM, Kenneth Knowles
> >  > > >
> > > > wrote:
> > > > > I like the spirit of proposal #1 for addressing the critical
> > > duplication
> > > > > problem, though as Dan points out the logic to choose a related but
> > > > > collision-free name might be slightly more complex.
> > > > >
> > > > > It is a nice bonus that it addresses the less critical issues and
> > > > improves
> > > > > usability for manual inspections and interventions.
> > > > >
> > > > > The term "sibling" is being slightly misused here. I'd say #1 as
> > > proposed
> > > > > is a "sibling of the parent" while today's behavior is "sibling".
> I'd
> > > > say a
> > > > > root cause of multiple problems is that our sharded file format is
> "a
> > > > bunch
> > > > > of files next to each other" and the sibling is "other files in the
> > > same
> > > > > directory" so it takes some care, and explicit file name tracking
> > > instead
> > > > > of globbing, to work with it correctly.
> > > > >
> > > > >  AFAIK (corrections welcome) there is nothing special about
> > > > > Write.to("s3://bucket/file") meaning write to
> > > > > 

Re: [DISCUSS] Merging master -> feature branch

2016-10-27 Thread Kenneth Knowles
In the spirit of explicitly summarizing and concluding threads on list: I
think we have affirmative consensus to go for it when a downstream
integration is completely conflict-free and fixup-free.
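
For whoever documents this, the rule as summarized above could be written down roughly like this. It is a Python sketch of the policy only; the Merge class and requires_review function are invented names, not part of any Beam tooling.

```python
from dataclasses import dataclass


@dataclass
class Merge:
    downstream: bool     # True for master -> feature branch (-> sub-feature branch)
    has_conflicts: bool  # any merge conflict at all, however small
    needs_fixups: bool   # any manual change on top of the merge
    tests_pass: bool     # precommit tests are green


def requires_review(merge):
    """Return True if the merge should wait for a normal review."""
    clean = (not merge.has_conflicts
             and not merge.needs_fixups
             and merge.tests_pass)
    # Only a completely clean downstream integration may skip review.
    return not (merge.downstream and clean)


print(requires_review(Merge(True, False, False, True)))
print(requires_review(Merge(True, True, False, True)))
print(requires_review(Merge(False, False, False, True)))
```

Per the earlier replies, even the no-review case would still go through a pull request with a comment saying it is a clean merge.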

On Thu, Oct 27, 2016 at 12:43 PM Robert Bradshaw
 wrote:

> My concern was mostly about what to do in the face of conflicts, but
> it sounds like the consensus is that for a clean merge, with no
> conflicts or test breakage (or other concerns) a committer is free to
> push without any oversight which is fine by me.
>
> [If/when the Mergebot comes into action, and runs more extensive tests
> than standard precommit, it might make sense to still go through that
> rather than debug bad merges discovered in postcommit tests.]
>
> On Wed, Oct 26, 2016 at 9:07 PM, Davor Bonaci 
> wrote:
> > +1
> >
> > I concur it is fine to proceed with a downstream integration (master ->
> > feature branch -> sub-feature branch) without waiting for review for a
> > completely clean merge. Exactly as proposed -- I think there should still
> > be a pull request and comment saying it is a clean merge. (In some ideal
> > world, this would happen nightly by a tool automatically, but I think
> > that's not feasible in the short term.)
> >
> > I think other cases (upstream integration, merge conflict, any manual
> > action, etc.) should still wait for a normal review.
> >
> > On Wed, Oct 26, 2016 at 10:34 AM, Thomas Weise  wrote:
> >
> >> +1
> >>
> >> For a merge from master to the feature branch that does not require
> extra
> >> changes, RTC does not add value. It actually delays and burns reviewer
> time
> >> (even mechanics need some) that "real" PRs could benefit from. If
> >> adjustments are needed, then the regular process kicks in.
> >>
> >> Thanks,
> >> Thomas
> >>
> >>
> >> On Wed, Oct 26, 2016 at 1:33 AM, Amit Sela 
> wrote:
> >>
> >> > I generally agree with Kenneth.
> >> >
> >> > While working on the SparkRunnerV2 branch, it was a pain - I avoided
> >> > frequent merges to avoid trivial PRs, but it cost me with very large
> and
> >> > non-trivial merges later.
> >> > I think that frequent merges for feature-branches should most of the
> time
> >> > be trivial (no conflicts) and a committer should be allowed to
> self-merge
> >> > once tests pass.
> >> > As for conflicts, even for the smallest ones I'd go with review just
> so
> >> > it's very clear when self-merging is OK - we can always revisit this
> >> later
> >> > and further discuss if we think we can improve this process.
> >> >
> >> > I guess +1 from me.
> >> >
> >> > Thanks,
> >> > Amit.
> >> >
> >> > On Wed, Oct 26, 2016 at 8:10 AM Frances Perry  >
> >> > wrote:
> >> >
> >> > > On Tue, Oct 25, 2016 at 9:44 PM, Jean-Baptiste Onofré <
> j...@nanthrax.net
> >> >
> >> > > wrote:
> >> > >
> >> > > > Agree. When possible it would be great to have the branch merged
> on
> >> > > master
> >> > > > quickly, even when it's not fully ready. It would give more
> >> visibility
> >> > to
> >> > > > potential contributors.
> >> > > >
> >> > >
> >> > > This thread is about the opposite, I think -- merging master into
> >> feature
> >> > > branches regularly to prevent them from getting out of sync.
> >> > >
> >> > > As for increasing the visibility of feature branches, we have these
> new
> >> > > webpages:
> >> > > http://beam.incubator.apache.org/contribute/work-in-progress/
> >> > > http://beam.incubator.apache.org/contribute/contribution-
> >> > > guide/#feature-branches
> >> > > with more changes coming in the basic SDK/Runner landing pages too.
> >> > >
> >> >
> >>
>


Re: [DISCUSS] Merging master -> feature branch

2016-10-27 Thread Robert Bradshaw
My concern was mostly about what to do in the face of conflicts, but
it sounds like the consensus is that for a clean merge, with no
conflicts or test breakage (or other concerns) a committer is free to
push without any oversight which is fine by me.

[If/when the Mergebot comes into action, and runs more extensive tests
than standard precommit, it might make sense to still go through that
rather than debug bad merges discovered in postcommit tests.]

On Wed, Oct 26, 2016 at 9:07 PM, Davor Bonaci  wrote:
> +1
>
> I concur it is fine to proceed with a downstream integration (master ->
> feature branch -> sub-feature branch) without waiting for review for a
> completely clean merge. Exactly as proposed -- I think there should still
> be a pull request and comment saying it is a clean merge. (In some ideal
> world, this would happen nightly by a tool automatically, but I think
> that's not feasible in the short term.)
>
> I think other cases (upstream integration, merge conflict, any manual
> action, etc.) should still wait for a normal review.
>
> On Wed, Oct 26, 2016 at 10:34 AM, Thomas Weise  wrote:
>
>> +1
>>
>> For a merge from master to the feature branch that does not require extra
>> changes, RTC does not add value. It actually delays and burns reviewer time
>> (even mechanics need some) that "real" PRs could benefit from. If
>> adjustments are needed, then the regular process kicks in.
>>
>> Thanks,
>> Thomas
>>
>>
>> On Wed, Oct 26, 2016 at 1:33 AM, Amit Sela  wrote:
>>
>> > I generally agree with Kenneth.
>> >
>> > While working on the SparkRunnerV2 branch, it was a pain - I avoided
>> > frequent merges to avoid trivial PRs, but it cost me with very large and
>> > non-trivial merges later.
>> > I think that frequent merges for feature-branches should most of the time
>> > be trivial (no conflicts) and a committer should be allowed to self-merge
>> > once tests pass.
>> > As for conflicts, even for the smallest ones I'd go with review just so
>> > it's very clear when self-merging is OK - we can always revisit this
>> later
>> > and further discuss if we think we can improve this process.
>> >
>> > I guess +1 from me.
>> >
>> > Thanks,
>> > Amit.
>> >
>> > On Wed, Oct 26, 2016 at 8:10 AM Frances Perry 
>> > wrote:
>> >
>> > > On Tue, Oct 25, 2016 at 9:44 PM, Jean-Baptiste Onofré > >
>> > > wrote:
>> > >
>> > > > Agree. When possible it would be great to have the branch merged on
>> > > master
>> > > > quickly, even when it's not fully ready. It would give more
>> visibility
>> > to
>> > > > potential contributors.
>> > > >
>> > >
>> > > This thread is about the opposite, I think -- merging master into
>> feature
>> > > branches regularly to prevent them from getting out of sync.
>> > >
>> > > As for increasing the visibility of feature branches, we have these new
>> > > webpages:
>> > > http://beam.incubator.apache.org/contribute/work-in-progress/
>> > > http://beam.incubator.apache.org/contribute/contribution-guide/#feature-branches
>> > > with more changes coming in the basic SDK/Runner landing pages too.
>> > >
>> >
>>


Re: Can we have more quick start examples ?

2016-10-27 Thread Neelesh Salian
+1 to this.
I liked the guides for the setup for GC and Storage.
The documentation is by far better than any guide I have seen. I also
provided feedback on the documentation where it could use improvement.

But certainly a more abstract and user-friendly example would be
encouraging for new users and folks curious about using Beam itself.

On Thu, Oct 27, 2016 at 11:49 AM, Jean-Baptiste Onofré 
wrote:

> Yes it sounds good to me. I would love to see this as part of the examples.
>
> Ismael and I also started the beam-samples
> (http://github.com/jbonofre/beam-samples) that could be part of the examples.
> The purpose is to have more real use-case implementations with real data.
>
> Regards
> JB
>
>
>
> On Oct 27, 2016, 17:17, at 17:17, Jesse Anderson 
> wrote:
> >
>



-- 
Neelesh Srinivas Salian
Customer Operations Engineer


Re: [DISCUSS] Using Verbs for Transforms

2016-10-27 Thread Neelesh Salian
Thanks everyone for all the inputs.
It's really encouraging for a new contributor, like myself, to get valuable
input and mentoring (like on this thread) and, in turn, help make the
community better.



On Thu, Oct 27, 2016 at 11:41 AM, Jean-Baptiste Onofré 
wrote:

> You did well ! It's an interesting discussion we have and it's great to
> have it on the mailing list (better than in Jira or PR comments IMHO).
>
> Thanks !
>
> Regards
> JB
>
>
>
> On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw
>  wrote:
> >+1 to all Dan says.
> >
> >I only brought this up because it seemed new contributors (yay)
> >jumping in and renaming a core transform based on "Something to
> >consider" deserved a couple more eyeballs, but didn't intend for
> >it to become a big deal.
> >
> >On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin
> > wrote:
> >> Folks, I don't think this needs to be a "vote". This is just not that
> >big a
> >> deal :). It is important to be transparent and have these discussions
> >on
> >> the list, which is why we brought it here from GitHub/JIRA, but at
> >the end
> >> of the day I hope that a small group of committers and developers can
> >> assess "good enough" consensus for these minor issues.
> >>
> >> Here's my assessment:
> >> * We don't really have any rules about naming transforms. "Should be
> >a
> >> verb" is a sort of guiding principle inherited from the Google Flume
> >> project from which Dataflow evolved, but honestly we violate this
> >rule for
> >> clarity all over the place. ("Values", for example).
> >> * The "Big Data" community is significantly more familiar with the
> >concept
> >> of Distinct -- Jesse, who filed the original JIRA, is a good example
> >here.
> >> * Finally, nobody feels very strongly. We could argue minor points of
> >each
> >> solution, but at the end of the day I don't think anyone wants to
> >block a
> >> change.
> >>
> >> Let's go with Distinct. It's important to align Beam with the open
> >source
> >> big data community. (And thanks Jesse, our newest (*tied) committer,
> >for
> >> pushing us in the right direction!)
> >>
> >> Jesse, can you please take charge of wrapping up the PR and merging
> >it?
> >>
> >> Thanks!
> >> Dan
> >>
> >> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré
> >
> >> wrote:
> >>
> >>> Just to clarify. Davor is right for a code modification change: -1
> >means a
> >>> veto.
> >>> I meant that -1 is not a veto for a release vote.
> >>>
> >>> Anyway, even if it's not a formal vote, we can have a discussion
> >with
> >>> "options" a,b and c.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>>
> >>>
> >>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci
> >
> >>> wrote:
> >>> >In terms of reaching a decision on any code or design changes,
> >>> >including
> >>> >this one, I'd suggest going without formal votes. Voting process
> >for
> >>> >code
> >>> >modifications between choices A and B doesn't necessarily end with
> >a
> >>> >decision A or B -- a single (qualified) -1 vote is a veto and
> >cannot be
> >>> >overridden [1]. Said differently, the guideline is that code
> >changes
> >>> >should
> >>> >be made by consensus; not by one group outvoting another. I'd like
> >to
> >>> >avoid
> >>> >setting such precedent; we should try to drive consensus, as
> >opposed to
> >>> >attempting to outvote another part of the community.
> >>> >
> >>> >In this particular case, we have had a great discussion. Many
> >>> >contributors
> >>> >brought different perspectives. Consequently, some opinions have
> >been
> >>> >likely changed. At this point, someone should summarize the
> >arguments,
> >>> >try
> >>> >to critique them from a neutral standpoint, and suggest a refined
> >>> >proposal
> >>> >that takes these perspectives into account. If nobody objects in a
> >>> >short
> >>> >time, we should consider this decided. [ I can certainly help here,
> >but
> >>> >I'd
> >>> >love to see somebody else do it! ]
> >>> >
> >>> >[1] http://www.apache.org/foundation/voting.html
> >>> >
> >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
> >>> >
> >>> >wrote:
> >>> >
> >>> >> I also like Distinct since it doesn't make it sound like it
> >modifies
> >>> >any
> >>> >> underlying collection. RemoveDuplicates makes it sound like the
> >>> >duplicates
> >>> >> are removed, rather than a new PCollection without duplicates
> >being
> >>> >> returned.
> >>> >>
> >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré
> >
> >>> >> wrote:
> >>> >>
> >>> >> > Agree. It was more a transition proposal.
> >>> >> >
> >>> >> > Regards
> >>> >> > JB
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
> >>> >> >  wrote:
> >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> >>> >> > > wrote:
> >>> >> > >> 

Re: Can we have more quick start examples ?

2016-10-27 Thread Jean-Baptiste Onofré
Yes it sounds good to me. I would love to see this as part of the examples.

Ismael and I also started the beam-samples 
(http://github.com/jbonofre/beam-samples) that could be part of the examples.
The purpose is to have more real use-case implementations with real data.

Regards
JB


On Oct 27, 2016, 17:17, at 17:17, Jesse Anderson  wrote:
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-27 Thread Jean-Baptiste Onofré
You did well ! It's an interesting discussion we have and it's great to have it 
on the mailing list (better than in Jira or PR comments IMHO).

Thanks !

Regards
JB


On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw  
wrote:
>+1 to all Dan says.
>
>I only brought this up because it seemed new contributors (yay)
>jumping in and renaming a core transform based on "Something to
>consider" deserved a couple more eyeballs, but didn't intend for
>it to become a big deal.
>
>On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin
> wrote:
>> Folks, I don't think this needs to be a "vote". This is just not that
>big a
>> deal :). It is important to be transparent and have these discussions
>on
>> the list, which is why we brought it here from GitHub/JIRA, but at
>the end
>> of the day I hope that a small group of committers and developers can
>> assess "good enough" consensus for these minor issues.
>>
>> Here's my assessment:
>> * We don't really have any rules about naming transforms. "Should be
>a
>> verb" is a sort of guiding principle inherited from the Google Flume
>> project from which Dataflow evolved, but honestly we violate this
>rule for
>> clarity all over the place. ("Values", for example).
>> * The "Big Data" community is significantly more familiar with the
>concept
>> of Distinct -- Jesse, who filed the original JIRA, is a good example
>here.
>> * Finally, nobody feels very strongly. We could argue minor points of
>each
>> solution, but at the end of the day I don't think anyone wants to
>block a
>> change.
>>
>> Let's go with Distinct. It's important to align Beam with the open
>source
>> big data community. (And thanks Jesse, our newest (*tied) committer,
>for
>> pushing us in the right direction!)
>>
>> Jesse, can you please take charge of wrapping up the PR and merging
>it?
>>
>> Thanks!
>> Dan
>>
>> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré
>
>> wrote:
>>
>>> Just to clarify. Davor is right for a code modification change: -1
>means a
>>> veto.
>>> I meant that -1 is not a veto for a release vote.
>>>
>>> Anyway, even if it's not a formal vote, we can have a discussion
>with
>>> "options" a,b and c.
>>>
>>> Regards
>>> JB
>>>
>>>
>>>
>>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci
>
>>> wrote:
>>> >In terms of reaching a decision on any code or design changes,
>>> >including
>>> >this one, I'd suggest going without formal votes. Voting process
>for
>>> >code
>>> >modifications between choices A and B doesn't necessarily end with
>a
>>> >decision A or B -- a single (qualified) -1 vote is a veto and
>cannot be
>>> >overridden [1]. Said differently, the guideline is that code
>changes
>>> >should
>>> >be made by consensus; not by one group outvoting another. I'd like
>to
>>> >avoid
>>> >setting such precedent; we should try to drive consensus, as
>opposed to
>>> >attempting to outvote another part of the community.
>>> >
>>> >In this particular case, we have had a great discussion. Many
>>> >contributors
>>> >brought different perspectives. Consequently, some opinions have
>been
>>> >likely changed. At this point, someone should summarize the
>arguments,
>>> >try
>>> >to critique them from a neutral standpoint, and suggest a refined
>>> >proposal
>>> >that takes these perspectives into account. If nobody objects in a
>>> >short
>>> >time, we should consider this decided. [ I can certainly help here,
>but
>>> >I'd
>>> >love to see somebody else do it! ]
>>> >
>>> >[1] http://www.apache.org/foundation/voting.html
>>> >
>>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
>>> >
>>> >wrote:
>>> >
>>> >> I also like Distinct since it doesn't make it sound like it
>modifies
>>> >any
>>> >> underlying collection. RemoveDuplicates makes it sound like the
>>> >duplicates
>>> >> are removed, rather than a new PCollection without duplicates
>being
>>> >> returned.
>>> >>
>>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré
>
>>> >> wrote:
>>> >>
>>> >> > Agree. It was more a transition proposal.
>>> >> >
>>> >> > Regards
>>> >> > JB
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
>>> >> >  wrote:
>>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
>>> >> > > wrote:
>>> >> > >> And what about use RemoveDuplicates and create an alias
>Distinct
>>> >?
>>> >> > >
>>> >> > >I'd really like to avoid (long term) aliases--you end up
>having to
>>> >> > >document (and maintain) them both, and it adds confusion as to
>>> >which
>>> >> > >one to use (especially if they ever diverge), and means
>searching
>>> >for
>>> >> > >one or the other yields half the results.
>>> >> > >
>>> >> > >> It doesn't break the API and would address both SQL users
>and
>>> >more
>>> >> > >"big data" users.
>>> >> > >>
>>> >> > >> My $0.01 ;)
>>> >> > >>
>>> >> > >> 

Re: [DISCUSS] Using Verbs for Transforms

2016-10-27 Thread Jean-Baptiste Onofré
It sounds good to me.

So basically you did kind of vote with a proposed solution ;)

Regards
JB


On Oct 27, 2016, 20:04, at 20:04, Dan Halperin  
wrote:
>Folks, I don't think this needs to be a "vote". This is just not that
>big a
>deal :). It is important to be transparent and have these discussions
>on
>the list, which is why we brought it here from GitHub/JIRA, but at the
>end
>of the day I hope that a small group of committers and developers can
>assess "good enough" consensus for these minor issues.
>
>Here's my assessment:
>* We don't really have any rules about naming transforms. "Should be a
>verb" is a sort of guiding principle inherited from the Google Flume
>project from which Dataflow evolved, but honestly we violate this rule
>for
>clarity all over the place. ("Values", for example).
>* The "Big Data" community is significantly more familiar with the
>concept
>of Distinct -- Jesse, who filed the original JIRA, is a good example
>here.
>* Finally, nobody feels very strongly. We could argue minor points of
>each
>solution, but at the end of the day I don't think anyone wants to block
>a
>change.
>
>Let's go with Distinct. It's important to align Beam with the open
>source
>big data community. (And thanks Jesse, our newest (*tied) committer,
>for
>pushing us in the right direction!)
>
>Jesse, can you please take charge of wrapping up the PR and merging it?
>
>Thanks!
>Dan
>
>On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré
>
>wrote:
>
>> Just to clarify. Davor is right for a code modification change: -1
>means a
>> veto.
>> I meant that -1 is not a veto for a release vote.
>>
>> Anyway, even if it's not a formal vote, we can have a discussion with
>> "options" a,b and c.
>>
>> Regards
>> JB
>>
>>
>>
>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci
>
>> wrote:
>> >In terms of reaching a decision on any code or design changes,
>> >including
>> >this one, I'd suggest going without formal votes. Voting process for
>> >code
>> >modifications between choices A and B doesn't necessarily end with a
>> >decision A or B -- a single (qualified) -1 vote is a veto and cannot
>be
>> >overridden [1]. Said differently, the guideline is that code changes
>> >should
>> >be made by consensus; not by one group outvoting another. I'd like
>to
>> >avoid
>> >setting such precedent; we should try to drive consensus, as opposed
>to
>> >attempting to outvote another part of the community.
>> >
>> >In this particular case, we have had a great discussion. Many
>> >contributors
>> >brought different perspectives. Consequently, some opinions have
>been
>> >likely changed. At this point, someone should summarize the
>arguments,
>> >try
>> >to critique them from a neutral standpoint, and suggest a refined
>> >proposal
>> >that takes these perspectives into account. If nobody objects in a
>> >short
>> >time, we should consider this decided. [ I can certainly help here,
>but
>> >I'd
>> >love to see somebody else do it! ]
>> >
>> >[1] http://www.apache.org/foundation/voting.html
>> >
>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
>> >
>> >wrote:
>> >
>> >> I also like Distinct since it doesn't make it sound like it
>modifies
>> >any
>> >> underlying collection. RemoveDuplicates makes it sound like the
>> >duplicates
>> >> are removed, rather than a new PCollection without duplicates
>being
>> >> returned.
>> >>
>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré
>
>> >> wrote:
>> >>
>> >> > Agree. It was more a transition proposal.
>> >> >
>> >> > Regards
>> >> > JB
>> >> >
>> >> >
>> >> >
>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
>> >> >  wrote:
>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
>> >> > > wrote:
>> >> > >> And what about use RemoveDuplicates and create an alias
>Distinct
>> >?
>> >> > >
>> >> > >I'd really like to avoid (long term) aliases--you end up having
>to
>> >> > >document (and maintain) them both, and it adds confusion as to
>> >which
>> >> > >one to use (especially if they ever diverge), and means
>searching
>> >for
>> >> > >one or the other yields half the results.
>> >> > >
>> >> > >> It doesn't break the API and would address both SQL users and
>> >more
>> >> > >"big data" users.
>> >> > >>
>> >> > >> My $0.01 ;)
>> >> > >>
>> >> > >> Regards
>> >> > >> JB
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
>> >> > > wrote:
>> >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
>> >> > >>>preference:
>> >> > >>>
>> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important
>keywords
>> >are
>> >> > >in
>> >> > >>>the
>> >> > >>>Javadoc. This reduces churn on our users and is honestly
>pretty
>> >dang
>> >> > >>> descriptive.
>> >> > >>>2. Rename to `Distinct`, which is clear if you're a 

Re: [DISCUSS] Using Verbs for Transforms

2016-10-27 Thread Jesse Anderson
Sure

On Thu, Oct 27, 2016, 8:04 PM Dan Halperin 
wrote:

> Folks, I don't think this needs to be a "vote". This is just not that big a
> deal :). It is important to be transparent and have these discussions on
> the list, which is why we brought it here from GitHub/JIRA, but at the end
> of the day I hope that a small group of committers and developers can
> assess "good enough" consensus for these minor issues.
>
> Here's my assessment:
> * We don't really have any rules about naming transforms. "Should be a
> verb" is a sort of guiding principle inherited from the Google Flume
> project from which Dataflow evolved, but honestly we violate this rule for
> clarity all over the place. ("Values", for example).
> * The "Big Data" community is significantly more familiar with the concept
> of Distinct -- Jesse, who filed the original JIRA, is a good example here.
> * Finally, nobody feels very strongly. We could argue minor points of each
> solution, but at the end of the day I don't think anyone wants to block a
> change.
>
> Let's go with Distinct. It's important to align Beam with the open source
> big data community. (And thanks Jesse, our newest (*tied) committer, for
> pushing us in the right direction!)
>
> Jesse, can you please take charge of wrapping up the PR and merging it?
>
> Thanks!
> Dan
>
> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré 
> wrote:
>
> > Just to clarify. Davor is right for a code modification change: -1 means
> a
> > veto.
> > I meant that -1 is not a veto for a release vote.
> >
> > Anyway, even if it's not a formal vote, we can have a discussion with
> > "options" a,b and c.
> >
> > Regards
> > JB
> >
> >
> >
> > On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci  >
> > wrote:
> > >In terms of reaching a decision on any code or design changes,
> > >including
> > >this one, I'd suggest going without formal votes. Voting process for
> > >code
> > >modifications between choices A and B doesn't necessarily end with a
> > >decision A or B -- a single (qualified) -1 vote is a veto and cannot be
> > >overridden [1]. Said differently, the guideline is that code changes
> > >should
> > >be made by consensus; not by one group outvoting another. I'd like to
> > >avoid
> > >setting such precedent; we should try to drive consensus, as opposed to
> > >attempting to outvote another part of the community.
> > >
> > >In this particular case, we have had a great discussion. Many
> > >contributors
> > >brought different perspectives. Consequently, some opinions have been
> > >likely changed. At this point, someone should summarize the arguments,
> > >try
> > >to critique them from a neutral standpoint, and suggest a refined
> > >proposal
> > >that takes these perspectives into account. If nobody objects in a
> > >short
> > >time, we should consider this decided. [ I can certainly help here, but
> > >I'd
> > >love to see somebody else do it! ]
> > >
> > >[1] http://www.apache.org/foundation/voting.html
> > >
> > >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
> > >
> > >wrote:
> > >
> > >> I also like Distinct since it doesn't make it sound like it modifies
> > >any
> > >> underlying collection. RemoveDuplicates makes it sound like the
> > >duplicates
> > >> are removed, rather than a new PCollection without duplicates being
> > >> returned.
> > >>
> > >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré 
> > >> wrote:
> > >>
> > >> > Agree. It was more a transition proposal.
> > >> >
> > >> > Regards
> > >> > JB
> > >> >
> > >> >
> > >> >
> > >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
> > >> >  wrote:
> > >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> > >> > > wrote:
> > >> > >> And what about use RemoveDuplicates and create an alias Distinct
> > >?
> > >> > >
> > >> > >I'd really like to avoid (long term) aliases--you end up having to
> > >> > >document (and maintain) them both, and it adds confusion as to
> > >which
> > >> > >one to use (especially if they ever diverge), and means searching
> > >for
> > >> > >one or the other yields half the results.
> > >> > >
> > >> > >> It doesn't break the API and would address both SQL users and
> > >more
> > >> > >"big data" users.
> > >> > >>
> > >> > >> My $0.01 ;)
> > >> > >>
> > >> > >> Regards
> > >> > >> JB
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
> > >> > > wrote:
> > >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
> > >> > >>>preference:
> > >> > >>>
> > >> > >>>1. Keep `RemoveDuplicates` name, ensure that important keywords
> > >are
> > >> > >in
> > >> > >>>the
> > >> > >>>Javadoc. This reduces churn on our users and is honestly pretty
> > >dang
> > >> > >>> descriptive.
> > >> > >>>2. Rename to `Distinct`, which is clear if you're a 

Re: [DISCUSS] Using Verbs for Transforms

2016-10-27 Thread Dan Halperin
Folks, I don't think this needs to be a "vote". This is just not that big a
deal :). It is important to be transparent and have these discussions on
the list, which is why we brought it here from GitHub/JIRA, but at the end
of the day I hope that a small group of committers and developers can
assess "good enough" consensus for these minor issues.

Here's my assessment:
* We don't really have any rules about naming transforms. "Should be a
verb" is a sort of guiding principle inherited from the Google Flume
project from which Dataflow evolved, but honestly we violate this rule for
clarity all over the place. ("Values", for example).
* The "Big Data" community is significantly more familiar with the concept
of Distinct -- Jesse, who filed the original JIRA, is a good example here.
* Finally, nobody feels very strongly. We could argue minor points of each
solution, but at the end of the day I don't think anyone wants to block a
change.

Let's go with Distinct. It's important to align Beam with the open source
big data community. (And thanks Jesse, our newest (*tied) committer, for
pushing us in the right direction!)

Jesse, can you please take charge of wrapping up the PR and merging it?

Thanks!
Dan
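
For readers less familiar with the transform being renamed, here is a minimal
sketch of the semantics the name Distinct is meant to convey, as raised
elsewhere in this thread: the input is left alone and a new collection without
duplicates is returned. This is plain Python, not the Beam API; the function
name distinct and the sample data are made up for illustration.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def distinct(elements: Iterable[T]) -> List[T]:
    """Return a new list with duplicates dropped, keeping first-seen order."""
    seen = set()
    result = []
    for element in elements:
        if element not in seen:
            seen.add(element)
            result.append(element)
    return result


words = ["beam", "apex", "beam", "spark", "apex"]
unique_words = distinct(words)  # words itself is not modified
```

The first-seen ordering is a property of this sketch only; a PCollection does
not promise any element order.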

On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré 
wrote:

> Just to clarify. Davor is right for a code modification change: -1 means a
> veto.
> I meant that -1 is not a veto for a release vote.
>
> Anyway, even if it's not a formal vote, we can have a discussion with
> "options" a,b and c.
>
> Regards
> JB
>
>
>
> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci 
> wrote:
> >In terms of reaching a decision on any code or design changes,
> >including
> >this one, I'd suggest going without formal votes. Voting process for
> >code
> >modifications between choices A and B doesn't necessarily end with a
> >decision A or B -- a single (qualified) -1 vote is a veto and cannot be
> >overridden [1]. Said differently, the guideline is that code changes
> >should
> >be made by consensus; not by one group outvoting another. I'd like to
> >avoid
> >setting such precedent; we should try to drive consensus, as opposed to
> >attempting to outvote another part of the community.
> >
> >In this particular case, we have had a great discussion. Many
> >contributors
> >brought different perspectives. Consequently, some opinions have been
> >likely changed. At this point, someone should summarize the arguments,
> >try
> >to critique them from a neutral standpoint, and suggest a refined
> >proposal
> >that takes these perspectives into account. If nobody objects in a
> >short
> >time, we should consider this decided. [ I can certainly help here, but
> >I'd
> >love to see somebody else do it! ]
> >
> >[1] http://www.apache.org/foundation/voting.html
> >
> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
> >
> >wrote:
> >
> >> I also like Distinct since it doesn't make it sound like it modifies
> >any
> >> underlying collection. RemoveDuplicates makes it sound like the
> >duplicates
> >> are removed, rather than a new PCollection without duplicates being
> >> returned.
> >>
> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré 
> >> wrote:
> >>
> >> > Agree. It was more a transition proposal.
> >> >
> >> > Regards
> >> > JB
> >> >
> >> >
> >> >
> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
> >> >  wrote:
> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> >> > > wrote:
> >> > >> And what about use RemoveDuplicates and create an alias Distinct
> >?
> >> > >
> >> > >I'd really like to avoid (long term) aliases--you end up having to
> >> > >document (and maintain) them both, and it adds confusion as to
> >which
> >> > >one to use (especially if they ever diverge), and means searching
> >for
> >> > >one or the other yields half the results.
> >> > >
> >> > >> It doesn't break the API and would address both SQL users and
> >more
> >> > >"big data" users.
> >> > >>
> >> > >> My $0.01 ;)
> >> > >>
> >> > >> Regards
> >> > >> JB
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
> >> > > wrote:
> >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
> >> > >>>preference:
> >> > >>>
> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important keywords
> >are
> >> > >in
> >> > >>>the
> >> > >>>Javadoc. This reduces churn on our users and is honestly pretty
> >dang
> >> > >>> descriptive.
> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user and
> >> > >likely
> >> > >>>less clear otherwise. This is a backwards-incompatible API
> >change, so
> >> > >>>we
> >> > >>>should do it before we go stable.
> >> > >>>
> >> > >>>I am not super strong that 1 > 2, but I am very strong that
> >> > >"Distinct"
> >> > >>
> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>> 

Re: [PROPOSAL] New Beam website design?

2016-10-27 Thread Davor Bonaci
The best place to learn how to get started is the Contribution Guide [1].
The list of pending JIRA issues related to the website is also available
[2].

I think BEAM-752 would be the best to get your feet wet. Other good
candidates are 516, 268, 776. If someone knows a good (non-fragile)
solution to 751, that would be a great contribution!

Davor

[1] http://beam.incubator.apache.org/contribute/contribution-guide/
[2]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20%3D%20website

On Thu, Oct 27, 2016 at 5:20 AM, Jean-Baptiste Onofré 
wrote:

> Great !! Thanks.
>
> You can take a look at BEAM-500 and 501 and also the PR I did last week.
>
> I plan to submit new PRs during the weekend. So please let me know how we
> can sync.
>
> Thanks
> Regards
> JB
>
> On Oct 27, 2016, at 14:04, Minudika Malshan  wrote:
>>
>> Hi all,
>>
>> I would like to join the development of the new site.
>> Is there any issue tracking method for this? (Are there any JIRA issues?)
>>
>> Thank you!
>>
>>
>>
>> On Thu, Oct 27, 2016 at 4:01 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi
>>>
>>> You can propose a PR on this Jira.
>>>
>>> We will be more than happy to review it.
>>>
>>> Thanks
>>> Regards
>>> JB
>>>
>>> On Oct 27, 2016, 11:26, at 11:26, Abdullah Bashir
>>> wrote:
>>>
>>>> Thank you very much for taking time to respond Davor :)
>>>>
>>>> Regarding BEAM-752, I can work on that; I have already built some
>>>> Dataflow pipelines on Google Cloud in Python.
>>>>
>>>> Again, can you tell me where to start for BEAM-752? I am new to ASF
>>>> contribution, so onboarding steps are kind of a black box to me :).
>>>>
>>>> On Thu, Oct 27, 2016 at 11:34 AM, Davor Bonaci
>>>> wrote:
>>>>
>>>>> Absolutely!
>>>>>
>>>>> I'm currently reviewing JB's PR #51, and that should go in shortly. Within
>>>>> a day or so, I should have a better idea about future work in this specific
>>>>> area; please stay tuned.
>>>>>
>>>>> There are also separate things that are ready to be started at any time.
>>>>> BEAM-752 comes to mind first. Is this something you'd be interested in?
>>>>>
>>>>> On Wed, Oct 26, 2016 at 11:17 PM, Abdullah Bashir
>>>>> wrote:
>>>>>
>>>>>> Hi Davor,
>>>>>>
>>>>>> I am done with my local setup to start contributing. I have forked and
>>>>>> merged pull request (pull/51) into my local repo. Then I read the
>>>>>> google docs; there are two tasks mentioned in it, [Beam-500] and
>>>>>> [Beam-501].
>>>>>> I found out that [Beam-500] is closed in JIRA and [Beam-501] is
>>>>>> assigned to Jean-Baptiste Onofré. Is there any task that you can
>>>>>> assign to me?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Regards,
>>>>>> Abdullah Bashir
>>>>>>
>>>>>> On Tue, Oct 25, 2016 at 1:50 AM, Davor Bonaci
>>>>>> wrote:
>>>>>>
>>>>>>> Abdullah, welcome!
>>>>>>>
>>>>>>> I think it's rather clear we've been struggling with the website, so any
>>>>>>> help is very welcome. It is a little bit messy right now -- there are a few
>>>>>>> outstanding pull requests and forked branches. I'm trying to get all this
>>>>>>> into one place, so anybody can contribute and make progress.
>>>>>>>
>>>>>>> Also, the general website organization has been discussed before, see this
>>>>>>> thread [1] and the attached document for details.
>>>>>>>
>>>>>>> Davor
>>>>>>>
>>>>>>> [1]
>>>>>>> https://mail-archives.apache.org/mod_mbox/beam-dev/201606.mbox/%3CCAAzyFAwu992x+xcxN6Ha-avKZZbF-RK00mUg1-vezYCmtOm4Ww@mail.gmail.com%3E
>>>>>>>
>>>>>>> On Sun, Oct 23, 2016 at 12:34 AM, 

Re: Apex runner status and next steps

2016-10-27 Thread Dan Halperin
I would add (explicitly, though this may be implicit or already supported)
that Apex should also be able to run the precommit
WordCountIT/WindowedWordCountIT tests that execute on all runners.

https://github.com/apache/incubator-beam/blob/master/examples/java/pom.xml#L42
and
https://github.com/apache/incubator-beam/blob/master/examples/java/pom.xml#L132

Dan

On Wed, Oct 26, 2016 at 10:39 PM, Jean-Baptiste Onofré 
wrote:

> +1
>
> Good idea and fully agree about the three points.
>
> Regards
> JB
>
>
>
> On Oct 26, 2016, 19:24, at 19:24, Thomas Weise  wrote:
> >Hi,
> >
> >The Apex runner is currently in a feature branch:
> >
> >https://github.com/apache/incubator-beam/tree/apex-runner
> >
> >The focus so far has been on functional completeness. It passes all the
> >integration tests.
> >
> >Apex with its stateful stream processing architecture can support all
> >of
> >the concepts in the Beam model (event time, triggers, watermarks etc.).
> >Most of these are already supported through the Beam SDK. The glue code
> >that had to be written isn't that much, which speaks to the conceptual
> >alignment in general.
> >
> >The runner in its current form does not leverage all the performance
> >and
> >scalability that Apex can deliver. We expect to address this with
> >future
> >contributions, leveraging things like incremental checkpointing,
> >partitioning and operator affinity from Apex.
> >
> >From a code perspective, the runner should be close to what is needed
> >for a
> >merge to master (based on the contribution guidelines). The following
> >items
> >have been identified as prerequisite:
> >
> >* Add a README.md to the runner directory that summarizes its current
> >state
> >* Update the https://beam.apache.org/learn/runners/capability-matrix/
> >to
> >include the Apex info
> >* Create the page under learn/runners (at least the place holder)
> >
> >It should also be noted that the integration tests currently take quite
> >long to run with embedded Apex (~50 minutes). Some of that has to do
> >with
> >how completion of the tests is determined and there are ideas to
> >improve it.
> >
> >I have created some JIRAs from my TODO list of follow-up work for more
> >contributors to get involved:
> >
> >https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20runner-apex
> >
> >Some folks on the Apex dev list have expressed interest to take up some
> >of
> >this work. And thanks to Ismaël Mejía for BEAM-815!
> >
> >I'm looking forward to your comments and suggestions.
> >
> >Thanks,
> >Thomas
>


Re: Can we have more quick start examples ?

2016-10-27 Thread Davor Bonaci
Indeed -- this is a clear area for improvement. Sources are usually not as
big of an issue -- these resources are publicly accessible regardless of
where/how you run the pipeline (locally, or with any runner). On the other
hand, Sinks require write access, which is often more problematic.

One correction, however: WordCount supports both GCS and local paths, with
some exceptions depending on the runner.

There are several efforts to improve this, most notably BEAM-59, which is
assigned to Pei.

On Thu, Oct 27, 2016 at 8:17 AM, Jesse Anderson 
wrote:

> Those tutorials help. I was going through the example code and had the same
> thought. We need to take a pass through the examples and remove some of the
> Google Cloud dependencies.
>
> On Thu, Oct 27, 2016, 5:13 PM Thomas Weise  wrote:
>
> > The Beam tutorials seem to address this:
> >
> > https://github.com/eljefe6a/beamexample/blob/master/README.md
> >
> >
> > On Thu, Oct 27, 2016 at 8:04 AM, Manu Zhang 
> > wrote:
> >
> > > Hey guys,
> > >
> > > I find Beam examples under the examples folder are not easy to run due
> to
> > > dependency on Google specific services. Even the MinimalWordCount
> > >  > >
> > examples/java/src/main/java/org/apache/beam/examples/
> MinimalWordCount.java
> > > >
> > > requires
> > > input and output to be on Google Cloud Storage. Others like
> > > WindowedWordCount
> > >  > > examples/java/src/main/java/org/apache/beam/examples/
> > > WindowedWordCount.java>
> > > require
> > > BigQuery.  I wouldn't expect newcomers to tweak IO themselves.
> > >
> > > Can we have more quick start examples that can be run anywhere ?
> > >
> > > Thanks,
> > > Manu Zhang
> > >
> >
>


Re: Tracking backward-incompatible changes for Beam

2016-10-27 Thread Robert Bradshaw
If the API/semantics are sufficiently well tested, backwards
incompatibility should manifest as test failures. The corollary is
that one should look closely at any test changes that get proposed.
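
As a rough illustration of that idea (a sketch only, not Beam code: the
TransformV1/TransformV2 classes and both helper names are hypothetical), a test
could snapshot the public signatures of an API and flag anything that was
removed or changed:

```python
import inspect


def public_api(obj):
    """Map each public callable on `obj` to its signature string."""
    api = {}
    for name, member in inspect.getmembers(obj, callable):
        if not name.startswith("_"):
            api[name] = str(inspect.signature(member))
    return api


def breaking_changes(old_api, new_api):
    """Names that were removed or whose signature changed between snapshots."""
    return sorted(
        name for name, signature in old_api.items()
        if new_api.get(name) != signature
    )


class TransformV1:  # hypothetical API, not a Beam class
    def apply(self, values):
        return list(values)

    def named(self, name):
        return self


class TransformV2:  # hypothetical next revision of the same API
    def apply(self, values, strict):  # new required argument: breaking
        return list(values)

    def with_label(self, label):  # pure addition: not reported
        return self


if __name__ == "__main__":
    print(breaking_changes(public_api(TransformV1), public_api(TransformV2)))
```

A check like this only sees signatures, so it complements rather than replaces
tests of the semantics.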

On Mon, Oct 24, 2016 at 1:52 PM, Davor Bonaci  wrote:
> I don't think we have it right now. We should, of course, but this is
> something that needs to be defined/discussed first.
>
> On Mon, Oct 24, 2016 at 1:20 PM, Neelesh Salian 
> wrote:
>
>> +1 for the labels and also a need for tests.
>> Do we document any rules for backward-compatibility? Be good to have a
>> checklist-like list.
>>
>>
>>
>>
>> On Mon, Oct 24, 2016 at 1:02 PM, Davor Bonaci 
>> wrote:
>>
>> > It would be awesome to have that! At least a good portion of
>> > backward-incompatible changes could be automatically caught.
>> >
>> > We should also think about defining backward-compatibility more
>> precisely.
>> > This would be good in its own right, but also necessary to configure the
>> > tool. Historically, we have applied the backward-compatibility rules on
>> > APIs that are intended for users, excluding experimental ones, but not
>> > necessarily on all publicly visible APIs. If we continue this practice,
>> it
>> > might be a challenge for the tool. In any case, I think there's a good
>> > discussion to be had around what backward-compatibility means exactly in
>> > Beam.
>> >
>> > On Sat, Oct 22, 2016 at 2:47 AM, Aljoscha Krettek 
>> > wrote:
>> >
>> > > Very good idea!
>> > >
>> > > Should we already start thinking about automatic tests for backwards
>> > > compatibility of the API?
>> > >
>> > > On Fri, 21 Oct 2016 at 10:56 Jean-Baptiste Onofré 
>> > wrote:
>> > >
>> > > > Hi Dan,
>> > > >
>> > > > +1, good idea.
>> > > >
>> > > > Regards
>> > > > JB
>> > > >
>> > > > On 10/21/2016 02:21 AM, Dan Halperin wrote:
>> > > > > Hey everyone,
>> > > > >
>> > > > > In the Beam codebase, we’ve improved, rewritten, or deleted many
>> > APIs.
>> > > > > While this has improved the model and gives us great freedom to
>> > > > experiment,
>> > > > > we are also causing churn on users authoring Beam libraries and
>> > > > pipelines.
>> > > > >
>> > > > > To really kick off Beam as something users can depend on, we need
>> to
>> > > > > stabilize the Beam API. Stabilizing means a commitment to not
>> making
>> > > > > breaking changes -- except between major versions as per standard
>> > > > semantic
>> > > > > versioning.
>> > > > >
>> > > > > To get there, I’ve started a process for tracking these changes by
>> > > > applying
>> > > > > the `backward-incompatible` label [1] to the corresponding JIRA
>> > issues.
>> > > > > Naturally, open `backward-incompatible` changes are “blocking
>> issues”
>> > > for
>> > > > > the first stable release. (Or we’ll have to put them off for the
>> next
>> > > > major
>> > > > > version!)
>> > > > >
>> > > > > So here are some requests for help:
>> > > > > * Please review and appropriately label the components I skipped:
>> > > > > runner-{apex, flink, gearpump, spark}, sdk-py.
>> > > > > * Please proactively file JIRA issues for breaking API changes you
>> > > still
>> > > > > want to make, and label them.
>> > > > >
>> > > > > Thanks everyone!
>> > > > > Dan
>> > > > >
>> > > > >
>> > > > > [1]
>> > > > >
>> > > > https://issues.apache.org/jira/issues/?jql=project%20%
>> > > 3D%20BEAM%20AND%20labels%20%3D%20backward-incompatible
>> > > > >
>> > > >
>> > > > --
>> > > > Jean-Baptiste Onofré
>> > > > jbono...@apache.org
>> > > > http://blog.nanthrax.net
>> > > > Talend - http://www.talend.com
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Neelesh Srinivas Salian
>> Customer Operations Engineer
>>


Re: Can we have more quick start examples ?

2016-10-27 Thread Jesse Anderson
Those tutorials help. I was going through the example code and had the same
thought. We need to take a pass through the examples and remove some of the
Google Cloud dependencies.

On Thu, Oct 27, 2016, 5:13 PM Thomas Weise  wrote:

> The Beam tutorials seem to address this:
>
> https://github.com/eljefe6a/beamexample/blob/master/README.md
>
>
> On Thu, Oct 27, 2016 at 8:04 AM, Manu Zhang 
> wrote:
>
> > Hey guys,
> >
> > I find Beam examples under the examples folder are not easy to run due to
> > dependency on Google specific services. Even the MinimalWordCount
> >  >
> examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
> > >
> > requires
> > input and output to be on Google Cloud Storage. Others like
> > WindowedWordCount
> >  > examples/java/src/main/java/org/apache/beam/examples/
> > WindowedWordCount.java>
> > require
> > BigQuery.  I wouldn't expect newcomers to tweak IO themselves.
> >
> > Can we have more quick start examples that can be run anywhere ?
> >
> > Thanks,
> > Manu Zhang
> >
>


Re: Can we have more quick start examples ?

2016-10-27 Thread Thomas Weise
The Beam tutorials seem to address this:

https://github.com/eljefe6a/beamexample/blob/master/README.md


On Thu, Oct 27, 2016 at 8:04 AM, Manu Zhang  wrote:

> Hey guys,
>
> I find Beam examples under the examples folder are not easy to run due to
> dependency on Google specific services. Even the MinimalWordCount
>  examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
> >
> requires
> input and output to be on Google Cloud Storage. Others like
> WindowedWordCount
>  examples/java/src/main/java/org/apache/beam/examples/
> WindowedWordCount.java>
> require
> BigQuery.  I wouldn't expect newcomers to tweak IO themselves.
>
> Can we have more quick start examples that can be run anywhere ?
>
> Thanks,
> Manu Zhang
>


Can we have more quick start examples ?

2016-10-27 Thread Manu Zhang
Hey guys,

I find the Beam examples under the examples folder are not easy to run due to
their dependency on Google-specific services. Even the MinimalWordCount
requires input and output to be on Google Cloud Storage. Others like
WindowedWordCount require BigQuery. I wouldn't expect newcomers to tweak IO
themselves.

Can we have more quick start examples that can be run anywhere?

Thanks,
Manu Zhang


Re: [PROPOSAL] New Beam website design?

2016-10-27 Thread Jean-Baptiste Onofré
Great !! Thanks.

You can take a look at BEAM-500 and BEAM-501, and also the PR I did last week.

I plan to submit new PRs over the weekend, so please let me know how we can
sync.

Thanks
Regards
JB

On Oct 27, 2016, 14:04, at 14:04, Minudika Malshan  
wrote:
>Hi all,
>
>I would like to join for the development of the new site.
>Is there any issue tracking method for this? (Are there any jirra
>issues)
>
>Thank you!
>
>
>
>On Thu, Oct 27, 2016 at 4:01 PM, Jean-Baptiste Onofré 
>wrote:
>
>> Hi
>>
>> You can propose a PR on this Jira.
>>
>> We will be more than happy to review it.
>>
>> Thanks
>> Regards
>> JB
>>
>> On Oct 27, 2016, 11:26, at 11:26, Abdullah Bashir
>
>> wrote:
>> >Thank you very much for taking time to respond Davor :)
>> >
>> >Regarding BEAM-752, i can work on that, i have already built some
>> >Dataflow
>> >Piplines on Google Cloud in Python language.
>> >
>> >Again Can you tell me where to start for BEAM-752. I am new to ASF
>> >contribution, so onboarding steps are kind of a black box to me :).
>> >
>> >On Thu, Oct 27, 2016 at 11:34 AM, Davor Bonaci 
>> >wrote:
>> >
>> >> Absolutely!
>> >>
>> >> I'm currently reviewing JB's PR #51, and that should go in
>shortly.
>> >Within
>> >> a day or so, I should have a better idea about future work in this
>> >specific
>> >> area; please stay tuned.
>> >>
>> >> There are also separate things that are ready to be started at any
>> >time.
>> >> BEAM-752 comes to mind first. Is this something you'd be
>interested
>> >in?
>> >>
>> >> On Wed, Oct 26, 2016 at 11:17 PM, Abdullah Bashir
>> >
>> >> wrote:
>> >>
>> >>> Hi Davor,
>> >>>
>> >>> I am done with my local setup to start contributing, I have
>forked
>> >and
>> >>> merged pull request *(**pull/51)* into my  local repo. Then I
>read
>> >the
>> >>>
>> >>> google docs, their are two tasks mentioned in it, as [Beam-500]
>and
>> >>> [Beam-501].
>> >>> I found out that [Beam-500] is closed in JIRA and [Beam-501] is
>> >>> assigned to Jean-Baptiste
>> >>> Onofré, Is their any task that you can assign to me ?
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Regards,
>> >>> Abdullah Bashir
>> >>>
>> >>>
>> >>> On Tue, Oct 25, 2016 at 1:50 AM, Davor Bonaci 
>> >wrote:
>> >>>
>> >>> > Abdullah, welcome!
>> >>> >
>> >>> > I think it's rather clear we've been struggling with the
>website,
>> >so any
>> >>> > help is very welcome. It is a little bit messy right now --
>there
>> >are a
>> >>> few
>> >>> > outstanding pull requests and forked branches. I'm trying to
>get
>> >all
>> >>> this
>> >>> > into one place, so anybody can contribute and make progress.
>> >>> >
>> >>> > Also, the general website organization has been discussed
>before,
>> >see
>> >>> this
>> >>> > thread [1] and the attached document for details.
>> >>> >
>> >>> > Davor
>> >>> >
>> >>> > [1]
>> >>> > https://mail-archives.apache.org/mod_mbox/beam-dev/201606.
>> >>> > mbox/%3CCAAzyFAwu992x+xcxN6Ha-avKZZbF-RK00mUg1-vezYCmtOm4Ww@
>> >>> > mail.gmail.com%3E
>> >>> >
>> >>> > On Sun, Oct 23, 2016 at 12:34 AM, Jean-Baptiste Onofré
>> >> >>> >
>> >>> > wrote:
>> >>> >
>> >>> > > Hi
>> >>> > >
>> >>> > > You can take a look on the PR I creates last Friday. It
>contains
>> >a
>> >>> > > CSS/skin proposal.
>> >>> > >
>> >>> > > The mock-up is there: http://maven.nanthrax.net/beam
>> >>> > >
>> >>> > > Regards
>> >>> > > JB
>> >>> > >
>> >>> > > On Oct 23, 2016, 09:27, at 09:27, Abdullah Bashir <
>> >>> > mabdullah...@gmail.com>
>> >>> > > wrote:
>> >>> > > >Hi,
>> >>> > > >
>> >>> > > >is their any help i can do on website designing ?
>> >>> > > >I am good at HTML5, CSS3 and javascript.
>> >>> > > >
>> >>> > > >Regards,
>> >>> > > >Abdullah Bashir
>> >>> > >
>> >>> >
>> >>>
>> >>
>> >>
>>
>
>
>
>-- 
>*Minudika Malshan*
>Undergraduate
>Department of Computer Science and Engineering
>University of Moratuwa
>Sri Lanka.


Re: [PROPOSAL] New Beam website design?

2016-10-27 Thread Minudika Malshan
Hi all,

I would like to join the development of the new site.
Is there any issue tracking for this? (Are there any JIRA issues?)

Thank you!



On Thu, Oct 27, 2016 at 4:01 PM, Jean-Baptiste Onofré 
wrote:

> Hi
>
> You can propose a PR on this Jira.
>
> We will be more than happy to review it.
>
> Thanks
> Regards
> JB
>
> On Oct 27, 2016, 11:26, at 11:26, Abdullah Bashir 
> wrote:
> >Thank you very much for taking time to respond Davor :)
> >
> >Regarding BEAM-752, i can work on that, i have already built some
> >Dataflow
> >Piplines on Google Cloud in Python language.
> >
> >Again Can you tell me where to start for BEAM-752. I am new to ASF
> >contribution, so onboarding steps are kind of a black box to me :).
> >
> >On Thu, Oct 27, 2016 at 11:34 AM, Davor Bonaci 
> >wrote:
> >
> >> Absolutely!
> >>
> >> I'm currently reviewing JB's PR #51, and that should go in shortly.
> >Within
> >> a day or so, I should have a better idea about future work in this
> >specific
> >> area; please stay tuned.
> >>
> >> There are also separate things that are ready to be started at any
> >time.
> >> BEAM-752 comes to mind first. Is this something you'd be interested
> >in?
> >>
> >> On Wed, Oct 26, 2016 at 11:17 PM, Abdullah Bashir
> >
> >> wrote:
> >>
> >>> Hi Davor,
> >>>
> >>> I am done with my local setup to start contributing, I have forked
> >and
> >>> merged pull request *(**pull/51)* into my  local repo. Then I read
> >the
> >>>
> >>> google docs, their are two tasks mentioned in it, as [Beam-500] and
> >>> [Beam-501].
> >>> I found out that [Beam-500] is closed in JIRA and [Beam-501] is
> >>> assigned to Jean-Baptiste
> >>> Onofré, Is their any task that you can assign to me ?
> >>>
> >>> Thanks.
> >>>
> >>> Regards,
> >>> Abdullah Bashir
> >>>
> >>>
> >>> On Tue, Oct 25, 2016 at 1:50 AM, Davor Bonaci 
> >wrote:
> >>>
> >>> > Abdullah, welcome!
> >>> >
> >>> > I think it's rather clear we've been struggling with the website,
> >so any
> >>> > help is very welcome. It is a little bit messy right now -- there
> >are a
> >>> few
> >>> > outstanding pull requests and forked branches. I'm trying to get
> >all
> >>> this
> >>> > into one place, so anybody can contribute and make progress.
> >>> >
> >>> > Also, the general website organization has been discussed before,
> >see
> >>> this
> >>> > thread [1] and the attached document for details.
> >>> >
> >>> > Davor
> >>> >
> >>> > [1]
> >>> > https://mail-archives.apache.org/mod_mbox/beam-dev/201606.
> >>> > mbox/%3CCAAzyFAwu992x+xcxN6Ha-avKZZbF-RK00mUg1-vezYCmtOm4Ww@
> >>> > mail.gmail.com%3E
> >>> >
> >>> > On Sun, Oct 23, 2016 at 12:34 AM, Jean-Baptiste Onofré
> > >>> >
> >>> > wrote:
> >>> >
> >>> > > Hi
> >>> > >
> >>> > > You can take a look on the PR I creates last Friday. It contains
> >a
> >>> > > CSS/skin proposal.
> >>> > >
> >>> > > The mock-up is there: http://maven.nanthrax.net/beam
> >>> > >
> >>> > > Regards
> >>> > > JB
> >>> > >
> >>> > > On Oct 23, 2016, 09:27, at 09:27, Abdullah Bashir <
> >>> > mabdullah...@gmail.com>
> >>> > > wrote:
> >>> > > >Hi,
> >>> > > >
> >>> > > >is their any help i can do on website designing ?
> >>> > > >I am good at HTML5, CSS3 and javascript.
> >>> > > >
> >>> > > >Regards,
> >>> > > >Abdullah Bashir
> >>> > >
> >>> >
> >>>
> >>
> >>
>



-- 
*Minudika Malshan*
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.


Re: [PROPOSAL] New Beam website design?

2016-10-27 Thread Jean-Baptiste Onofré
Hi

You can propose a PR on this Jira.

We will be more than happy to review it.

Thanks
Regards
JB

On Oct 27, 2016, 11:26, at 11:26, Abdullah Bashir  
wrote:
>Thank you very much for taking time to respond Davor :)
>
>Regarding BEAM-752, i can work on that, i have already built some
>Dataflow
>Piplines on Google Cloud in Python language.
>
>Again Can you tell me where to start for BEAM-752. I am new to ASF
>contribution, so onboarding steps are kind of a black box to me :).
>
>On Thu, Oct 27, 2016 at 11:34 AM, Davor Bonaci 
>wrote:
>
>> Absolutely!
>>
>> I'm currently reviewing JB's PR #51, and that should go in shortly.
>Within
>> a day or so, I should have a better idea about future work in this
>specific
>> area; please stay tuned.
>>
>> There are also separate things that are ready to be started at any
>time.
>> BEAM-752 comes to mind first. Is this something you'd be interested
>in?
>>
>> On Wed, Oct 26, 2016 at 11:17 PM, Abdullah Bashir
>
>> wrote:
>>
>>> Hi Davor,
>>>
>>> I am done with my local setup to start contributing, I have forked
>and
>>> merged pull request *(**pull/51)* into my  local repo. Then I read
>the
>>>
>>> google docs, their are two tasks mentioned in it, as [Beam-500] and
>>> [Beam-501].
>>> I found out that [Beam-500] is closed in JIRA and [Beam-501] is
>>> assigned to Jean-Baptiste
>>> Onofré, Is their any task that you can assign to me ?
>>>
>>> Thanks.
>>>
>>> Regards,
>>> Abdullah Bashir
>>>
>>>
>>> On Tue, Oct 25, 2016 at 1:50 AM, Davor Bonaci 
>wrote:
>>>
>>> > Abdullah, welcome!
>>> >
>>> > I think it's rather clear we've been struggling with the website,
>so any
>>> > help is very welcome. It is a little bit messy right now -- there
>are a
>>> few
>>> > outstanding pull requests and forked branches. I'm trying to get
>all
>>> this
>>> > into one place, so anybody can contribute and make progress.
>>> >
>>> > Also, the general website organization has been discussed before,
>see
>>> this
>>> > thread [1] and the attached document for details.
>>> >
>>> > Davor
>>> >
>>> > [1]
>>> > https://mail-archives.apache.org/mod_mbox/beam-dev/201606.
>>> > mbox/%3CCAAzyFAwu992x+xcxN6Ha-avKZZbF-RK00mUg1-vezYCmtOm4Ww@
>>> > mail.gmail.com%3E
>>> >
>>> > On Sun, Oct 23, 2016 at 12:34 AM, Jean-Baptiste Onofré
>>> >
>>> > wrote:
>>> >
>>> > > Hi
>>> > >
>>> > > You can take a look on the PR I creates last Friday. It contains
>a
>>> > > CSS/skin proposal.
>>> > >
>>> > > The mock-up is there: http://maven.nanthrax.net/beam
>>> > >
>>> > > Regards
>>> > > JB
>>> > >
>>> > > On Oct 23, 2016, 09:27, at 09:27, Abdullah Bashir <
>>> > mabdullah...@gmail.com>
>>> > > wrote:
>>> > > >Hi,
>>> > > >
>>> > > >is their any help i can do on website designing ?
>>> > > >I am good at HTML5, CSS3 and javascript.
>>> > > >
>>> > > >Regards,
>>> > > >Abdullah Bashir
>>> > >
>>> >
>>>
>>
>>


Re: [PROPOSAL] New Beam website design?

2016-10-27 Thread Abdullah Bashir
Thank you very much for taking the time to respond, Davor :)

Regarding BEAM-752, I can work on that. I have already built some Dataflow
pipelines on Google Cloud in Python.

Again, can you tell me where to start for BEAM-752? I am new to ASF
contribution, so the onboarding steps are kind of a black box to me :).

On Thu, Oct 27, 2016 at 11:34 AM, Davor Bonaci  wrote:

> Absolutely!
>
> I'm currently reviewing JB's PR #51, and that should go in shortly. Within
> a day or so, I should have a better idea about future work in this specific
> area; please stay tuned.
>
> There are also separate things that are ready to be started at any time.
> BEAM-752 comes to mind first. Is this something you'd be interested in?
>
> On Wed, Oct 26, 2016 at 11:17 PM, Abdullah Bashir 
> wrote:
>
>> Hi Davor,
>>
>> I am done with my local setup to start contributing, I have forked and
>> merged pull request *(**pull/51)* into my  local repo. Then I read the
>>
>> google docs, their are two tasks mentioned in it, as [Beam-500] and
>> [Beam-501].
>> I found out that [Beam-500] is closed in JIRA and [Beam-501] is
>> assigned to Jean-Baptiste
>> Onofré, Is their any task that you can assign to me ?
>>
>> Thanks.
>>
>> Regards,
>> Abdullah Bashir
>>
>>
>> On Tue, Oct 25, 2016 at 1:50 AM, Davor Bonaci  wrote:
>>
>> > Abdullah, welcome!
>> >
>> > I think it's rather clear we've been struggling with the website, so any
>> > help is very welcome. It is a little bit messy right now -- there are a
>> few
>> > outstanding pull requests and forked branches. I'm trying to get all
>> this
>> > into one place, so anybody can contribute and make progress.
>> >
>> > Also, the general website organization has been discussed before, see
>> this
>> > thread [1] and the attached document for details.
>> >
>> > Davor
>> >
>> > [1]
>> > https://mail-archives.apache.org/mod_mbox/beam-dev/201606.
>> > mbox/%3CCAAzyFAwu992x+xcxN6Ha-avKZZbF-RK00mUg1-vezYCmtOm4Ww@
>> > mail.gmail.com%3E
>> >
>> > On Sun, Oct 23, 2016 at 12:34 AM, Jean-Baptiste Onofré > >
>> > wrote:
>> >
>> > > Hi
>> > >
>> > > You can take a look on the PR I creates last Friday. It contains a
>> > > CSS/skin proposal.
>> > >
>> > > The mock-up is there: http://maven.nanthrax.net/beam
>> > >
>> > > Regards
>> > > JB
>> > >
>> > > On Oct 23, 2016, 09:27, at 09:27, Abdullah Bashir <
>> > mabdullah...@gmail.com>
>> > > wrote:
>> > > >Hi,
>> > > >
>> > > >is their any help i can do on website designing ?
>> > > >I am good at HTML5, CSS3 and javascript.
>> > > >
>> > > >Regards,
>> > > >Abdullah Bashir
>> > >
>> >
>>
>
>


Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-27 Thread Sergio Fernández
Hi JB,

On Tue, Oct 25, 2016 at 12:00 PM, Jean-Baptiste Onofré 
wrote:

> Thanks Sergio ;)
>

You are welcome.


> Just tried to explain to the others what is a binding vote ;)
>

It's a common mistake in many podlings that PPMC members think they have
binding votes over developers who are not part of the project. But during
incubation only IPMC votes are binding. I hope that's clear.

In theory it's simple. So sorry if I've made some noise with that. I'll
repeat my vote later at general@incubator if you prefer it that way.

Cheers,

P.S.: after 0.3.0-incubating, are you thinking about graduation? I think
you should ;-)



On Oct 25, 2016, 11:53, at 11:53, "Sergio Fernández" 
> wrote:
> >On Tue, Oct 25, 2016 at 11:36 AM, Jean-Baptiste Onofré
> >
> >wrote:
> >
> >> By the way, your vote is not binding from a podling perspective (you
> >are
> >> not PPMC). Your vote is binding from IPMC perspective (so when you
> >will
> >> vote on the incubator mailing list).
> >>
> >
> >Well, PPMC are never binding votes, only IPMC are actually binding.
> >That
> >I'm not part of the PPMC is not much relevant. Therefore I think my
> >vote is
> >still a valid binding one; but I can vote again on general@incubator,
> >no
> >problem.
> >
> >Sorry for jumping-in too early. Besides a IPMC, I'm also a developer
> >interested in Beam ;-)
> >
> >Cheers,
> >
> >
> >
> >
> >> On Oct 25, 2016, 11:33, at 11:33, "Sergio Fernández"
> >
> >> wrote:
> >> >+1 (binding)
> >> >
> >> >So far I've successfully checked:
> >> >* signatures and digests
> >> >* source releases file layouts
> >> >* matched git tags and commit ids
> >> >* incubator suffix and disclaimer
> >> >* NOTICE and LICENSE files
> >> >* license headers
> >> >* clean build (Java 1.8.0_91, Scala, 2.11.7, SBT 0.13.9, Debian
> >amd64)
> >> >
> >> >
> >> >Couple of minor issues I've seen it'd be great to have fixed in
> >> >upcoming
> >> >releases:
> >> >* MongoDbIOTest fails (addr already in use) when a Mongo service is
> >> >locally
> >> >running. I'd say the port should be random in the test suite.
> >> >* How did you generated the checksums? Because both SHA1/MD5 can't
> >be
> >> >automatically checked because "no properly formatted SHA1/MD5
> >checksum
> >> >lines found".
> >> >
> >> >Great to see the project moving forward at this speed :-)
> >> >
> >> >Cheers,
> >> >
> >> >
> >> >
> >> >On Mon, Oct 24, 2016 at 11:30 PM, Aljoscha Krettek
> >> >
> >> >wrote:
> >> >
> >> >> Hi Team!
> >> >>
> >> >> Please review and vote at your leisure on release candidate #1 for
> >> >version
> >> >> 0.3.0-incubating, as follows:
> >> >> [ ] +1, Approve the release
> >> >> [ ] -1, Do not approve the release (please provide specific
> >comments)
> >> >>
> >> >> The complete staging area is available for your review, which
> >> >includes:
> >> >> * JIRA release notes [1],
> >> >> * the official Apache source release to be deployed to
> >> >dist.apache.org
> >> >> [2],
> >> >> * all artifacts to be deployed to the Maven Central Repository
> >[3],
> >> >> * source code tag "v0.3.0-incubating-RC1" [4],
> >> >> * website pull request listing the release and publishing the API
> >> >reference
> >> >> manual [5].
> >> >>
> >> >> Please keep in mind that this release is not focused on providing
> >new
> >> >> functionality. We want to refine the release process and make
> >stable
> >> >source
> >> >> and binary artefacts available to our users.
> >> >>
> >> >> The vote will be open for at least 72 hours. It is adopted by
> >> >majority
> >> >> approval, with at least 3 PPMC affirmative votes.
> >> >>
> >> >> Cheers,
> >> >> Aljoscha
> >> >>
> >> >> [1]
> >> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> >> >> projectId=12319527=12338051
> >> >> [2] https://dist.apache.org/repos/dist/dev/incubator/beam/0.3.0-
> >> >> incubating/
> >> >> [3]
> >> >> https://repository.apache.org/content/repositories/staging/
> >> >> org/apache/beam/
> >> >> [4]
> >> >>
> >https://git-wip-us.apache.org/repos/asf?p=incubator-beam.git;a=tag;h=
> >> >> 5d86ff7f04862444c266142b0d5acecb5a6b7144
> >> >> [5] https://github.com/apache/incubator-beam-site/pull/52
> >> >>
> >> >
> >> >
> >> >
> >> >--
> >> >Sergio Fernández
> >> >Partner Technology Manager
> >> >Redlink GmbH
> >> >m: +43 6602747925
> >> >e: sergio.fernan...@redlink.co
> >> >w: http://redlink.co
> >>
> >
> >
> >
> >--
> >Sergio Fernández
> >Partner Technology Manager
> >Redlink GmbH
> >m: +43 6602747925
> >e: sergio.fernan...@redlink.co
> >w: http://redlink.co
>



-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co


Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-27 Thread Sergio Fernández
Hi Aljoscha,

On Tue, Oct 25, 2016 at 12:06 PM, Aljoscha Krettek 
wrote:
>
> I used this line from the Apache release signing doc (
> https://www.apache.org/dev/release-signing.html#md5):
> $ gpg --print-md MD5 [fileName] > [fileName].md5
>
> What is normally used to checksum/verify the md5 and sha hash? As it is
> now, they can be manually verified by comparing the checksum in the file
> with one that was derived from the zip. I hope that's still ok for the
> release to go through?
>

Well, I think it is just how the hash is formatted. Normally it's preferable
to have the simple hash, as the maven-gpg-plugin generates (e.g.,
https://dist.apache.org/repos/dist/release/marmotta/3.3.0/apache-marmotta-3.3.0-src.zip.md5),
so you can easily check it automatically (md5sum -c foo.md5).
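
To make the formatting point concrete, here is a minimal sketch (not the
content of the PR; the function names are hypothetical) that writes a checksum
file in the "digest, two spaces, file name" layout that GNU md5sum -c parses,
and then verifies it the same way. It assumes the .md5 file sits next to the
artifact it describes:

```python
import hashlib
import os


def write_md5_file(path):
    """Write `<path>.md5` as "<hex digest>  <file name>" and return its path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Hash in chunks so large release archives need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    checksum_path = path + ".md5"
    with open(checksum_path, "w") as out:
        out.write("{}  {}\n".format(digest.hexdigest(), os.path.basename(path)))
    return checksum_path


def verify_md5_file(checksum_path):
    """Re-hash the file named in the checksum line and compare the digests."""
    with open(checksum_path) as handle:
        expected, name = handle.read().split(None, 1)
    target = os.path.join(os.path.dirname(checksum_path), name.strip())
    with open(target, "rb") as handle:
        actual = hashlib.md5(handle.read()).hexdigest()
    return actual == expected.lower()
```

The same layout works for SHA-1 with sha1sum -c; only the hash function
changes.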

I've just outlined a PR [1] to fix the issue [2], so we can discuss it over
the code.

Hope that helps.

Cheers,


[1] https://github.com/apache/incubator-beam/pull/1206
[2] https://issues.apache.org/jira/browse/BEAM-841



On Tue, 25 Oct 2016 at 11:53 Sergio Fernández  wrote:
>
> > On Tue, Oct 25, 2016 at 11:36 AM, Jean-Baptiste Onofré 
> > wrote:
> >
> > > By the way, your vote is not binding from a podling perspective (you
> are
> > > not PPMC). Your vote is binding from IPMC perspective (so when you will
> > > vote on the incubator mailing list).
> > >
> >
> > Well, PPMC are never binding votes, only IPMC are actually binding. That
> > I'm not part of the PPMC is not much relevant. Therefore I think my vote
> is
> > still a valid binding one; but I can vote again on general@incubator, no
> > problem.
> >
> > Sorry for jumping-in too early. Besides a IPMC, I'm also a developer
> > interested in Beam ;-)
> >
> > Cheers,
> >
> >
> >
> >
> > > On Oct 25, 2016, 11:33, at 11:33, "Sergio Fernández" <
> wik...@apache.org>
> > > wrote:
> > > >+1 (binding)
> > > >
> > > >So far I've successfully checked:
> > > >* signatures and digests
> > > >* source releases file layouts
> > > >* matched git tags and commit ids
> > > >* incubator suffix and disclaimer
> > > >* NOTICE and LICENSE files
> > > >* license headers
> > > >* clean build (Java 1.8.0_91, Scala, 2.11.7, SBT 0.13.9, Debian amd64)
> > > >
> > > >
> > > >Couple of minor issues I've seen it'd be great to have fixed in
> > > >upcoming
> > > >releases:
> > > >* MongoDbIOTest fails (addr already in use) when a Mongo service is
> > > >locally
> > > >running. I'd say the port should be random in the test suite.
> > > >* How did you generated the checksums? Because both SHA1/MD5 can't be
> > > >automatically checked because "no properly formatted SHA1/MD5 checksum
> > > >lines found".
> > > >
> > > >Great to see the project moving forward at this speed :-)
> > > >
> > > >Cheers,
> > > >
> > > >
> > > >
> > > >On Mon, Oct 24, 2016 at 11:30 PM, Aljoscha Krettek
> > > >
> > > >wrote:
> > > >
> > > >> Hi Team!
> > > >>
> > > >> Please review and vote at your leisure on release candidate #1 for
> > > >version
> > > >> 0.3.0-incubating, as follows:
> > > >> [ ] +1, Approve the release
> > > >> [ ] -1, Do not approve the release (please provide specific
> comments)
> > > >>
> > > >> The complete staging area is available for your review, which
> > > >includes:
> > > >> * JIRA release notes [1],
> > > >> * the official Apache source release to be deployed to
> > > >dist.apache.org
> > > >> [2],
> > > >> * all artifacts to be deployed to the Maven Central Repository [3],
> > > >> * source code tag "v0.3.0-incubating-RC1" [4],
> > > >> * website pull request listing the release and publishing the API
> > > >reference
> > > >> manual [5].
> > > >>
> > > >> Please keep in mind that this release is not focused on providing
> new
> > > >> functionality. We want to refine the release process and make stable
> > > >source
> > > >> and binary artefacts available to our users.
> > > >>
> > > >> The vote will be open for at least 72 hours. It is adopted by
> > > >majority
> > > >> approval, with at least 3 PPMC affirmative votes.
> > > >>
> > > >> Cheers,
> > > >> Aljoscha
> > > >>
> > > >> [1]
> > > >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > > >> projectId=12319527=12338051
> > > >> [2] https://dist.apache.org/repos/dist/dev/incubator/beam/0.3.0-
> > > >> incubating/
> > > >> [3]
> > > >> https://repository.apache.org/content/repositories/staging/
> > > >> org/apache/beam/
> > > >> [4]
> > > >> https://git-wip-us.apache.org/repos/asf?p=incubator-beam.git
> ;a=tag;h=
> > > >> 5d86ff7f04862444c266142b0d5acecb5a6b7144
> > > >> [5] https://github.com/apache/incubator-beam-site/pull/52
> > > >>
> > > >
> > > >
> > > >
> > > >--
> > > >Sergio Fernández
> > > >Partner Technology Manager
> > > >Redlink GmbH
> > > >m: +43 6602747925 <+43%20660%202747925>
> > > >e: sergio.fernan...@redlink.co
> > > >w: http://redlink.co
> > >
> >
> >
> >
> > --
> > Sergio Fernández
> > Partner Technology Manager
> > Redlink GmbH
> > m: 

Re: [PROPOSAL] New Beam website design?

2016-10-27 Thread Abdullah Bashir
Hi Davor,

I am done with my local setup to start contributing. I have forked and
merged pull request (pull/51) into my local repo. Then I read the
Google docs; there are two tasks mentioned in them, [BEAM-500] and
[BEAM-501].
I found out that [BEAM-500] is closed in JIRA and [BEAM-501] is
assigned to Jean-Baptiste Onofré. Is there any task that you can assign to me?

Thanks.

Regards,
Abdullah Bashir


On Tue, Oct 25, 2016 at 1:50 AM, Davor Bonaci  wrote:

> Abdullah, welcome!
>
> I think it's rather clear we've been struggling with the website, so any
> help is very welcome. It is a little bit messy right now -- there are a few
> outstanding pull requests and forked branches. I'm trying to get all this
> into one place, so anybody can contribute and make progress.
>
> Also, the general website organization has been discussed before, see this
> thread [1] and the attached document for details.
>
> Davor
>
> [1] https://mail-archives.apache.org/mod_mbox/beam-dev/201606.mbox/%3CCAAzyFAwu992x+xcxN6Ha-avKZZbF-RK00mUg1-vezYCmtOm4Ww@mail.gmail.com%3E
>
> On Sun, Oct 23, 2016 at 12:34 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi
> >
> > You can take a look at the PR I created last Friday. It contains a
> > CSS/skin proposal.
> >
> > The mock-up is there: http://maven.nanthrax.net/beam
> >
> > Regards
> > JB
> >
> >
> > On Oct 23, 2016, 09:27, at 09:27, Abdullah Bashir <mabdullah...@gmail.com> wrote:
> > >Hi,
> > >
> > >Is there any way I can help with the website design?
> > >I am good at HTML5, CSS3 and JavaScript.
> > >
> > >Regards,
> > >Abdullah Bashir
> >
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-27 Thread Jean-Baptiste Onofré
Just to clarify: Davor is right that for a code modification, -1 means a veto.
I meant that -1 is not a veto for a release vote.

Anyway, even if it's not a formal vote, we can have a discussion with
"options" a, b and c.

Regards
JB

⁣​

On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci  wrote:
>In terms of reaching a decision on any code or design changes, including
>this one, I'd suggest going without formal votes. The voting process for code
>modifications between choices A and B doesn't necessarily end with a
>decision A or B -- a single (qualified) -1 vote is a veto and cannot be
>overridden [1]. Said differently, the guideline is that code changes should
>be made by consensus, not by one group outvoting another. I'd like to avoid
>setting such a precedent; we should try to drive consensus, as opposed to
>attempting to outvote another part of the community.
>
>In this particular case, we have had a great discussion. Many contributors
>brought different perspectives. Consequently, some opinions have likely
>changed. At this point, someone should summarize the arguments, try
>to critique them from a neutral standpoint, and suggest a refined proposal
>that takes these perspectives into account. If nobody objects in a short
>time, we should consider this decided. [ I can certainly help here, but I'd
>love to see somebody else do it! ]
>
>[1] http://www.apache.org/foundation/voting.html
>
>On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers wrote:
>
>> I also like Distinct since it doesn't make it sound like it modifies any
>> underlying collection. RemoveDuplicates makes it sound like the duplicates
>> are removed, rather than a new PCollection without duplicates being
>> returned.
>>
>> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré 
>> wrote:
>>
>> > Agree. It was more a transition proposal.
>> >
>> > Regards
>> > JB
>> >
>> > ⁣​
>> >
>> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
>> >  wrote:
>> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
>> > > wrote:
>> > >> And what about using RemoveDuplicates and creating an alias Distinct?
>> > >
>> > >I'd really like to avoid (long term) aliases--you end up having to
>> > >document (and maintain) them both, and it adds confusion as to which
>> > >one to use (especially if they ever diverge), and means searching for
>> > >one or the other yields half the results.
>> > >
>> > >> It doesn't break the API and would address both SQL users and more
>> > >> "big data" users.
>> > >>
>> > >> My $0.01 ;)
>> > >>
>> > >> Regards
>> > >> JB
>> > >>
>> > >> ⁣
>> > >>
>> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
>> > > wrote:
>> > >>>I find "MakeDistinct" more confusing. My votes in decreasing preference:
>> > >>>
>> > >>>1. Keep the `RemoveDuplicates` name and ensure that important keywords are
>> > >>>in the Javadoc. This reduces churn on our users and is honestly pretty
>> > >>>dang descriptive.
>> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user and likely
>> > >>>less clear otherwise. This is a backwards-incompatible API change, so we
>> > >>>should do it before we go stable.
>> > >>>
>> > >>>I am not super strong that 1 > 2, but I am very strong that "Distinct" >>>
>> > >>>"MakeDistinct" and "RemoveDuplicates" >>> "AvoidDuplicate".
>> > >>>
>> > >>>Dan
>> > >>>
>> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
>> > >>>
>> > >>>wrote:
>> > >>>
>> > >>>> The precedent that we use verbs has many exceptions. We have
>> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I would even
>> > >>>> include Sum (at least when I read it).
>> > >>>>
>> > >>>> Historical note: the predilection towards verbs is from the Google Style
>> > >>>> Guide for Java method names
>> > 
>> > >>>> 2.3-method-names
>> > >,
>> > >>>> which states "Method names are typically verbs or verb phrases". But even
>> > >>>> in Google code there are lots of exceptions when it makes sense, like
>> > >>>> Guava's Iterables.any(), Iterables.all(), Iterables.toArray(), the entire
>> > >>>> Predicates module, etc. Just an aside; Beam isn't Google code. I suggest we
>> > >>>> use our judgment rather than a policy.
>> > >>>>
>> > >>>> I think "Distinct" is one of those exceptions. It is a standard widespread
>> > >>>> name and also reads better as an adjective. I prefer it, but also don't
>> > >>>> care strongly enough to change it or to change it back :-)
>> > >>>>
>> > >>>> If we must have a verb, I like it as-is more than MakeDistinct and
>> > >>>> AvoidDuplicate.
>> > 
>> >  On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson
>> > >>>
>> >