Re: Community hackathon

2017-04-26 Thread Tibor Kiss
I've created a meetup in Budapest for this event:
https://www.meetup.com/futureofdata-budapest/events/239504356/

We (folks from the Hortonworks office @ Budapest) will try to prep a demo
for the event and if time allows we'll jump into open issues.

On Tue, Apr 25, 2017 at 7:54 AM, Davor Bonaci  wrote:

> Thanks everyone for the enthusiasm!
>
> Let's go with this Wednesday, 4/26, starting at 10 AM Pacific time, and
> running for the following 24 hours. I'll try to seed the
> instructions/starting point, and then let's take it from there.
>
> (Michael, invite sent.)
>
> Davor
>
> On Mon, Apr 24, 2017 at 7:47 PM, Michael Huston 
> wrote:
>
> > Could you please add me to the Slack channel also? My apologies for the
> > noise on this mailing list if there is a better way to request access.
> >
> > Cheers,
> > Michael
> >
> > On Mon, Apr 24, 2017 at 6:15 PM, Lukasz Cwik 
> > wrote:
> >
> > > Dylan, sent you invite to slack channel.
> > >
> > > On Mon, Apr 24, 2017 at 5:18 PM, Dylan Raithel
> > > wrote:
> > >
> > > > Can you please add me to the Slack channel?
> > > >
> > > > On Apr 24, 2017 12:51 AM, "Jean-Baptiste Onofré"
> > > > wrote:
> > > >
> > > > > That's a wonderful idea !
> > > > >
> > > > > I think the easiest way to organize this event is using the Slack
> > > > > channels to discuss, help each other, and sync together.
> > > > >
> > > > > Regards
> > > > > JB
> > > > >
> > > > > On 04/24/2017 09:48 AM, Davor Bonaci wrote:
> > > > >
> > > > > >> We've been working as a community towards the first stable release
> > > > > >> for a while now, and I think we made a ton of progress across the
> > > > > >> board over the last few weeks.
> > > > > >>
> > > > > >> We could try to organize a community-wide hackathon to identify and
> > > > > >> fix those last few issues, as well as to get a better sense of the
> > > > > >> overall project quality as it stands right now.
> > > > > >>
> > > > > >> This could be a self-organized event, and coordinated via the Slack
> > > > > >> channel. For example, we (as a community and participants) can try
> > > > > >> out the project in various ways -- quickstart, examples, different
> > > > > >> runners, different platforms -- immediately fixing issues as we run
> > > > > >> into them. It could last, say, 24 hours, with people from different
> > > > > >> time zones participating at the time of their choosing.
> > > > >>
> > > > >> Thoughts?
> > > > >>
> > > > >> Davor
> > > > >>
> > > > >>
> > > > > --
> > > > > Jean-Baptiste Onofré
> > > > > jbono...@apache.org
> > > > > http://blog.nanthrax.net
> > > > > Talend - http://www.talend.com
> > > > >
> > > >
> > >
> >
>



-- 
Kiss Tibor

+36 70 275 9863
tibor.k...@gmail.com


Re: [PROPOSAL] ORC support

2017-04-03 Thread Tibor Kiss
Thanks for your replies, I've created
https://issues.apache.org/jira/browse/BEAM-1861 to track this effort.

On Sun, Apr 2, 2017 at 7:40 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> +1
>
> By the way, around the same topic, I'm working on Apache CarbonData
> support (http://carbondata.apache.org/).
>
> Regards
> JB
>
>
> On 04/01/2017 05:31 PM, Tibor Kiss wrote:
>
>> Hello,
>>
>> Recently the Optimized Row Columnar (ORC) file format was spun off from
>> Hive and became a top-level Apache project: https://orc.apache.org/
>>
>> It is similar to Parquet in the sense that it uses a column-major format,
>> but ORC has a more elaborate type system and stores basic statistics about
>> each row.
>>
>> I'd be interested in extending Beam with ORC support if others find it
>> helpful too.
>>
>> What do you think?
>>
>> - Tibor
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>



-- 
Kiss Tibor

+36 70 275 9863
tibor.k...@gmail.com


Re: Update of Pei in Alibaba

2017-04-01 Thread Tibor Kiss
Exciting times, looking forward to trying it out!

I should mention that Taylor Goetz also started creating a Beam runner using
Storm.
His work is available in the storm repo:
https://github.com/apache/storm/commits/beam-runner
Maybe it's worthwhile to take a peek and see if something is reusable from
there.

- Tibor

On Sat, Apr 1, 2017 at 4:37 AM, JingsongLee  wrote:

> Wow, very glad to see that JStorm has also started building a Beam runner.
> I am working in Galaxy (another stream processing engine in Alibaba).
> I hope that we can work together to promote the use of Apache Beam
> in Alibaba and China.
>
> best,
> JingsongLee
> --
> From: Pei HE
> Time: 2017 Apr 1 (Sat) 09:24
> To: dev <dev@beam.apache.org>
> Subject: Update of Pei in Alibaba
> Hi all,
> In February, I moved from Seattle to Hangzhou, China, and joined Alibaba.
> I want to give an update on things here.
>
> A colleague and I have been working on a JStorm runner. We have a prototype
> that works with WordCount and PAssert. (I am going to start a separate email
> thread about how to get it reviewed and merged into Apache Beam.)
> We also have Spark clusters, and are very interested in using the Spark
> runner.
>
> Last Saturday, I went to China Hadoop Summit and gave a talk about the Apache
> Beam model. While many companies gave talks on their in-house solutions for
> unified batch and unified SQL, there is also a lot of interest in, and
> enthusiasm for, Beam.
>
> Looking forward to chatting more.
> --
> Pei
>
>


-- 
Kiss Tibor

+36 70 275 9863
tibor.k...@gmail.com


Re: [DISCUSSION] Consistent use of loggers

2017-03-22 Thread Tibor Kiss
This is a great idea!

I believe the Python SDK's logging could also be enhanced (a bit differently):
Currently we do not instantiate a logger; we just use what the logging package
provides directly.
The shortcoming of this approach is that the user cannot set the log level on a
per-module basis, as all log messages end up in the root logger.
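
A minimal sketch of the per-module alternative, using only the standard
logging package. The module names `my_pipeline.io` and
`my_pipeline.transforms` are hypothetical and chosen only for illustration;
this is not the SDK's actual code.

```python
import logging


def get_module_logger(name):
    """Return a named logger instead of relying on the root logger."""
    return logging.getLogger(name)


# Hypothetical module names, used only for illustration. In a real module
# this would typically be logging.getLogger(__name__).
io_log = get_module_logger("my_pipeline.io")
transforms_log = get_module_logger("my_pipeline.transforms")

# With named loggers the user can tune verbosity per module.
logging.getLogger("my_pipeline.io").setLevel(logging.DEBUG)
logging.getLogger("my_pipeline.transforms").setLevel(logging.ERROR)

# By contrast, module-level calls such as logging.info(...) always go through
# the root logger, so a single global level applies to all of them.
io_debug_enabled = io_log.isEnabledFor(logging.DEBUG)
transforms_debug_enabled = transforms_log.isEnabledFor(logging.DEBUG)
```

With this layout, debug output can be switched on for one module while the
others stay quiet.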

On 3/22/17, 5:46 AM, "Aviem Zur"  wrote:

+1 to what JB said.

This will just have to be documented well: if we provide no binding, there will
be no logging out of the box unless the user adds one.

On Wed, Mar 22, 2017 at 6:24 AM Jean-Baptiste Onofré 
wrote:

> Hi Aviem,
>
> Good point.
>
> I think, in our dependency set, we should depend only on slf4j-api and let
> the user provide the binding they want (slf4j-log4j12, slf4j-simple,
> whatever).
>
> We define a binding only with test scope in our modules.
>
> Regards
> JB
>
> On 03/22/2017 04:58 AM, Aviem Zur wrote:
> > Hi all,
> >
> > There have been a few reports lately (on JIRA [1] and on Slack) from users
> > regarding inconsistent loggers used across Beam's modules.
> >
> > While we use SLF4J, different modules use a different logger behind it
> > (JUL, log4j, etc.).
> > So when people add a log4j.properties file to their classpath, for instance,
> > they expect this to affect all of their dependencies on Beam modules, but
> > it doesn’t, and they miss out on some logs they thought they would see.
> >
> > I think we should strive for consistency in which logger is used behind
> > SLF4J, and try to enforce this in our modules.
> > I for one think it should be slf4j-log4j. However, if performance of
> > logging is critical we might want to consider logback.
> >
> > Note: SLF4J will still be the facade for logging across the project. The
> > only change would be the logger SLF4J delegates to.
> >
> > Once we have something like this it would also be useful to add
> > documentation on logging in Beam to the website.
> >
> > [1] https://issues.apache.org/jira/browse/BEAM-1757
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>




Release notes snapshotting to website

2017-03-19 Thread Tibor Kiss
Hello,

I’d like to propose publishing the release notes as a separate document on the
website.

Currently we provide the notes through a JIRA search. If someone
(accidentally) sets a ticket’s fix version to an already released version, it
will show up in the linked search later.
Snapshotting the notes to the website would resolve such problems.

What do you think?

- Tibor


Begin forwarded message:

From: Ahmet Altay
Subject: Apache Beam, version 0.6.0 with Python SDK
Date: March 17, 2017 at 6:19:26 AM GMT+1
To: u...@beam.apache.org

The Apache Beam community is pleased to announce the availability of the 0.6.0 
release [1].

This release introduces a new SDK for the Python programming language [2]. 
Additionally, the release adds a new IO connector for Apache HBase in the Java 
SDK, along with a usual batch of bug fixes and improvements. Finally, several 
runners improved their support for the Beam model, including support for the 
recently-introduced State and Timer API, and Beam’s connectors to distributed 
file systems. For all major changes in this release, please refer to the 
release notes [3].

The 0.6.0 release is now the recommended version; we encourage everyone to 
upgrade from any earlier releases.

We thank all users and contributors who have helped make this release possible. 
If you haven't already, we'd like to invite you to join us, as we work towards 
our first release with API stability.

- Ahmet Altay, on behalf of the Apache Beam community.

[1] https://beam.apache.org/get-started/downloads/
[2] https://beam.apache.org/blog/2017/03/16/python-sdk-release.html
[3] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12339256



Re: [ANNOUNCEMENT] New committers, March 2017 edition!

2017-03-19 Thread Tibor Kiss
Congratulations / gratulalok! ;)


> On Mar 19, 2017, at 7:42 AM, Aljoscha Krettek  wrote:
> 
> Congrats, everyone! :-)
> 
> On Sat, Mar 18, 2017, at 16:09, Stas Levin wrote:
>> Congrats to the new committers!
>> 
>> On Sat, Mar 18, 2017 at 3:44 PM Aviem Zur  wrote:
>> 
>> Thanks all! Very excited to join.
>> Congratulations to other new committers!
>> 
>> On Sat, Mar 18, 2017 at 2:17 AM Thomas Weise 
>> wrote:
>> 
>>> Congrats!
>>> 
>>> 
>>> On Fri, Mar 17, 2017 at 4:28 PM, Chamikara Jayalath 
>>> wrote:
>>> 
>>>> Thanks all. Congrats to other new committers !!
>>>>
>>>> I'm very excited to join.
>>>>
>>>> - Cham
>>>>
>>>> On Fri, Mar 17, 2017 at 3:02 PM Mark Liu
>>>> wrote:
>>>>
>>>>> Congrats to all of them!
>>>>>
>>>>> On Fri, Mar 17, 2017 at 2:24 PM, Neelesh Salian <neeleshssal...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Congratulations!
>>>>>>
>>>>>> On Fri, Mar 17, 2017 at 2:16 PM, Kenneth Knowles wrote:
>>>>>>
>>>>>>> Congrats all!
>>>>>>>
>>>>>>> On Fri, Mar 17, 2017 at 2:13 PM, Davor Bonaci wrote:
>>>>>>>
>>>>>>>> Please join me and the rest of Beam PMC in welcoming the following
>>>>>>>> contributors as our newest committers. They have significantly
>>>>>>>> contributed to the project in different ways, and we look forward to
>>>>>>>> many more contributions in the future.
>>>>>>>>
>>>>>>>> * Chamikara Jayalath
>>>>>>>> Chamikara has been contributing to Beam since inception, and previously
>>>>>>>> to Google Cloud Dataflow, accumulating a total of 51 commits (8,301 ++ /
>>>>>>>> 3,892 --) since February 2016 [1]. He contributed broadly to the
>>>>>>>> project, but most significantly to the Python SDK, building the IO
>>>>>>>> framework in this SDK [2], [3].
>>>>>>>>
>>>>>>>> * Eugene Kirpichov
>>>>>>>> Eugene has been contributing to Beam since inception, and previously to
>>>>>>>> Google Cloud Dataflow, accumulating a total of 95 commits (22,122 ++ /
>>>>>>>> 18,407 --) since February 2016 [1]. In recent months, he’s been driving
>>>>>>>> the Splittable DoFn effort [4]. A true expert on IO subsystem, Eugene
>>>>>>>> has reviewed nearly every IO contributed to Beam. Finally, Eugene
>>>>>>>> contributed the Beam Style Guide, and is championing it across the
>>>>>>>> project.
>>>>>>>>
>>>>>>>> * Ismaël Mejia
>>>>>>>> Ismaël has been contributing to Beam since mid-2016, accumulating a
>>>>>>>> total of 35 commits (3,137 ++ / 1,328 --) [1]. He authored the HBaseIO
>>>>>>>> connector, helped on the Spark runner, and contributed in other areas
>>>>>>>> as well, including cross-project collaboration with Apache Zeppelin.
>>>>>>>> Ismaël reported 24 Jira issues.
>>>>>>>>
>>>>>>>> * Aviem Zur
>>>>>>>> Aviem has been contributing to Beam since early fall, accumulating a
>>>>>>>> total of 49 commits (6,471 ++ / 3,185 --) [1]. He reported 43 Jira
>>>>>>>> issues, and resolved ~30 issues. Aviem improved the stability of the
>>>>>>>> Spark runner a lot, and introduced support for metrics. Finally, Aviem
>>>>>>>> is championing dependency management across the project.
>>>>>>>>
>>>>>>>> Congratulations to all four! Welcome!
>>>>>>>>
>>>>>>>> Davor
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/beam/graphs/contributors?from=2016-02-01&to=2017-03-17&type=c
>>>>>>>> [2] https://github.com/apache/beam/blob/v0.6.0/sdks/python/apache_beam/io/iobase.py#L70
>>>>>>>> [3] https://github.com/apache/beam/blob/v0.6.0/sdks/python/apache_beam/io/iobase.py#L561
>>>>>>>> [4] https://s.apache.org/splittable-do-fn
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Neelesh S. Salian
>>>>>>
>>>>>
>>>>
>>> 
> 



Re: Beam File System in the Python SDK

2017-03-19 Thread Tibor Kiss
Thanks for putting this together, Sourabh!
I made two comments in the document (error handling, with statement).

Are there any plans to support permissions (mode bits or ACLs) in the FS API?
I believe most (if not all) of the underlying filesystems support some sort of
permission enforcement.

Thanks,
Tibor

> On Mar 17, 2017, at 10:14 PM, Sourabh Bajaj  
> wrote:
> 
> Wanted to share the design proposal for the Beam File System API in Python.
> I have marked the places where it might be slightly different from the
> current Java implementation, mainly around error handling. As always,
> feedback and comments are welcome.
> 
> Thanks
> Sourabh
> 
> On Wed, Mar 1, 2017 at 4:44 PM Chamikara Jayalath 
> wrote:
> 
>> Great! Thanks Sourabh.
>> 
>> - Cham
>> 
>> On Wed, Mar 1, 2017 at 3:58 PM Robert Bradshaw
>> wrote:
>> 
>>> Much needed! Added a couple of comments.
>>> 
>>> On Wed, Mar 1, 2017 at 3:08 PM, Sourabh Bajaj <
>>> sourabhba...@google.com.invalid> wrote:
>>> 
>>>> Hi,
>>>>
>>>> BEAM-1441 is a ticket for implementing the Beam File System in the Python
>>>> SDK similar to the one introduced in BEAM-59. I tried to take a pass on
>>>> the implementation in #2136 and followed the Java API as closely as
>>>> possible. Please feel free to give your comments here or on the pull
>>>> request directly.
>>>>
>>>> Reference: Original design doc
>>>>
>>>> Thanks
>>>> Sourabh
>>>>
>>> 
>> 



Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-12 Thread Tibor Kiss
+1

Details:
 - Built locally with 'mvn clean install -Prelease'
 - Needed to change pip2 to pip2.7. This issue is known on OS X and should not
   block the release in my opinion.
 - Sanity check on Python-SDK:
   - apache-beam-0.6.0.tar.gz has the same content as apache-beam-0.6.0.zip
   - ran wordcount, estimate_pi examples
 - Checked code from getting started guide. The document is not 100%
   accurate, but that should not hold up the release.
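
One way the tar.gz/zip comparison above could be scripted is sketched below.
This is only an illustration using the standard tarfile, zipfile and hashlib
modules, not necessarily how the check was actually done; the function names
are made up for the example.

```python
import hashlib
import tarfile
import zipfile


def tar_digests(path):
    """Map each regular file in a tar archive to the SHA-256 of its bytes."""
    digests = {}
    with tarfile.open(path) as archive:  # compression is auto-detected
        for member in archive.getmembers():
            if member.isfile():
                data = archive.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def zip_digests(path):
    """Map each file entry in a zip archive to the SHA-256 of its bytes."""
    digests = {}
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if not info.filename.endswith("/"):  # skip directory entries
                data = archive.read(info)
                digests[info.filename] = hashlib.sha256(data).hexdigest()
    return digests


def same_content(tar_path, zip_path):
    """True when both archives hold the same file names and file bytes."""
    return tar_digests(tar_path) == zip_digests(zip_path)
```

For the artifacts named above, the call would be
same_content("apache-beam-0.6.0.tar.gz", "apache-beam-0.6.0.zip"). The
comparison ignores metadata such as timestamps and permissions.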

Thanks for putting this RC together!

- Tibor

On 3/11/17, 6:05 AM, "Ahmet Altay"  wrote:

Hi everyone,

Please review and vote on the release candidate #2 for the version 0.6.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint 6096FA00 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v0.6.0-RC2" [5],
* website pull request listing the release and publishing the API reference
manual [6].
* Python artifacts are deployed along with the source release to
dist.apache.org [2].

A suite of Jenkins jobs:
* PreCommit_Java_MavenInstall [7],
* PostCommit_Java_MavenInstall [8],
* PostCommit_Java_RunnableOnService_Apex [9],
* PostCommit_Java_RunnableOnService_Flink [10],
* PostCommit_Java_RunnableOnService_Spark [11],
* PostCommit_Java_RunnableOnService_Dataflow [12]
* PostCommit_Python_Verify [13]

Compared to release candidate #1, this candidate contains pull requests
#2217 [14], #2221 [15], # [16], #2224 [17], and #2225 [18]; see the
discussion for reasoning.

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Ahmet

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12339256
[2] https://dist.apache.org/repos/dist/dev/beam/0.6.0/
[3] https://dist.apache.org/repos/dist/dev/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1013/
[5] https://git-wip-us.apache.org/repos/asf?p=beam.git;a=tag;h=refs/tags/v0.6.0-RC2
[6] https://github.com/apache/beam-site/pull/175
[7] https://builds.apache.org/view/Beam/job/beam_PreCommit_Java_MavenInstall/8340/
[8] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/2877/
[9] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Apex/736/
[10] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Flink/1895/
[11] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Spark/1207/
[12] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Dataflow/2526/
[13] https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/1481/
[14] https://github.com/apache/beam/pull/2217
[15] https://github.com/apache/beam/pull/2221
[16] https://github.com/apache/beam/pull/
[17] https://github.com/apache/beam/pull/2224
[18] https://github.com/apache/beam/pull/2225