Re: A personal update

2017-12-13 Thread Ismaël Mejía
Hello Davor, great to know you are going to continue contributing to the project. Welcome back and best of wishes for this new phase ! On Wed, Dec 13, 2017 at 3:12 PM, Kenneth Knowles wrote: > Great to have you back! > > On Tue, Dec 12, 2017 at 11:20 PM, Robert Bradshaw > wrote: >> >> Great to h

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2017-12-11 Thread Ismaël Mejía
Hello, I wanted to bring back this subject because I think we should take action on this and at least first have a shaded version of guava. I was playing with a toy project and I did the procedure we use to submit jars to a Hadoop cluster via Flink/Spark which involves creating an uber jar and I re

Re: Introduction + interest in helping Beam builds, tests, and releases

2017-12-08 Thread Ismaël Mejía
Welcome Alan looking forward to your help and in particular to have ways to validate our releases with Hadoop's YARN too (Dataproc). If I can add an extra point it would be to have also some 'backwards' compatible version of Holden's wish. So we can test for example the releases with previous versi

Re: Apache Ignite as a distributed processing back-ends

2017-12-08 Thread Ismaël Mejía
Hello Denis, This is really great news. I think Ignite can be integrated into Beam as an IO; in that case Beam developers will read/write their data from/to Ignite from their data processing pipelines. You can take a look at some of the existing IOs for ideas and follow the PTransform guide for style

Re: Apache Beam, version 2.2.0

2017-12-08 Thread Ismaël Mejía
t >>>> though.) >>>> >>>> On Wed, Dec 6, 2017 at 9:09 AM, Eugene Kirpichov >>>> wrote: >>>> > Okay, then let's go forward. Seems that we should: >>>> > - Open a new poll on user@, in light of 2.2 having been released >

Re: Apache Beam, version 2.2.0

2017-12-06 Thread Ismaël Mejía
:j...@nanthrax.net>>> wrote: >> > >> > My apologizes, I thought we had a consensus already. >> > >> > Regards >> > JB >> > >>

Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-29 Thread Ismaël Mejía
+1 to Kenneth's proposal, using reviewer and assignee. For the merge strategy (a) +1 with the same arguments (preserving commits when they are meaningful and isolated, ask committers to do extra squash if needed). I don't really favor having one big commit per PR (in particular if the change is big) b

Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-29 Thread Ismaël Mejía
It is good to see so much enthusiasm about the future of Beam, independently of whether we call it Beam 3 or not. I have some doubts about the idea of a release per month; Apache releases are designed to be slow-paced (via the 3-day voting process). It is just a question that we have in the sam

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Ismaël Mejía
I have been a little bit out of the discussion on maven vs gradle because I was expecting the technical proof of concepts to evaluate the best approach. I deeply appreciate all the effort that Lukasz has put into the gradle version, and I also think that during the discussion Romain and others have

Re: [RESULT][VOTE] Migrate to gitbox

2017-11-23 Thread Ismaël Mejía
If github already does the notifications, I think that having an extra notifications/reviews mailing list could be overkill (or spammy). However I can see the value of this for archival reasons, e.g. to store the history of the project comments out of github for the future. +1 for new mailing list

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-23 Thread Ismaël Mejía
+1 On Thu, Nov 23, 2017 at 6:35 AM, Robert Bradshaw wrote: > +1 > > On Wed, Nov 22, 2017, 10:10 PM Jean-Baptiste Onofré wrote: > >> +1 >> >> Regards >> JB >> >> On 11/23/2017 12:25 AM, Lukasz Cwik wrote: >> > I have noticed that some e-mail addresses (notably @google.com) get >> > .INVALID suffi

Re: [VOTE] Choose the "new" Spark runner

2017-11-20 Thread Ismaël Mejía
Moving my vote from previous threads: [ ] Use Spark 1 & Spark 2 Support Branch [X] Use Spark 2 Only Branch Ismaël On Thu, Nov 16, 2017 at 2:08 PM, Jean-Baptiste Onofré wrote: > Hi guys, > > To illustrate the current discussion about Spark versions support, you can > take a look on: > > -- > Sp

Re: New Contributor

2017-11-14 Thread Ismaël Mejía
Great news, Welcome Axel and Ben ! On Tue, Nov 14, 2017 at 11:46 PM, Reuven Lax wrote: > Welcome both of you! > > On Wed, Nov 15, 2017 at 6:14 AM, Griselda Cuevas > wrote: > >> Welcome guys! >> >> On 14 November 2017 at 13:11, Jesse Anderson >> wrote: >> >> > Welcome! >> > >> > On Tue, Nov 14,

Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-09 Thread Ismaël Mejía
+1 for the move to Spark 2 modulo preventing users and deciding on support: I agree that having compatibility for both versions of Spark is desirable but I am not sure if it is worth the effort. Apart from the reasons mentioned by Holden and Pei, I will add that the burden of simultaneous maintenance c

Re: [VOTE] Release 2.2.0, release candidate #2

2017-11-08 Thread Ismaël Mejía
I tested the python version of the release I just created a new virtualenv and run python setup.py install and it gave me this message: Traceback (most recent call last): File "setup.py", line 203, in 'test': generate_protos_first(test), File "/usr/lib/python2.7/distutils/core.py", line

Re: [VOTE] Release 2.2.0, release candidate #2

2017-11-03 Thread Ismaël Mejía
I found some issues during the vote validation (not sure if those would require a new vote since most seem to be packaging related and we can get with it by removing the extra stuff that ended up in the zip files): 1. I inspected the apache-beam-2.2.0-source-release.zip file and was a bit surprise

Re: [Proposal] Sharing Neville's post and upcoming meetups in the Twitter handle

2017-10-23 Thread Ismaël Mejía
Has anybody thought about getting some Beam 'swag' for these events (the meetups + conference talks) ? So far I have seen some Beam stickers around but it would be really nice to have some other items: t-shirts, mugs, socks, corkscrews, webcam covers, whatever. People seem to love these things and

Re: Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-10-17 Thread Ismaël Mejía
> > >> > > > > Kafka is considering it: >> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- >> > > > > 118%3A+Drop+Support+for+Java+7+in+Kafka+0.11 >> > > > > and >> > > > > quotes a number of othe

Re: Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-10-16 Thread Ismaël Mejía
g. >>>> >>>> As for other ways to solve this, if there is a way to avoid compiling the >>>> advanced features of AutoValue that might be worth a try. We could also >>>> try >>>> to get a release of AutoValue with the fix that works in Java

Re: CouchDbIO connector in beam io

2017-10-12 Thread Ismaël Mejía
This is an interesting one, please go ahead and create the JIRA. Maybe it is a good idea that you ping Seshadri and the guys who were interested in implementing CouchbaseIO https://issues.apache.org/jira/browse/BEAM-1893 I have no idea whether the APIs of CouchDb and Couchbase are similar but if they

Re: [VOTE] Migrate to gitbox

2017-10-10 Thread Ismaël Mejía
+1 (non-binding) On Tue, Oct 10, 2017 at 10:42 AM, Aljoscha Krettek wrote: > +1 > >> On 10. Oct 2017, at 09:42, Jean-Baptiste Onofré wrote: >> >> Hi all, >> >> following the discussion, here's the formal vote to migrate to gitbox: >> >> [ ] +1, Approve to migrate to gitbox >> [ ] -1, Do not migr

Re: [DISCUSS] Switch to gitbox

2017-10-09 Thread Ismaël Mejía
+1 On Oct 9, 2017 6:52 PM, "Thomas Groh" wrote: > +1. > > I do love myself a forcing function for passing tests. > > On Mon, Oct 9, 2017 at 7:51 AM, Aljoscha Krettek > wrote: > > > +1 > > > > > On 6. Oct 2017, at 18:57, Kenneth Knowles > > wrote: > > > > > > Sounds great. If I recall correctly

Re: Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-09-26 Thread Ismaël Mejía
issue is moot? I didn't quite understand from your email whether it is > a compilation blocker for Beam or not. > > On Tue, Sep 26, 2017 at 2:32 PM Ismaël Mejía wrote: > >> Great that you are also working on this too Daniel and thanks for >> bringing this subject to the ma

Re: Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-09-26 Thread Ismaël Mejía
Great that you are also working on this Daniel, and thanks for bringing this subject to the mailing list, I was waiting for my return to the office next week, but you did it first :) Eugene for reference (This is the issue on the migration to Java 9), notice that here the goal is first that beam pa

Re: Java 9

2017-09-11 Thread Ismaël Mejía
Hello Alexey, There is a JIRA issue covering Java 9 support: https://issues.apache.org/jira/browse/BEAM-2530 The goal of this JIRA is to update the project to support the current Beam features with Java 9 (we need to keep our current level of backwards compatibility with Java >= 7). Migration to

Re: Merge branch DSL_SQL to master

2017-09-07 Thread Ismaël Mejía
riate >> > > >> > > * Besides of integration tests in package >> > org.apache.beam.sdk.extensions.sql, >> > > there's another example in org.apache.beam.sdk.extensions.sql.example. >> > > BeamSqlExample. >> > > >> >

Re: Beam 2.2.0 release

2017-08-30 Thread Ismaël Mejía
The current master has accumulated a good amount of nice features since 2.1.0 so a new release is welcomed. I have two JIRAs/PR that I think are important to check/solve before the cut: BEAM-2516 (this is a regression on the performance of Direct runner on Java). We had never really defined if a p

Re: Policy for stale PRs

2017-08-16 Thread Ismaël Mejía
Thanks Ahmet for bringing this subject. +1 to close the stale PRs automatically after a fixed time of inactivity. 90 days is ok, but maybe a shorter period is better. If we consider that being stale is just not having any activity i.e., the author of the PR does not answer any message. The author

Re: Hello from a newbie to the data world living in the city by the bay!

2017-08-16 Thread Ismaël Mejía
Hello and welcome Griselda, Umang, Justin. Apart from the links provided by Ahmet you might read Beam-related material on the website (See Documentation > Programming Guide and Documentation > Additional Resources among others). But probably as important as improving your Beam related knowledge is t

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-14 Thread Ismaël Mejía
+1 (non-binding) - Validated signatures OK - mvn clean verify -Prelease on both OpenJDK 1.7 and Oracle JDK 8 with the docker development images (WIP), both OK - Run WordCount on local Flink and Spark runners OK Everything looks nice, only one minor thing (not blocking at all). The proto generated

Re: Proposal : An extension for sketch-based statistics

2017-08-14 Thread Ismaël Mejía
Kenneth’s idea of using sketches for state with the State API is really interesting, it really opens some interesting use cases, I haven’t really thought about it but I believe it is really an appealing use case for the sketches. Note that the origin of this work was in the line of statistics, in p

Re: [ANNOUNCEMENT] New PMC members, August 2017 edition!

2017-08-11 Thread Ismaël Mejía
Congratulations Ahmet and Aviem, keep up the great work ! On Fri, Aug 11, 2017 at 8:30 PM, Thomas Groh wrote: > Congratulations to both of you! Looking forwards to both of your continued > contributions. > > On Fri, Aug 11, 2017 at 10:40 AM, Davor Bonaci wrote: > >> Please join me and the rest o

Re: [ANNOUNCEMENT] New committers, August 2017 edition!

2017-08-11 Thread Ismaël Mejía
Congrats everyone, well deserved, excellent work guys ! On Fri, Aug 11, 2017 at 7:53 PM, Jesse Anderson wrote: > Welcome! > > On Fri, Aug 11, 2017, 10:48 AM Jason Kuster > wrote: > >> Congrats to all, many thanks for the great contributions. >> >> On Fri, Aug 11, 2017 at 10:46 AM, Ahmet Altay >

Re: [CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Ismaël Mejía
Not a blocker but maybe it is worth considering the fix for https://issues.apache.org/jira/browse/BEAM-2587 too. I also was bitten by this issue and I could only get it to work by doing a 'pip install --user grpcio-tools' (not sure if this is a proper solution but it works for me), however when I

Re: [PROPOSAL] Connectors for memcache and Couchbase

2017-07-11 Thread Ismaël Mejía
"watch mutations" command would allow one to build a >> streaming memcache IO which shows all changes occurring underneath. >> >> memcached protocol: >> https://github.com/memcached/memcached/blob/master/doc/protocol.txt >> >> On Mon, Jul 10, 2017 at 2:41 AM

Re: MergeBot is here!

2017-07-11 Thread Ismaël Mejía
" and block other people's PRs. One other >>> option would be to allow the person requesting the merge to say something >>> like "@asfgit merge squash" or "@asfgit merge nosquash", parametrizing the >>> merge request. Thoughts? >>

Re: [PROPOSAL] Connectors for memcache and Couchbase

2017-07-10 Thread Ismaël Mejía
Hello, Thanks Lukasz for bringing up some of these subjects. I have briefly discussed with the guys working on this; they are the same team who did HCatalogIO (Hive). We just analyzed the different libraries that allowed to develop this integration from Java and decided that the most complete implementat

Re: MergeBot is here!

2017-07-10 Thread Ismaël Mejía
Excellent!, Automation of such repetitive (and error-prone) tasks is strongly welcomed. Thanks for making this happen Jason! Some comments: 1. I suppose the code of mergebot is now part of Apache Infra, no? Do you know exactly where the code is hosted? And what is the procedure in case somebody

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Ismaël Mejía
> I've also read the design doc and IMHO it's not easy to support > Meter/Histogram (currently Distribution is a bit too simple). I'm thinking > about adding full > support of dropwizard metrics and will come up with a doc later so that we > can discuss this in detail

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Ismaël Mejía
Cody not sure if I follow, but isn't Distribution on Beam similar to codahale/dropwizard's Histogram (without the quantiles) ? Meters are also in the plan but not implemented yet, see the Metrics design doc: https://s.apache.org/beam-metrics-api If I understand what you want is to have some sort

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-21 Thread Ismaël Mejía
Thanks JB for keeping the time based release agenda. I really don't have any blocker but I would like to have the hadoop version alignment PR merged before this one and probably also Nexmark (considering that Etienne fixed most of the issues and we have already the LGTM, we are just waiting for a l

Re: Beam Proposal: Pipeline Drain

2017-06-12 Thread Ismaël Mejía
Hello Reuven, I finally took the time to read the Drain proposal, thanks a lot for bringing this, it looks like a nice fit with the current APIs and it would be great if this could be implemented as much as possible in a Runner independent way. I am eager now to see the snapshot and update propo

Re: [DISCUSS] HadoopInputFormat based IOs

2017-06-01 Thread Ismaël Mejía
ES/Cassandra HIFIO test code, I'd > propose that we add comments in there directing people to the correct > native source. > > S > [1] writeThenRead style IO IT - > https://lists.apache.org/thread.html/26ee3ba827c2917c393ab26ce97e7491846594d8f574b5ae29a44551@%3Cdev.beam.apache.org%3E &

Re: [DISCUSS] HadoopInputFormat based IOs

2017-05-30 Thread Ismaël Mejía
The whole goal of this discussion is that we define what shall we do when someone wants to add a new IO that uses HIFIO. The consensus so far following the PR comments + this thread is that it should be discouraged and those contribution be included as documentation in the website, and that we shou

Re: [DISCUSS] HadoopInputFormat based IOs

2017-05-30 Thread Ismaël Mejía
"non-native but readable" IOs (better name suggestions appreciated :) - > that could include a list of data stores that jdbc/jms/hifio support and > link to HIFIO's info on how to use them. (That might also be a good place > to document the performance tradeoffs of using

Re: [New Proposal] Hive connector using native api

2017-05-24 Thread Ismaël Mejía
, Ismaël Mejía wrote: > Hello, > > I created a new JIRA for this native implementation of the IO so feel > free to PR the 'native' implementation using this ticket. > https://issues.apache.org/jira/browse/BEAM-2357 > > We will discuss all the small details in the

Re: [New Proposal] Hive connector using native api

2017-05-24 Thread Ismaël Mejía
Hello, I created a new JIRA for this native implementation of the IO so feel free to PR the 'native' implementation using this ticket. https://issues.apache.org/jira/browse/BEAM-2357 We will discuss all the small details in the PR. The old JIRA (BEAM-1158) will still be there just to add the rea

[DISCUSS] HadoopInputFormat based IOs

2017-05-23 Thread Ismaël Mejía
better to add just the tests/docs of how to use them as proposed in the PR (option 2). Feel free to comment/vote or maybe add an eventual third option if you think there is one better option. Regards, Ismaël Mejía [1] https://issues.apache.org/jira/browse/BEAM-1158

Re: First stable release completed!

2017-05-17 Thread Ismaël Mejía
Amazing milestone, congrats everyone! On Wed, May 17, 2017 at 7:54 PM, Reuven Lax wrote: > Sweet! > > On Wed, May 17, 2017 at 4:28 AM, Davor Bonaci wrote: > >> The first stable release is now complete! >> >> Release artifacts are available through various repositories, including >> dist.apache.o

Re: [VOTE] First stable release: release candidate #4

2017-05-14 Thread Ismaël Mejía
+1 (non-binding) Validated signatures OK Run mvn clean verify -Prelease OK Executed Nexmark with Direct/Spark/Flink/Apex runners in local mode (temporarily downgraded to 2.0.0 to validate the version). OK This is looking great now. As Robert said, a release to be proud of. On Sun, May 14, 2017 at

Re: [DISCUSSION] using NexMark for Beam

2017-05-14 Thread Ismaël Mejía
Hello, Thanks Etienne for opening the Pull Request and starting the discussion for the review process. I also want to thank publicly all the people that somehow contributed to this: - Mark Shields and the original people at google who worked at nexmark for contributing this in the first place. -

Re: First stable release: version designation?

2017-05-04 Thread Ismaël Mejía
My vote, like Davor: Slight preference toward 2.0.0, but fine with 1.0.0 On Thu, May 4, 2017 at 9:32 PM, Thomas Weise wrote: > I'm in the relaxed 1.0.0 camp. > > -- > sent from mobile > On May 4, 2017 12:29 PM, "Mingmin Xu" wrote: > >> I slightly prefer1.0.0 for the *first* stable release, but f

Re: Congratulations Davor!

2017-05-04 Thread Ismaël Mejía
Congratulations Davor! Your membership is really deserved, You really got the Apache spirit ! On Thu, May 4, 2017 at 5:02 PM, Thomas Groh wrote: > Congratulations! > > On Thu, May 4, 2017 at 7:56 AM, Thomas Weise wrote: > >> Congrats! >> >> >> On Thu, May 4, 2017 at 7:53 AM, Sourabh Bajaj < >> s

Re: [PROPOSAL] HiveIO - updated link to document

2017-04-25 Thread Ismaël Mejía
Hello, I created the HiveIO JIRA and followed the initial discussions about the best approach for HiveIO so I want first to suggest you to read the previous thread(s) on the mailing list. https://www.mail-archive.com/dev@beam.incubator.apache.org/msg02313.html The main idea I concluded from that

Re: [DISCUSSION] Encouraging more contributions

2017-04-25 Thread Ismaël Mejía
I think it is important to clarify that the developer documentation discussed in this thread is of two kinds: 6.1. Documents with proposals and new designs, those covered by the Beam Improvement Proposal (BEAM-566), and that we need to put with a single file index (I remember there was a google di

Re: [DISCUSSION] Encouraging more contributions

2017-04-24 Thread Ismaël Mejía
+1 Great idea Aviem, thanks for bringing this subject to the mailing list. I agree in particular with the freeing JIRA part, I think we shouldn’t keep assigned JIRAs that are things that we don’t expect to solve in the next weeks. (note the exception for this are the long features). I would add t

Re: Pipeline termination in the unified Beam model

2017-04-18 Thread Ismaël Mejía
+1 Having a unified termination semantics for all runners is super important. Stas or Aviem, is it feasible to do this for the Spark runner, or is the timeout due to a technical limitation of Spark? Thomas Weise, Aljoscha anything to say on this? Aljoscha, what is the current status for the Flink

Re: [DISCUSSION] PAssert success/failure count validation for all runners

2017-04-10 Thread Ismaël Mejía
I have the impression this conversation went into a different sub-discussion ignoring the core subject, which is whether it makes sense to do the implementation of PAssert as we are doing it right now (1), or in a runner agnostic way (2). Big +1 for (2). And I think also this is critical enough to be pa

Re: Update of Pei in Alibaba

2017-04-07 Thread Ismaël Mejía
t once” job, JStorm runner can be reused on Storm. But > for “window”, “state” and “exactly once” job, unfortunately, JStorm runner > can’t be reused. Anyway, we will figure out if the propagation is possible > for Storm in the future. > > > > Regards > > Jian Liu(Basti) &

Re: Update of Pei in Alibaba

2017-04-03 Thread Ismaël Mejía
Thanks Jingsong for answering, and the Streamscope ref, I am going to check the paper, the concept of non-global-checkpointing sounds super interesting. It is nice that you guys are also trying to promote the move to a unified model. Regards, Ismaël On Sun, Apr 2, 2017 at 3:40 PM, JingsongLee

Re: [PROPOSAL] ORC support

2017-04-01 Thread Ismaël Mejía
+1 From my previous work experience ORC in certain cases performs better than Parquet and really deserves to be supported. On Sat, Apr 1, 2017 at 5:58 PM, Ted Yu wrote: > +1 > >> On Apr 1, 2017, at 8:31 AM, Tibor Kiss wrote: >> >> Hello, >> >> Recently the Optimized Row Columnar (ORC) file fo

Re: Update of Pei in Alibaba

2017-04-01 Thread Ismaël Mejía
Excellent news, Pei it would be great to have a new runner. I am curious about how different the implementations of Storm are among them, considering that there are already three 'versions': Storm, JStorm and Heron, I wonder if one runner could translate to an API that would cover all of them (of cou

Re: First IO IT Running!

2017-03-22 Thread Ismaël Mejía
Excellent news, I am eager to see more IOs/Runners been included in the Integration Tests, and I will be glad to contribute in anything I can. Congratulations for this important milestone. Ismaël ps. I will try to reproduce the Kubernetes setup so I will be eventually annoying you with questions.

Re: Docker image dependencies

2017-03-22 Thread Ismaël Mejía
rted-guides/gce/ > [4] docker swarm on GCE - > https://rominirani.com/docker-swarm-on-google-compute-engine-364765b400ed#.gzvruzis9 > > [5] postgres k8 script - > https://github.com/apache/beam/tree/master/sdks/java/io/jdbc/src/test/resources/kubernetes > > [6] > https://github.c

Re: Beam spark 2.x runner status

2017-03-22 Thread Ismaël Mejía
Amit, I suppose JB is talking about the RDD based version, so no need to worry about SparkSession or different incompatible APIs. Remember the idea we are discussing is to have in master both the spark 1 and spark 2 runners using the RDD based translation. At the same time we can have a feature br

Re: Docker image dependencies

2017-03-20 Thread Ismaël Mejía
nality we need. Does docker-compose > provide something beyond the functionality that k8 does? I'm not familiar > with docker-compose, but looking at > https://docs.docker.com/compose/overview/#compose-documentation it doesn't > seem to provide anything that k8 doesn't already

Re: [ANNOUNCEMENT] New committers, March 2017 edition!

2017-03-20 Thread Ismaël Mejía
Thanks everyone, Feels great to be part of the team. Congratulations to the other new committers ! -Ismaël On Mon, Mar 20, 2017 at 2:50 PM, Tyler Akidau wrote: > Welcome! > > On Mon, Mar 20, 2017, 02:25 Jean-Baptiste Onofré wrote: > >> Welcome aboard, and congrats ! >> >> Really happy to count

Re: splitIntoBundles vs. generateInitialSplits

2017-03-20 Thread Ismaël Mejía
This is a forgotten one, Stas did you create a JIRA about this one? I think this change should be also tagged as First version release, because this is an API change and can break stuff if we do it later on. On Wed, Jan 11, 2017 at 4:30 PM, Jean-Baptiste Onofré wrote: > Hi Eugene and Stas, > > J

Re: Performance Testing Next Steps

2017-03-16 Thread Ismaël Mejía
we can give a hand if needed. On Thu, Mar 16, 2017 at 9:17 AM, Jason Kuster wrote: > Thanks Ismael for the comments! Replied inline. > > On Wed, Mar 15, 2017 at 8:18 AM, Ismaël Mejía wrote: > >> Excellent proposal, sorry to jump into this discussion so late, this >> was i

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
> Otherwise, we'll have the current runner, another RDD API runner with Spark > 2, and a third one for the Dataset API. I don't want to maintain all of > them. It's a mess. > > On Wed, Mar 15, 2017 at 6:39 PM Ismaël Mejía wrote: > >> > However, I do feel th

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
g with Spark 1 > runner, and having Structured Streaming advance in Spark 2, we could start > work on Spark 2 runner in a separate branch. > > However, I do feel that we should use the Dataset API, starting with batch > support first. WDYT ? > > On Wed, Mar 15, 2017 at 5:47 PM

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
ily > investing there. > > We could think of starting to migrate the Spark 1 runner to Spark 2 and > follow with Dataset API support feature-by-feature as it advances, but I > think most Spark installations today still run 1.X, or am I wrong ? > > On Wed, Mar 15, 2017 at 4:26 PM I

Re: Performance Testing Next Steps

2017-03-15 Thread Ismaël Mejía
Excellent proposal, sorry to jump into this discussion so late, this was in my to-read list for almost two weeks, and I finally got the time to read the document and I have two minor comments: I have the impression that the strict separation of Providers (the data-processing systems) and Resources

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
BIG +1 JB, If we can just jump the version number with minor changes staying as close as possible to the current implementation for spark 1 we can go faster and offer in principle the exact same support but for version 2. I know that the advanced streaming stuff based on the DataSet API won't be

Re: Docker image dependencies

2017-03-15 Thread Ismaël Mejía
Hi, Thanks for bringing this subject to the mailing list. +1 We definitely need a consensus on this, and I agree with your proposal and JB’s comments modulo certain clarifications: I think we shall go in this priority order if the version of the image we want is available: 1. Image provided by t

Re: Style: how much testing for transform builder classes?

2017-03-15 Thread Ismaël Mejía
kip as trivial, so documentation on this topic >> should be in the form of guidelines, high-quality example code (i.e. clean >> up the unit tests of IOs bundled with Beam SDK), and informal knowledge in >> the heads of readers of this thread, rather than hard rules. >> >

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Ismaël Mejía
beam' today and it felt amazing! > > Ahmet > > > On Tue, Mar 14, 2017 at 2:22 PM, Ahmet Altay wrote: > > > I'm happy to announce that we have unanimously approved this release. > > > > There are 7 approving votes, 4 of which are binding: > > * Aljosc

Re: Style: how much testing for transform builder classes?

2017-03-14 Thread Ismaël Mejía
+0.5 I used to think that some of those tests were not worth it, for example testBuildRead and testBuildReadAlt. However the reality is that these tests allowed me to find bugs both during the development of HBaseIO and just yesterday when I tried to test the write support for the emulator with Data

Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-13 Thread Ismaël Mejía
+1 (non-binding) - verified signatures + checksums - run mvn clean install -Prelease, all artifacts build and the tests run smoothly (modulo some local issues I had with the installation of tox for the python sdk, I created a PR to fix those in case other people can have the same trouble). Some

Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-12 Thread Ismaël Mejía
I found an issue too with the .md5 and sha1 files of the python release, they refer to a different default file (a forgotten part of the renaming): curl https://dist.apache.org/repos/dist/dev/beam/0.6.0/apache-beam-0.6.0-python.zip.md5 7d4170e381ce0e1aa8d11bee2e63d151 apache-beam-0.6.0.zip This

Re: Merge HadoopInputFormatIO and HDFSIO in a single module

2017-03-02 Thread Ismaël Mejía
Hello, I answer since I have been leading the refactor to hadoop-common. My criteria to move a class into hadoop-common is that it is used at least by more than one other module or IO, this is the reason it is not big, but it can grow if needed. +1 for option #1 because of the visibility reasons yo

Re: Next major milestone: first stable release

2017-03-01 Thread Ismaël Mejía
if everyone, particularly component leads > in > > JIRA, take a pass too! > > > > On Wed, Mar 1, 2017 at 2:51 AM, Jean-Baptiste Onofré > > wrote: > > > > > Yes, fully agree. > > > > > > As far as I understood/know, BEAM-59 is targeted for Beam 1.0

Re: Next major milestone: first stable release

2017-03-01 Thread Ismaël Mejía
Also joining a bit late, I agree with Amit, HDFS improvements are a really good thing to have before the stable release. I will also add the IOChannelFactory refactorings to support things like Read.from(“hdfs://”) aka BEAM-59. In the worst case particular IOs can still be marked as experimental t

Re: Interest in a (virtual) contributor meeting?

2017-02-23 Thread Ismaël Mejía
+1 to do it periodically about different subjects. It is a good idea to have a sort of mini agenda, in the sense that the two previous meetings had really different focus, the first one was about contributors meeting each other and discussion of ongoing work just after the project started on Apach

Re: Metrics for Beam IOs.

2017-02-22 Thread Ismaël Mejía
Hello, Thanks everyone for giving your points of view. I was waiting to see how the conversation evolved to summarize it and continue on the open points. Points where mostly everybody agrees (please correct me if somebody still disagrees): - Default metrics should not affect performance, for tha

Re: Hbase IO preview

2017-02-22 Thread Ismaël Mejía
PM, Ismaël Mejía wrote: > We are having progress with this one, we will keep you informed once the > branch is ready for testing/contribution so you can try it (or help us > improve it). > > For the moment you can track the progress following this JIRA > https://issues.apache.or

Re: Better developer instructions for using Maven?

2017-02-16 Thread Ismaël Mejía
themselves but this has to be evaluated. On Wed, Feb 15, 2017 at 5:46 PM, Jean-Baptiste Onofré wrote: > On Jenkins it's possible to run several jobs in the same time but on > different executor. That's the easiest way. > > Regards > JB > > On Feb 15, 2017, 10:15, at

Re: Better developer instructions for using Maven?

2017-02-15 Thread Ismaël Mejía
This question got lost in the discussion, but there is a small improvement that we can do: > Just to check, are we doing parallel builds? We are on jenkins, not in travis, there is an ongoing PR to fix this. What we can improve is to check if we can run some of the test suites in parallel to gai

Metrics for Beam IOs.

2017-02-14 Thread Ismaël Mejía
​Hello, The new metrics API allows us to integrate some basic metrics into the Beam IOs. I have been following some discussions about this on JIRAs/PRs, and I think it is important to discuss the subject here so we can have more awareness and obtain ideas from the community. First I want to thank

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-30 Thread Ismaël Mejía
+1 (non-binding) - verified signatures + checksums - run mvn clean verify -Prelease, all artifacts build and the tests run smoothly Great to see a shorter release cycle, the improvements and the new IOs. On Fri, Jan 27, 2017 at 9:55 PM, Jean-Baptiste Onofré wrote: > Hi everyone, > > Please re

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-27 Thread Ismaël Mejía
Congratulations, well deserved guys ! On Fri, Jan 27, 2017 at 9:28 AM, Amit Sela wrote: > Welcome and congratulations to all! > > On Fri, Jan 27, 2017, 10:12 Ahmet Altay wrote: > > > Thank you all! And congratulations to other new committers. > > > > Ahmet > > > > On Thu, Jan 26, 2017 at 9:45

Re: Request for becoming a contributor

2017-01-24 Thread Ismaël Mejía
Similar to yesterday's discussion about opening access to the slack channel, I wonder if it makes sense to let people assign themselves as contributors and pick JIRAs without asking for this, Is this possible with Apache's JIRA? And do you think this is a good idea? On Tue, Jan 24, 2017 at 7:15 A

Re: Beam Fn API

2017-01-24 Thread Ismaël Mejía
Awesome job Lukasz, Excellent, I have to confess the first time I heard about the Fn API idea I was a bit incredulous, but you are making it real, amazing! Just one question from your document, you said that 80% of the extra (15%) time goes into encoding and decoding the data for your test case, c

Re: Better developer instructions for using Maven?

2017-01-24 Thread Ismaël Mejía
I just wanted to know if we have achieved some consensus about this, I just saw this PR that reminded me about this discussion. https://github.com/apache/beam/pull/1829 It is important that we mention the existing profiles (and the intended checks) in the contribution guide (e.g. -Prelease (or

Re: [VOTE] Merge Python SDK to the master branch

2017-01-23 Thread Ismaël Mejía
[X] +1, Merge python-sdk branch to master after the 0.5.0 release Big +1, unbounded support will come/stabilize later on (as it happened with InProcessRunner), visibility is more important. On Mon, Jan 23, 2017 at 8:36 AM, Sergio Fernández wrote: > +1 > > On Fri, Jan 20, 2017 at 9:24 PM, Robert

Re: Hosting data stores for IO Transform testing

2017-01-18 Thread Ismaël Mejía
>>> pre-built packages for multi-node clusters of data stores. If there's a >>> good repository of them that we trust, that would definitely save us >>> time. >>> Can you point me at the mesos repository? >>> >>> S >>> >>> >>

Re: Hosting data stores for IO Transform testing

2017-01-18 Thread Ismaël Mejía
;>>>>>>>> > > >>>>>>>>> that. > > >>>>>> > > >>>>>>> I consider the integration tests/performance benchmarks to be > > >>>>>>>>>> costly > > >

Re: Graduation!

2017-01-10 Thread Ismaël Mejía
Congratulations everyone, this is great news, graduation will give new users confidence about the project and its community. Awesome ! Ismaël ps. @Michal It is only one unified API for both bounded and unbounded data, the behavior changes depending on the data sources. On Tue, Jan 10, 2017 at 4:
