Re: A new reworked Elasticsearch 7+ IO module

2020-03-06 Thread Jean-Baptiste Onofre
Hi,

I think WARN makes sense and is the safest approach. It notifies users and lets
them update or, if needed, go back to a previous Beam IO version.

Regards
JB

> On 6 Mar 2020, at 18:49, Kenneth Knowles wrote:
> 
> Since the user provides backendVersion, here are some possible levels of 
> checks to add in expand() based on it (these are extra niceties beyond the 
> agreed number of releases before removal)
> 
>  - WARN for backendVersion < n
>  - reject for backendVersion < n, with an opt-in pipeline option to keep it 
> working for one more version (gets their attention and indicates urgency)
>  - reject completely
> 
> Kenn
> 
> On Fri, Mar 6, 2020 at 2:26 AM Etienne Chauchot wrote:
> Hi all, 
> 
> it's been 3 weeks since the survey on which ES versions users run. 
> 
> The survey has received very few responses so far: only 9 (multiple 
> versions could be selected, of course). The results are:
> 
> ES2: 0 clients, ES5: 1, ES6: 5, ES7: 8 
> 
> The results point toward dropping ES2 support, but the sample is still not 
> very representative.
> 
> I'm cross-posting to @users to let you know that I will close the survey 
> in 1 or 2 weeks. So please respond if you're using ESIO.
> 
> Best
> 
> Etienne
> 
> On 13/02/2020 12:37, Etienne Chauchot wrote:
>> Hi Cham, thanks for your comments!
>> 
>> I just sent an email to the user ML with a survey link to count ES usage per 
>> version:
>> 
>> https://lists.apache.org/thread.html/rc8185afb8af86a2a032909c13f569e18bd89e75a5839894d5b5d4082%40%3Cuser.beam.apache.org%3E
>>  
>> 
>> Best
>> 
>> Etienne
>> 
>> On 10/02/2020 19:46, Chamikara Jayalath wrote:
>>> 
>>> 
>>> On Thu, Feb 6, 2020 at 8:13 AM Etienne Chauchot wrote:
>>> Hi,
>>> 
>>> please see my comments inline
>>> 
>>> On 06/02/2020 16:24, Alexey Romanenko wrote:
 Please, see my comments inline.
 
> On 6 Feb 2020, at 10:50, Etienne Chauchot wrote:
 1. regarding version support: ES v2 has not been maintained by Elastic 
 since 2018/02, so we plan to remove it from the IO. We have already 
 retired versions in the past (Spark 1.6, for instance).
 
>> 
>> 
>> My only concern here is that there might be users of the existing 
>> module who cannot easily upgrade their Beam version if we 
>> remove it. But given that V2 is 5 versions behind the latest release, 
>> this might be OK.
>> 
>> It seems we have a consensus on this.
>> I think there should be another general discussion on the long-term 
>> support of our preferred tool IO modules.
> => yes, consensus, let's drop ESV2
> 
 We had (and still have) a similar problem with KafkaIO supporting 
 different versions of Kafka, especially the very old version 0.9. We raised 
 this question on user@ and it appears that some users, for various 
 reasons, still use old Kafka versions. So, before dropping support for any 
 ES version, I'd suggest asking on user@ to see whether anyone will be 
 affected by this.
>>> Yes, we can do a survey among users, but the question is: should we support 
>>> an ES version that is no longer supported by Elastic itself?
>>> 
>>> +1 for asking in the user list. I guess this is more about whether users 
>>> need this specific version that we hope to drop support for. Whether we 
>>> need to support unsupported versions is a more generic question that should 
>>> prob. be addressed in the dev list. (and I personally don't think we should 
>>> unless there's a large enough user base for a given version).
>>> 
> 
 2. regarding the user: the aim is to unlock some new features (listed 
 by Ludovic) and give users more flexibility in their requests. This 
 requires using the high-level Java ES client instead of the low-level 
 REST client (which was used because it is the only one compatible 
 with all ES versions). We plan to replace the API (JSON documents in 
 and out) with more complete standard ES objects that contain the request 
 logic (insert/update, doc routing, etc.) and the data. There are 
 already IOs like SpannerIO that use similar objects in their input 
 PCollection rather than pure POJOs. 
 
>> 
>> 
>> Won't this be a breaking change for all users? IMO using POJOs in 
>> PCollections is safer, since otherwise we have to worry about changes to the 
>> underlying client library API. The exception would be when the underlying client 
>> library offers a backwards-compatibility guarantee that we can rely on 
>> for the foreseeable future (for example, BQ TableRow).
>> 
>> Agreed, but actually there will be POJOs to abstract away 
>> Elasticsearch version support. The following

Re: GroupIntoBatches not Working properly for Direct Runner Java

2020-03-06 Thread Kenneth Knowles
Can you reproduce it if you replace your Pubsub source with a TestStream
and verify with PAssert [1]? This would enable you to easily build a unit
test. You could even open a pull request adding that to the test suite for
GroupIntoBatches [2]. That would be an excellent contribution to Beam.

Kenn

[1] https://beam.apache.org/blog/2016/10/20/test-stream.html
[2]
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupIntoBatchesTest.java
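
Before writing the TestStream/PAssert test, it may help to pin down what GroupIntoBatches ought to emit. The plain-Java model below is not Beam code, and the class and method names are hypothetical. It assumes all elements land in a single 1-second window, emits a key's batch as soon as it reaches the batch size, and flushes any partial batch when the window expires. With batch size 5 it reproduces the expected output from the 28 Feb message. If the 2 Mar run used batch size 2 (an assumption; the size is not stated), the model's full batches are exactly e-3,5 and a-1,7, and the elements missing from the observed output (c-2, d-4 and b-6) are exactly the partial batches that should have been flushed at window expiry, which would fit the timer-expiry suspicion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (not Beam code) of per-key batching inside one window. */
public class BatchModel {

  /**
   * Groups "key-value" strings per key into batches of at most batchSize.
   * A batch is emitted as soon as it is full; leftovers are flushed when the window ends.
   */
  public static List<String> batch(List<String> elements, int batchSize) {
    Map<String, List<String>> buffers = new LinkedHashMap<>(); // keeps first-seen key order
    List<String> output = new ArrayList<>();
    for (String element : elements) {
      String[] parts = element.split("-", 2); // assumes every element looks like "key-value"
      List<String> buffer = buffers.computeIfAbsent(parts[0], k -> new ArrayList<>());
      buffer.add(parts[1]);
      if (buffer.size() == batchSize) {
        output.add(parts[0] + "-" + String.join(",", buffer)); // full batch: emit immediately
        buffer.clear();
      }
    }
    // Window expiry: every non-empty partial batch must still be emitted.
    for (Map.Entry<String, List<String>> entry : buffers.entrySet()) {
      if (!entry.getValue().isEmpty()) {
        output.add(entry.getKey() + "-" + String.join(",", entry.getValue()));
      }
    }
    return output;
  }
}
```

In a real pipeline the order of batches across keys is not guaranteed, so the PAssert check would compare contents with containsInAnyOrder rather than an ordered list.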

On Mon, Mar 2, 2020 at 9:25 AM Vasu Gupta  wrote:

> Input : a-1, Timestamp : 1582994620366
> Input : c-2, Timestamp : 1582994620367
> Input : e-3, Timestamp : 1582994620367
> Input : d-4, Timestamp : 1582994620367
> Input : e-5, Timestamp : 1582994620367
> Input : b-6, Timestamp : 1582994620368
> Input : a-7, Timestamp : 1582994620368
>
> Output : Timestamp : 1582994620367, Key : e-3,5
> Output : Timestamp : 1582994620368, Key : a-1,7
>
> As you can see, c-2 and d-4 are missing; I never received these packets.
>
> On 2020/02/28 18:15:03, Kenneth Knowles  wrote:
> > What are the timestamps on the elements?
> >
> > On Fri, Feb 28, 2020 at 8:36 AM Vasu Gupta 
> wrote:
> >
> > > Edit: the issue is on the Direct Runner (not "Direction Runner"; I mistyped).
> > > Issue details:
> > > Input data: 7 key-value packets: a-1, a-4, b-3, c-5, d-1, e-4, e-5
> > > Batch size: 5
> > > Expected output: a-1,4, b-3, c-5, d-1, e-4,5
> > > Actual output: packets of irregular size, e.g. a-1, b-5, e-4,5 OR a-1,4, c-5, etc.
> > > But I always get the correct number of packets with BATCH_SIZE = 1.
> > > On 2020/02/27 20:40:16, Kenneth Knowles  wrote:
> > > > Can you share some more details? What is the expected output and what
> > > > output are you seeing?
> > > >
> > > > On Thu, Feb 27, 2020 at 9:39 AM Vasu Gupta 
> > > wrote:
> > > >
> > > > > Hey folks, I am using the Apache Beam framework in Java with the Direction
> > > > > Runner for local testing purposes. When using GroupIntoBatches with batch
> > > > > size 1, it works perfectly fine, i.e. the output of the transform is
> > > > > consistent and as expected. But with batch size > 1, the output
> > > > > PCollection has less data than it should.
> > > > >
> > > > > Pipeline flow:
> > > > > 1. A transform for reading from Pubsub
> > > > > 2. A transform for making a KV out of the data
> > > > > 3. A fixed window transform of 1 second
> > > > > 4. Applying the GroupIntoBatches transform
> > > > > 5. And last, logging the resulting Iterables.
> > > > >
> > > > > The weird thing is that batch_size > 1 works great when running on the
> > > > > DataflowRunner but not on the DirectRunner. I think the issue might be
> > > > > with timer expiry, since GroupIntoBatches uses BagState internally.
> > > > >
> > > > > Any help will be much appreciated.
> > > > >
> > > >
> > >
> >
>


Re: No space left on device - beam-jenkins 1 and 7

2020-03-06 Thread Alan Myrvold
I did a one-time cleanup of tmp files owned by jenkins that were older than 3 days.
I agree that we need a longer-term solution.
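
A minimal sketch of how the one-time cleanup could become a scheduled task, assuming it runs on each worker as the jenkins user (for example from a periodic Jenkins job). TmpCleanup and its methods are hypothetical names; the sketch only looks at regular files directly under the given directory and leaves directories and symlinks alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical cleanup helper: files owned by `owner` not modified within maxAge. */
public class TmpCleanup {

  /** Lists regular files directly under dir that belong to owner and are older than maxAge. */
  public static List<Path> staleFiles(Path dir, String owner, Duration maxAge, Instant now)
      throws IOException {
    FileTime cutoff = FileTime.from(now.minus(maxAge));
    List<Path> stale = new ArrayList<>();
    try (Stream<Path> entries = Files.list(dir)) {
      for (Path p : entries.collect(Collectors.toList())) {
        if (!Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS)) {
          continue; // skip directories and symlinks in this simple sketch
        }
        boolean ownedBy = Files.getOwner(p, LinkOption.NOFOLLOW_LINKS).getName().equals(owner);
        boolean old = Files.getLastModifiedTime(p, LinkOption.NOFOLLOW_LINKS).compareTo(cutoff) < 0;
        if (ownedBy && old) {
          stale.add(p);
        }
      }
    }
    return stale;
  }

  /** Deletes the given files, returning how many were actually removed. */
  public static int delete(List<Path> files) throws IOException {
    int removed = 0;
    for (Path p : files) {
      if (Files.deleteIfExists(p)) {
        removed++;
      }
    }
    return removed;
  }
}
```

Calling staleFiles first and logging the result before calling delete gives a dry-run mode, which seems prudent on shared executors.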

Recent tests are passing on all executors except jenkins-12, which has not
scheduled any builds for the past 13 days. Not scheduling:
https://builds.apache.org/computer/apache-beam-jenkins-12/builds

Recent passing builds:
https://builds.apache.org/computer/apache-beam-jenkins-1/builds
https://builds.apache.org/computer/apache-beam-jenkins-2/builds
https://builds.apache.org/computer/apache-beam-jenkins-3/builds
https://builds.apache.org/computer/apache-beam-jenkins-4/builds
https://builds.apache.org/computer/apache-beam-jenkins-5/builds
https://builds.apache.org/computer/apache-beam-jenkins-6/builds
https://builds.apache.org/computer/apache-beam-jenkins-7/builds
https://builds.apache.org/computer/apache-beam-jenkins-8/builds
https://builds.apache.org/computer/apache-beam-jenkins-9/builds
https://builds.apache.org/computer/apache-beam-jenkins-10/builds
https://builds.apache.org/computer/apache-beam-jenkins-11/builds
https://builds.apache.org/computer/apache-beam-jenkins-13/builds
https://builds.apache.org/computer/apache-beam-jenkins-14/builds
https://builds.apache.org/computer/apache-beam-jenkins-15/builds
https://builds.apache.org/computer/apache-beam-jenkins-16/builds


On Fri, Mar 6, 2020 at 11:54 AM Ahmet Altay  wrote:

> +Alan Myrvold is doing a one-time cleanup. I agree
> that we need a solution to automate this task or to address the root
> cause of the buildup.
>
> On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia 
> wrote:
>
>> Hi there,
>> it seems we have a problem with Jenkins workers again. Nodes 1 and 7 both
>> fail jobs with "No space left on device".
>> Who is the best person to contact in these cases (someone with access
>> permissions to the workers)?
>>
>> I also noticed that such errors have become more and more frequent
>> recently, and I'd like to discuss how this can be remedied. Can a cleanup
>> task be automated on Jenkins somehow?
>>
>> Regards
>> Michal
>>
>> --
>>
>> Michał Walenia
>> Polidea  | Software Engineer
>>
>> M: +48 791 432 002 <+48791432002>
>> E: michal.wale...@polidea.com
>>
>> Unique Tech
>> Check out our projects! 
>>
>


Re: No space left on device - beam-jenkins 1 and 7

2020-03-06 Thread Ahmet Altay
+Alan Myrvold is doing a one-time cleanup. I agree
that we need a solution to automate this task or to address the root
cause of the buildup.

On Thu, Mar 5, 2020 at 2:47 AM Michał Walenia 
wrote:

> Hi there,
> it seems we have a problem with Jenkins workers again. Nodes 1 and 7 both
> fail jobs with "No space left on device".
> Who is the best person to contact in these cases (someone with access
> permissions to the workers)?
>
> I also noticed that such errors have become more and more frequent
> recently, and I'd like to discuss how this can be remedied. Can a cleanup
> task be automated on Jenkins somehow?
>
> Regards
> Michal
>
>


Re: Snowflake connector

2020-03-06 Thread Chamikara Jayalath
Absolutely. Please create a JIRA and coordinate with Elias and any others
that would like to contribute to this.

Thanks,
Cham

On Fri, Mar 6, 2020 at 10:46 AM Shashanka Balakuntala <
shbalakunt...@gmail.com> wrote:

> Hi Chamikara and Elias,
> This seems like an interesting feature. Can I start working on this?
> *Regards*
>   Shashanka Balakuntala Srinivasa
>
>
>
> On Sat, Mar 7, 2020 at 12:00 AM Chamikara Jayalath 
> wrote:
>
>> I don't think we have this but contributions are welcome.
>>
>> Thanks,
>> Cham
>>
>> On Tue, Mar 3, 2020 at 4:46 AM Elias Djurfeldt <
>> elias.djurfe...@mirado.com> wrote:
>>
>>> Hi all,
>>>
>>> I've stumbled upon a use case where I might need a SnowflakeIO in
>>> Python. Has anyone worked on this before or are there any discussions
>>> surrounding it?
>>>
>>> There is a Snowflake Python library available [1], so it looks feasible to
>>> implement in Beam.
>>>
>>> [1] https://docs.snowflake.net/manuals/user-guide/python-connector.html
>>>
>>> Cheers,
>>> Elias
>>>
>>


Re: Jenkins jobs not running for my PR 10438

2020-03-06 Thread Tomo Suzuki
Thanks, Luke!

On Fri, Mar 6, 2020 at 13:34 Tomo Suzuki  wrote:

> Hi Beam committers,
> (Thank you, Michał!)
>
> Would you trigger the precommit checks for
> https://github.com/apache/beam/pull/11063
> with the following 6 commands?
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
> Regards,
> Tomo
>
-- 
Regards,
Tomo


Re: Jenkins jobs not running for my PR 10438

2020-03-06 Thread Luke Cwik
Done

On Fri, Mar 6, 2020 at 10:35 AM Tomo Suzuki  wrote:

> Hi Beam committers,
> (Thank you, Michał!)
>
> Would you trigger the precommit checks for
> https://github.com/apache/beam/pull/11063
> with the following 6 commands?
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
> Regards,
> Tomo
>


Re: [DISCUSS] @Experimental annotations - processes and alternatives

2020-03-06 Thread Kenneth Knowles
OK, I tried to make a tiny bit of progress on this. With
`grep --ignore-case --line-number --recursive '@experimental' .` there are
578 occurrences (including the website and comments). Piping that through
`cut -d ':' -f 1 | sort | uniq | wc -l` gives 377 distinct code files.
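
For anyone who wants to reproduce these counts without grep, here is a rough Java equivalent of that command pipeline. ExperimentalCount is a hypothetical name. It counts matching lines case-insensitively, like `grep --ignore-case --line-number`, and the number of distinct files with at least one match, like the `cut | sort | uniq | wc -l` step. It does not special-case binary files the way grep does, so totals may differ slightly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Rough equivalent of: grep -i -n -r NEEDLE ROOT | cut -d ':' -f 1 | sort | uniq | wc -l */
public class ExperimentalCount {

  /** Number of matching lines and number of distinct files with at least one match. */
  public static final class Result {
    public final long matchingLines;
    public final long distinctFiles;

    Result(long matchingLines, long distinctFiles) {
      this.matchingLines = matchingLines;
      this.distinctFiles = distinctFiles;
    }
  }

  public static Result count(Path root, String needle) throws IOException {
    String lowerNeedle = needle.toLowerCase(Locale.ROOT);
    List<Path> files;
    try (Stream<Path> walk = Files.walk(root)) {
      files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    long lines = 0;
    long filesWithMatch = 0;
    for (Path file : files) {
      long hits;
      // ISO-8859-1 maps every byte, so files that are not valid UTF-8 do not throw.
      try (Stream<String> content = Files.lines(file, StandardCharsets.ISO_8859_1)) {
        hits = content.filter(l -> l.toLowerCase(Locale.ROOT).contains(lowerNeedle)).count();
      }
      if (hits > 0) {
        lines += hits;
        filesWithMatch++;
      }
    }
    return new Result(lines, filesWithMatch);
  }
}
```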

So that's a big project, but it scales easily across contributors. I suggest we
crowdsource it a bit.

I created
https://docs.google.com/spreadsheets/d/1T98I7tFoUgwW2tegS5xbNRjaVDvZiBBLn7jg0Ef_IwU/edit?usp=sharing
where you can suggest or comment to add your name next to a file, volunteering
to own going through that file.

I have not checked git history to try to find owners.

Kenn

On Mon, Dec 2, 2019 at 10:26 AM Alexey Romanenko 
wrote:

> Thank you Kenn for starting this discussion.
>
> As I see it, the main goal for the "@Experimental" annotation for now is to
> revive it and make it useful in the sense its name implies (which is obviously
> not the case at the moment). I'd suggest a somewhat simplified scenario for this:
>
> 1. We review all current uses of the "@Experimental" annotation now. For
> code (IOs/libs/etc.) that we know for certain has been used in production for
> a long time with the current stable API, we simply remove the annotation,
> since it's no longer needed.
>
> 2. For the code that is left after step 1, we keep "@Experimental", wait
> for N releases (N=3?), and then remove it if no breaking changes have
> happened. We may want to add a new argument to "@Experimental" to
> track the release number in which it was added.
>
> 3. We would need a regular "Experimental annotation report" sent to dev@
> (like the one we have for dependencies), which would allow us to track
> new and outdated annotations.
>
> 4. And of course we update the contributor documentation accordingly.
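
A sketch of points 2 and 3, assuming a hypothetical annotation (ExperimentalSince) that records the 2.x minor release in which an API was marked experimental; this is not Beam's actual @Experimental annotation. The report lists annotated classes that have been experimental for at least N releases, i.e. graduation candidates, provided no breaking change happened in that period (which this sketch does not check).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class ExperimentalReport {

  /** Hypothetical annotation: minor version of the 2.x line, e.g. 19 for 2.19.0. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.METHOD})
  public @interface ExperimentalSince {
    int value();
  }

  /** Names of annotated classes that have been experimental for at least `releases` releases. */
  public static List<String> graduationCandidates(int currentMinor, int releases, Class<?>... classes) {
    List<String> candidates = new ArrayList<>();
    for (Class<?> c : classes) {
      ExperimentalSince since = c.getAnnotation(ExperimentalSince.class);
      if (since != null && currentMinor - since.value() >= releases) {
        candidates.add(c.getSimpleName());
      }
    }
    return candidates;
  }

  // Illustrative classes only.
  @ExperimentalSince(15) public static class OldIO {}
  @ExperimentalSince(19) public static class NewIO {}
  public static class StableIO {}
}
```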
>
> The idea of graduation by voting seems a bit complicated: for me it means
> that every new user API would have to go through this process, and I'm afraid
> that in the end we could be overwhelmed by the number of such
> polls. I think that several releases of maturation plus a final decision by
> the person(s) responsible for the component should be enough.
>
> At the same time, I like Andrew's idea of checking for breaking
> changes with an external tool. That could guarantee that we can remove the
> experimental state without any fear of breaking the API.
>
> For breaking changes to a stable API that can't be avoided, we can still
> use @Deprecated and wait 3 releases before removal (as we
> already did before). So, having up-to-date @Experimental and @Deprecated
> annotations won't be confusing for users.
>
>
>
>
>
> On 28 Nov 2019, at 04:48, Kenneth Knowles  wrote:
>
>
>
> On Wed, Nov 27, 2019 at 1:04 PM Elliotte Rusty Harold 
> wrote:
>
>> On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles  wrote:
>> >
>>
>> > *Opt-in*: This is a powerful idea that I think changes everything.
>> >    - for an experimental new IO, a separate artifact; this way we can also see downloads
>> >    - for experimental code fragments, add checkState that the relevant experiment is turned on via flags
>>
>> To be clear, the experimental artifact would have the same group ID and
>> artifact ID but a different version than the non-experimental
>> artifacts? E.g.
>> org.apache.beam:beam-runners-gcp-gcemd:2.4.0-experimental
>>
>> That could work. Changing the artifact ID or the package name would
>> risk split package issues and diamond dependency problems. We'd still
>> need to be careful about mixing experimental and non-experimental
>> artifacts.
>>
>
> That's clever! I think using the classifier might be better than a
> modified version number, e.g.
> org.apache.beam:beam-io-mydb:2.4.0:experimental
>
> My prior idea was much less clever: for any version 2.X there would either
> be beam-io-mydb-experimental or beam-io-mydb (after graduation) so no
> problem with a split package. There would be no "same artifact id" concern.
>
> Your idea would allow us to ship two variants of the library, if we
> developed the tooling for it. I think doing the stripping of experimental
> bits and ensuring they both compile might be tricky unless we are stripping
> rather disjoint pieces of the library.
>
> Kenn
>
>
>
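
A minimal sketch of the opt-in idea for experimental code fragments: fail fast with a checkState-style guard unless the experiment was enabled by a flag. ExperimentGate and its methods are hypothetical and parse arguments by hand to stay dependency-free. In the Beam Java SDK the existing `--experiments` flag is exposed through ExperimentalOptions, so a real transform would more likely call `ExperimentalOptions.hasExperiment(options, name)` instead.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical gate for experimental code paths behind an --experiments flag. */
public class ExperimentGate {
  private static final String PREFIX = "--experiments=";

  /** Collects experiment names from every --experiments=a,b argument. */
  public static Set<String> parseExperiments(String... args) {
    Set<String> experiments = new LinkedHashSet<>();
    for (String arg : args) {
      if (arg.startsWith(PREFIX)) {
        for (String name : arg.substring(PREFIX.length()).split(",")) {
          if (!name.trim().isEmpty()) {
            experiments.add(name.trim());
          }
        }
      }
    }
    return Collections.unmodifiableSet(experiments);
  }

  /** checkState-style guard: throws unless the named experiment was enabled. */
  public static void checkExperimentEnabled(Set<String> experiments, String name) {
    if (!experiments.contains(name)) {
      throw new IllegalStateException(
          "This code path is experimental; enable it with " + PREFIX + name);
    }
  }
}
```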


Re: Snowflake connector

2020-03-06 Thread Shashanka Balakuntala
Hi Chamikara and Elias,
This seems like an interesting feature. Can I start working on this?
*Regards*
  Shashanka Balakuntala Srinivasa



On Sat, Mar 7, 2020 at 12:00 AM Chamikara Jayalath 
wrote:

> I don't think we have this but contributions are welcome.
>
> Thanks,
> Cham
>
> On Tue, Mar 3, 2020 at 4:46 AM Elias Djurfeldt 
> wrote:
>
>> Hi all,
>>
>> I've stumbled upon a use case where I might need a SnowflakeIO in Python.
>> Has anyone worked on this before or are there any discussions surrounding
>> it?
>>
>> There is a Snowflake Python library available [1], so it looks feasible to
>> implement in Beam.
>>
>> [1] https://docs.snowflake.net/manuals/user-guide/python-connector.html
>>
>> Cheers,
>> Elias
>>
>


Re: Jenkins jobs not running for my PR 10438

2020-03-06 Thread Tomo Suzuki
Hi Beam committers,
(Thank you, Michał!)

Would you trigger the precommit checks for
https://github.com/apache/beam/pull/11063
with the following 6 commands?
Run Java PostCommit
Run Java HadoopFormatIO Performance Test
Run BigQueryIO Streaming Performance Test Java
Run Dataflow ValidatesRunner
Run Spark ValidatesRunner
Run SQL Postcommit

Regards,
Tomo


Re: Snowflake connector

2020-03-06 Thread Chamikara Jayalath
I don't think we have this but contributions are welcome.

Thanks,
Cham

On Tue, Mar 3, 2020 at 4:46 AM Elias Djurfeldt 
wrote:

> Hi all,
>
> I've stumbled upon a use case where I might need a SnowflakeIO in Python.
> Has anyone worked on this before or are there any discussions surrounding
> it?
>
> There is a Snowflake Python library available [1], so it looks feasible to
> implement in Beam.
>
> [1] https://docs.snowflake.net/manuals/user-guide/python-connector.html
>
> Cheers,
> Elias
>


Re: A new reworked Elasticsearch 7+ IO module

2020-03-06 Thread Kenneth Knowles
Since the user provides backendVersion, here are some possible levels of
checks to add in expand() based on it (these are extra niceties beyond
the agreed number of releases before removal)

 - WARN for backendVersion < n
 - reject for backendVersion < n, with an opt-in pipeline option to keep it
working for one more version (gets their attention and indicates urgency)
 - reject completely
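
A sketch of how the three levels could look inside expand(), written as a standalone helper. BackendVersionCheck, Policy, the minimum-version threshold and the opt-in boolean (standing in for a pipeline option) are all hypothetical; ElasticsearchIO's real configuration may differ.

```java
import java.util.logging.Logger;

/** Hypothetical helper sketching the three escalation levels for an old backendVersion. */
public class BackendVersionCheck {
  private static final Logger LOG = Logger.getLogger(BackendVersionCheck.class.getName());

  /** WARN, reject unless opted in, or reject outright (illustrative names). */
  public enum Policy { WARN, REJECT_WITH_OPT_IN, REJECT }

  /**
   * Returns false for a supported version, true if a deprecation warning was logged,
   * and throws IllegalArgumentException if the version is rejected.
   */
  public static boolean check(int backendVersion, int minSupported, Policy policy, boolean optIn) {
    if (backendVersion >= minSupported) {
      return false;
    }
    String msg = "Elasticsearch backendVersion " + backendVersion
        + " is deprecated; versions below " + minSupported + " are being removed.";
    switch (policy) {
      case WARN:
        LOG.warning(msg);
        return true;
      case REJECT_WITH_OPT_IN:
        if (optIn) {
          // The opt-in keeps the old version working for one more release.
          LOG.warning(msg + " Allowed for now because of the opt-in option.");
          return true;
        }
        throw new IllegalArgumentException(
            msg + " Enable the opt-in option to keep it working for one more release.");
      case REJECT:
      default:
        throw new IllegalArgumentException(msg);
    }
  }
}
```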

Kenn

On Fri, Mar 6, 2020 at 2:26 AM Etienne Chauchot 
wrote:

> Hi all,
>
> it's been 3 weeks since the survey on which ES versions users run.
>
> The survey has received very few responses so far: only 9 (multiple
> versions could be selected, of course). The results are:
>
> ES2: 0 clients, ES5: 1, ES6: 5, ES7: 8
>
> The results point toward dropping ES2 support, but the sample is still not
> very representative.
>
> I'm cross-posting to @users to let you know that I will close the survey
> in 1 or 2 weeks. So please respond if you're using ESIO.
>
> Best
>
> Etienne
> On 13/02/2020 12:37, Etienne Chauchot wrote:
>
> Hi Cham, thanks for your comments!
>
> I just sent an email to the user ML with a survey link to count ES usage per
> version:
>
>
> https://lists.apache.org/thread.html/rc8185afb8af86a2a032909c13f569e18bd89e75a5839894d5b5d4082%40%3Cuser.beam.apache.org%3E
>
> Best
>
> Etienne
> On 10/02/2020 19:46, Chamikara Jayalath wrote:
>
>
>
> On Thu, Feb 6, 2020 at 8:13 AM Etienne Chauchot 
> wrote:
>
>> Hi,
>>
>> please see my comments inline
>> On 06/02/2020 16:24, Alexey Romanenko wrote:
>>
>> Please, see my comments inline.
>>
>> On 6 Feb 2020, at 10:50, Etienne Chauchot  wrote:
>>
>> 1. regarding version support: ES v2 has not been maintained by Elastic
 since 2018/02, so we plan to remove it from the IO. We have already
 retired versions in the past (Spark 1.6, for instance).


>>> My only concern here is that there might be users of the existing
>>> module who cannot easily upgrade their Beam version if we
>>> remove it. But given that V2 is 5 versions behind the latest release, this
>>> might be OK.
>>>
>>
>> It seems we have a consensus on this.
>> I think there should be another general discussion on the long-term
>> support of our preferred tool IO modules.
>>
>> => yes, consensus, let's drop ESV2
>>
>> We had (and still have) a similar problem with KafkaIO supporting
>> different versions of Kafka, especially the very old version 0.9. We raised
>> this question on user@ and it appears that some users, for various
>> reasons, still use old Kafka versions. So, before dropping support for any
>> ES version, I'd suggest asking on user@ to see whether anyone will be
>> affected by this.
>>
>> Yes, we can do a survey among users, but the question is: should we support
>> an ES version that is no longer supported by Elastic itself?
>>
>
> +1 for asking in the user list. I guess this is more about whether users
> need this specific version that we hope to drop support for. Whether we
> need to support unsupported versions is a more generic question that should
> prob. be addressed in the dev list. (and I personally don't think we should
> unless there's a large enough user base for a given version).
>
> 2. regarding the user: the aim is to unlock some new features (listed by
 Ludovic) and give users more flexibility in their requests. This
 requires using the high-level Java ES client instead of the low-level REST
 client (which was used because it is the only one compatible with all ES
 versions). We plan to replace the API (JSON documents in and out) with more
 complete standard ES objects that contain the request logic (insert/update,
 doc routing, etc.) and the data. There are already IOs like SpannerIO that
 use similar objects in their input PCollection rather than pure POJOs.


>>> Won't this be a breaking change for all users? IMO using POJOs in
>>> PCollections is safer, since otherwise we have to worry about changes to the
>>> underlying client library API. The exception would be when the underlying client
>>> library offers a backwards-compatibility guarantee that we can rely on for
>>> the foreseeable future (for example, BQ TableRow).
>>>
>>
>> Agreed, but actually there will be POJOs to abstract away
>> Elasticsearch version support. The following third point explains this.
>>
>> => indeed it will be a breaking change, hence this email to reach a
>> consensus on that. Also, I think our wrappers of ES request objects will
>> offer the same backward compatibility as the underlying objects.
>>
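
To illustrate how a POJO could hide version differences, here is a sketch of a version-agnostic document object rendered into the two newline-delimited lines of an Elasticsearch _bulk request. It swaps in raw bulk lines for the high-level client objects the thread proposes, and the class, field and parameter names are hypothetical. It relies on one known difference: mapping types are deprecated in ES 7, so `_type` can be omitted there, while older clusters expect one. Routing and other metadata are left out, and values are not JSON-escaped.

```java
/** Hypothetical version-agnostic write POJO rendered into Elasticsearch _bulk NDJSON lines. */
public class EsBulkSketch {
  public enum Op { INDEX, UPDATE }

  /** Minimal POJO: what to write and how, independent of the ES client or version. */
  public static final class Document {
    public final String index;
    public final String id;
    public final String sourceJson; // assumed to already be valid JSON
    public final Op op;

    public Document(String index, String id, String sourceJson, Op op) {
      this.index = index;
      this.id = id;
      this.sourceJson = sourceJson;
      this.op = op;
    }
  }

  /** Renders the action/metadata line and the body line for one bulk entry. */
  public static String toBulkLines(Document d, int backendVersion, String typeForPre7) {
    StringBuilder meta = new StringBuilder();
    meta.append("{\"").append(d.op == Op.INDEX ? "index" : "update").append("\":{");
    meta.append("\"_index\":\"").append(d.index).append('"');
    if (backendVersion < 7) {
      meta.append(",\"_type\":\"").append(typeForPre7).append('"'); // pre-7 clusters expect a type
    }
    meta.append(",\"_id\":\"").append(d.id).append("\"}}");
    // "index" sends the full document; "update" wraps it as a partial document.
    String body = d.op == Op.INDEX ? d.sourceJson : "{\"doc\":" + d.sourceJson + "}";
    return meta + "\n" + body + "\n";
  }
}
```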
>> I just want to remind everyone that, according to what we agreed some time ago on
>> dev@ (at least for IOs), all breaking user API changes have to be added
>> along with a deprecation of the old API, which can be removed after 3 consecutive
>> Beam releases. That way, users will have time to move to the new API
>> smoothly.
>>
>> Here we are mostly discussing the target architecture of the new module, but
>> the deprecation process is important to recall, I agree. When I say DTOs
>> back

Re: [ANNOUNCE] New Committer: Kamil Wasilewski

2020-03-06 Thread Chamikara Jayalath
Congrats Kamil!

On Fri, Mar 6, 2020 at 6:44 AM Katarzyna Kucharczyk 
wrote:

> Great news! Congratulations Kamil! 🎉
>
> On Tue, Mar 3, 2020 at 9:05 PM Łukasz Gajowy  wrote:
>
>> Congratulations mate! Well deserved. :)
>>
>> On Mon, 2 Mar 2020 at 17:01, Elias Djurfeldt
>> wrote:
>>
>>> Congrats Kamil!!
>>>
>>> On Mon, 2 Mar 2020 at 16:16, Karolina Rosół 
>>> wrote:
>>>
 Congratulations Kamil! Well deserved :-)

 Karolina Rosół
 Polidea  | Project Manager

 M: +48 606 630 236 <+48606630236>
 E: karolina.ro...@polidea.com

 Check out our projects! 


 On Mon, Mar 2, 2020 at 10:43 AM Kamil Wasilewski <
 kamil.wasilew...@polidea.com> wrote:

> Thank you all! I am very happy to be a part of the community :)
>
>
> On Mon, Mar 2, 2020 at 9:45 AM Ryan Skraba  wrote:
>
>> Congratulations Kamil!
>>
>> On Mon, Mar 2, 2020 at 8:06 AM Michał Walenia <
>> michal.wale...@polidea.com> wrote:
>>
>>> Congratulations!
>>>
>>> On Sun, Mar 1, 2020 at 2:55 AM Reza Rokni  wrote:
>>>
 Congratulations Kamil

 On Sat, 29 Feb 2020, 06:18 Udi Meiri,  wrote:

> Welcome Kamil!
>
> On Fri, Feb 28, 2020 at 12:53 PM Mark Liu 
> wrote:
>
>> Congrats, Kamil!
>>
>> On Fri, Feb 28, 2020 at 12:23 PM Ismaël Mejía 
>> wrote:
>>
>>> Congratulations Kamil!
>>>
>>> On Fri, Feb 28, 2020 at 7:09 PM Yichi Zhang 
>>> wrote:
>>>
 Congrats, Kamil!

 On Fri, Feb 28, 2020 at 9:53 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> Congratulations, Kamil!
>
> On Fri, Feb 28, 2020 at 9:34 AM Pablo Estrada <
> pabl...@google.com> wrote:
>
>> Hi everyone,
>>
>> Please join me and the rest of the Beam PMC in welcoming a
>> new committer: Kamil Wasilewski
>>
>> Kamil has contributed to Beam in many ways, including the
>> performance testing infrastructure and a custom BQ source, along with
>> other contributions.
>>
>> In consideration of his contributions, the Beam PMC trusts
>> him with the responsibilities of a Beam committer[1].
>>
>> Thanks for your contributions Kamil!
>>
>> Pablo, on behalf of the Apache Beam PMC.
>>
>> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>>
>>>
>>>
>>


Re: Contributing Twister2 runner to Apache Beam

2020-03-06 Thread Pulasthi Supun Wickramasinghe
I understand that the discussion is at a broader level than the Twister2
runner. From my experience developing the runner, the main advantage of
being inside the Beam project was easy access to the wide range of
tests and other core/utility code, as Kyle pointed out. Unmerging runners
that are not properly maintained and updated would be the most logical path
to follow, since the internals of the runners are only well understood by
developers of that particular project. It would be unreasonable to expect
the Beam community to maintain them. And since the runners do not alter the
core APIs, I assume they would be easy to unmerge if the need arises.

Speaking specifically about the Twister2 runner, we hope to continue developing
it in the future to add streaming capability and to develop a
portable runner as well. The team behind Twister2 is working toward
getting the project into the Apache Incubator in the near future (hopefully
submitting the proposal within the next couple of months).

Best Regards,
Pulasthi



On Thu, Mar 5, 2020 at 6:56 PM Robert Bradshaw  wrote:

> I think we will get to a point where it makes sense for runners to
> live in their own repositories, with their own release cadence, but
> we're not at that point yet. One prerequisite is a stable API--we're
> closing in on that with the portability protos, but many (java)
> runners actually share the common runner core libraries and that is
> even less set in stone.
>
> On the other hand, taking responsibility for maintaining all runners
> is not a tenable or scalable position for the Beam project. If a
> runner is merged, it should be understood that it can be "un-merged"
> if it causes a maintenance burden. A completely separate
> project/repository makes this less messy.
>
> On Thu, Mar 5, 2020 at 10:01 AM Kenneth Knowles  wrote:
> >
> > I agree with both of you, mostly :-)
> >
> > The monorepo approach doesn't work/scale well for shipped libraries
> (name a Google library that silently just works and never causes any
> dependency problems) and the pain we feel has been constant and increasing,
> but I don't think we are at the breaking point.
> >
> > But Google's big monorepo [1] demonstrates similar benefits to what Kyle
> describes. In the early stages the benefit of not having to think too hard
> about build/test infra and share it everywhere is a big help, and it scales
> well. Eventually, shipping test utility libraries and compliance suites can
> be equivalent. And to your point - it is very helpful for users to know
> that they can use CassandraIO with the other Beam artifacts. This is why
> Google requires the whole big repo to depend on a single version of any
> externally-controlled artifact. But, yes, as a consequence it is
> preposterously difficult to stay up to date, since literally anything can
> block progress. You need a unified escalation chain for that policy to make
> sense. It is the definition of a healthy Apache project to *not* have that
> (PMC is different).
> >
> > Independent dependencies, independent git histories, and independent
> release cadence/process are all separate discussions.
> >
> > It is a broader question than this particular contribution, so let's
> merge this runner before changing our whole way of doing things :-)
> >
> > Kenn
> >
> > [1]
> https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
> (really quite a balanced analysis)
> >
> > On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver  wrote:
> >>
> >> > Should runners, current and future, be in the same repository as Beam
> >> > core?
> >>
> >> In the distant past, runners lived in their own repositories, and then
> were donated to Beam. But Beam's current uber-repo setup allows a lot of
> convenience. For example, a ton of code (including core functionality and
> tests) is shared directly between runners, which is useful for keeping
> runners up to date and ensuring consistent behavior between them (in other
> words, maintainable and reliable).
> >>
> >> Generally, it is up to the authors of a particular Beam related
> project/subproject to decide whether to host their code in Beam or in a
> different repo, and up to the community to decide whether to take on the
> donation, as discussed in previous threads on the Twister2 runner. In this
> case, it seems there is agreement between the Twister2 runner authors and
> the community that the runner can be hosted in Beam proper.
> >>
> >> There are examples of successful independent Beam projects, such as
> Spotify's Scio, but having an independent project with its own releases
> requires a lot of dedicated resources, and the bar for entry for extending
> Beam should not be that high. All that's required of subproject authors is
> that they keep the subproject in step with Beam. If they can't maintain it
> any longer, the subproject can be allowed to bitrot without getting in
> anyone's way. On the other hand, I'm not sure o

Re: [ANNOUNCE] New Committer: Kamil Wasilewski

2020-03-06 Thread Katarzyna Kucharczyk
Great news! Congratulations Kamil! 🎉

On Tue, Mar 3, 2020 at 9:05 PM Łukasz Gajowy  wrote:

> Congratulations mate! Well deserved. :)
>
> On Mon, 2 Mar 2020 at 17:01, Elias Djurfeldt
> wrote:
>
>> Congrats Kamil!!
>>
>> On Mon, 2 Mar 2020 at 16:16, Karolina Rosół 
>> wrote:
>>
>>> Congratulations Kamil! Well deserved :-)
>>>
>>> Karolina Rosół
>>> Polidea  | Project Manager
>>>
>>> M: +48 606 630 236
>>> E: karolina.ro...@polidea.com
>>>
>>> Check out our projects!
>>>
>>>
>>> On Mon, Mar 2, 2020 at 10:43 AM Kamil Wasilewski <
>>> kamil.wasilew...@polidea.com> wrote:
>>>
 Thank you all! I am very happy to be a part of the community :)


 On Mon, Mar 2, 2020 at 9:45 AM Ryan Skraba  wrote:

> Congratulations Kamil!
>
> On Mon, Mar 2, 2020 at 8:06 AM Michał Walenia <
> michal.wale...@polidea.com> wrote:
>
>> Congratulations!
>>
>> On Sun, Mar 1, 2020 at 2:55 AM Reza Rokni  wrote:
>>
>>> Congratulations Kamil
>>>
>>> On Sat, 29 Feb 2020, 06:18 Udi Meiri,  wrote:
>>>
 Welcome Kamil!

 On Fri, Feb 28, 2020 at 12:53 PM Mark Liu 
 wrote:

> Congrats, Kamil!
>
> On Fri, Feb 28, 2020 at 12:23 PM Ismaël Mejía 
> wrote:
>
>> Congratulations Kamil!
>>
>> On Fri, Feb 28, 2020 at 7:09 PM Yichi Zhang 
>> wrote:
>>
>>> Congrats, Kamil!
>>>
>>> On Fri, Feb 28, 2020 at 9:53 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 Congratulations, Kamil!

 On Fri, Feb 28, 2020 at 9:34 AM Pablo Estrada <
 pabl...@google.com> wrote:

> Hi everyone,
>
> Please join me and the rest of the Beam PMC in welcoming a new
>  committer: Kamil Wasilewski
>
> Kamil has contributed to Beam in many ways, including the
> performance testing infrastructure, and a custom BQ source, along
> with other contributions.
>
> In consideration of his contributions, the Beam PMC trusts him
> with the responsibilities of a Beam committer[1].
>
> Thanks for your contributions Kamil!
>
> Pablo, on behalf of the Apache Beam PMC.
>
> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>
>
>>
>> --
>>
>> Michał Walenia
>> Polidea  | Software Engineer
>>
>> M: +48 791 432 002
>> E: michal.wale...@polidea.com
>>
>> Unique Tech
>> Check out our projects! 
>>
>


Re: A new reworked Elasticsearch 7+ IO module

2020-03-06 Thread Etienne Chauchot

Hi all,

It's been 3 weeks since the survey on which ES versions users run.

The survey has received very few responses so far: only 9 (respondents
could select multiple versions, of course). The results are:


ES2: 0 clients, ES5: 1, ES6: 5, ES7: 8

The results point toward dropping ES2 support, but for now the sample
is still too small to be representative.


I'm cross-posting to @users to let you know that I'll close the survey
within 1 or 2 weeks, so please respond if you're using ESIO.


Best

Etienne

On 13/02/2020 12:37, Etienne Chauchot wrote:


Hi Cham, thanks for your comments!

I just sent an email to the user ML with a survey link to count ES usage
per version:


https://lists.apache.org/thread.html/rc8185afb8af86a2a032909c13f569e18bd89e75a5839894d5b5d4082%40%3Cuser.beam.apache.org%3E

Best

Etienne

On 10/02/2020 19:46, Chamikara Jayalath wrote:



On Thu, Feb 6, 2020 at 8:13 AM Etienne Chauchot wrote:


Hi,

please see my comments inline

On 06/02/2020 16:24, Alexey Romanenko wrote:

Please, see my comments inline.


On 6 Feb 2020, at 10:50, Etienne Chauchot <echauc...@apache.org> wrote:



1. Regarding version support: ES v2 has not been
maintained by Elastic since 2018/02, so we plan to
remove it from the IO. We have already retired
versions in the past (Spark 1.6, for instance).
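To make the proposed removal concrete, here is a minimal sketch of how a connector could reject a dropped Elasticsearch major version at pipeline-construction time (for instance from a transform's expand()), so users get a clear error instead of a runtime failure. The class name BackendVersionCheck, the method validate, and the assumption that majors 5, 6 and 7 stay supported are all hypothetical; this is not ElasticsearchIO's actual code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch: reject Elasticsearch major versions the connector no longer supports. */
public final class BackendVersionCheck {

  // Assumption: with ES v2 dropped, majors 5, 6 and 7 remain supported.
  private static final SortedSet<Integer> SUPPORTED_MAJOR_VERSIONS =
      Collections.unmodifiableSortedSet(new TreeSet<>(Arrays.asList(5, 6, 7)));

  private BackendVersionCheck() {}

  /** Throws IllegalArgumentException if the backend major version is not supported. */
  public static void validate(int backendMajorVersion) {
    if (!SUPPORTED_MAJOR_VERSIONS.contains(backendMajorVersion)) {
      throw new IllegalArgumentException(
          "Elasticsearch major version "
              + backendMajorVersion
              + " is not supported; supported versions are "
              + SUPPORTED_MAJOR_VERSIONS);
    }
  }

  public static void main(String[] args) {
    validate(7); // supported: returns normally
    try {
      validate(2); // ES v2 is rejected
    } catch (IllegalArgumentException e) {
      // prints "Elasticsearch major version 2 is not supported; supported versions are [5, 6, 7]"
      System.out.println(e.getMessage());
    }
  }
}
```

A softer variant, closer to a deprecation period, would log a warning for old versions instead of throwing.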



My only concern here is that some users of the
existing module might not be able to easily
upgrade their Beam version if we remove it. But given that
v2 is 5 versions behind the latest release, this might be OK.


It seems we have a consensus on this.
I think there should be a separate, general discussion on the long-term
support policy for the IO modules of our preferred tools.


=> yes, consensus, let's drop ESV2


We had (and still have) a similar problem with KafkaIO supporting
different versions of Kafka, especially the very old version
0.9. We raised this question on user@, and it appears that some
users, for various reasons, still use old Kafka versions. So,
before dropping support for any ES version, I'd suggest asking
on user@ to see whether anyone would be affected.

Yes, we can run a survey among users, but the question is: should we
support an ES version that is no longer supported by Elastic
themselves?


+1 for asking on the user list. I guess this is more about whether
users need the specific version we hope to drop support for.
Whether we should support unsupported versions is a more general
question that should probably be addressed on the dev list (and I
personally don't think we should, unless there's a large enough user
base for a given version).



2. Regarding the user API: the aim is to unlock some new
features (listed by Ludovic) and give users more
flexibility over their requests. This requires
using the high-level Java ES client instead of the
low-level REST client (which was used because it is the
only one compatible with all ES versions). We plan
to replace the API (JSON documents in and out) with
more complete standard ES objects that contain the
request logic (insert/update, document routing, etc.)
as well as the data. Some IOs, such as SpannerIO,
already use similar objects in their input PCollection
rather than pure POJOs.
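As a rough illustration of an element type that carries request logic (operation type, routing) alongside the document, here is a minimal sketch of a version-agnostic POJO. The names WriteRequest and OpType and the set of operations are hypothetical, not the new module's API; the idea is that a per-version sub-module would translate such an object into the matching client request.

```java
import java.io.Serializable;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical version-agnostic write request; not part of any released Beam API. */
public final class WriteRequest implements Serializable {
  private static final long serialVersionUID = 1L;

  /** Operation to perform on the target document. */
  public enum OpType { INDEX, UPDATE, DELETE }

  private final OpType opType;
  private final String index;
  private final String id; // null lets Elasticsearch generate an id (INDEX only)
  private final String routing; // null means default routing
  private final String sourceJson; // document body; null for DELETE

  private WriteRequest(OpType opType, String index, String id, String routing, String sourceJson) {
    this.opType = Objects.requireNonNull(opType, "opType");
    this.index = Objects.requireNonNull(index, "index");
    this.id = id;
    this.routing = routing;
    this.sourceJson = sourceJson;
  }

  public static WriteRequest index(String index, String id, String sourceJson) {
    return new WriteRequest(OpType.INDEX, index, id, null, Objects.requireNonNull(sourceJson));
  }

  public static WriteRequest update(String index, String id, String partialJson) {
    return new WriteRequest(
        OpType.UPDATE, index, Objects.requireNonNull(id), null, Objects.requireNonNull(partialJson));
  }

  public static WriteRequest delete(String index, String id) {
    return new WriteRequest(OpType.DELETE, index, Objects.requireNonNull(id), null, null);
  }

  /** Returns a copy of this request that uses a custom routing key. */
  public WriteRequest withRouting(String routing) {
    return new WriteRequest(opType, index, id, Objects.requireNonNull(routing), sourceJson);
  }

  public OpType opType() { return opType; }
  public String index() { return index; }
  public Optional<String> id() { return Optional.ofNullable(id); }
  public Optional<String> routing() { return Optional.ofNullable(routing); }
  public Optional<String> sourceJson() { return Optional.ofNullable(sourceJson); }
}
```

Keeping the PCollection element a plain POJO like this, rather than a client-library class, is one way to avoid exposing users to changes in the underlying client API.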



Won't this be a breaking change for all users? IMO using
POJOs in PCollections is safer, since otherwise we have to worry
about changes to the underlying client library API.
The exception would be when the underlying client library offers a
backwards-compatibility guarantee that we can rely on for
the foreseeable future (for example, BQ TableRow).


Agreed, but there will actually be POJOs to abstract away
differences between Elasticsearch versions. The third point
below explains this.


=> Indeed, it will be a breaking change, hence this email to reach
a consensus on it. Also, I think our wrappers of the ES request
objects will be as backward compatible as the underlying objects.


I just want to remind everyone that, according to what we agreed some
time ago on dev@ (at least for IOs), all breaking user API changes
have to be introduced alongside a deprecation of the old API, which
can then be removed after 3 consecutive Beam releases. This way,
users have time to move to the new API smoothly.
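A minimal sketch of the deprecate-then-remove pattern described above: the old entry point stays, marked @Deprecated, and delegates to the new one so both paths share a single implementation until the old one is removed after the agreed number of releases. EsWriteApi, writeJson and write are hypothetical names chosen for illustration, not ElasticsearchIO methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of keeping a deprecated API alongside its replacement. */
public final class EsWriteApi {

  private final String defaultIndex; // index the old API was configured with
  private final List<String> sent = new ArrayList<>(); // stand-in for requests sent to ES

  public EsWriteApi(String defaultIndex) {
    this.defaultIndex = defaultIndex;
  }

  /** New API: the caller states the target index explicitly. */
  public void write(String index, String sourceJson) {
    sent.add(index + ":" + sourceJson);
  }

  /**
   * Old API: JSON document in, index taken from configuration.
   *
   * @deprecated use {@link #write(String, String)}; kept for 3 consecutive releases, then removed.
   */
  @Deprecated
  public void writeJson(String sourceJson) {
    // Delegate so the old and new paths cannot drift apart during the deprecation period.
    write(defaultIndex, sourceJson);
  }

  /** Read-only view of what has been "sent", for inspection. */
  public List<String> sent() {
    return Collections.unmodifiableList(sent);
  }
}
```

Delegating from the deprecated method keeps the old behaviour intact while steering users toward the new signature through compiler deprecation warnings.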


We are mostly discussing the target architecture of the new module
here, but I agree it is important to recall the deprecation process.
When I say the DTOs are backward compatible above, I mean between the
per-version sub-modules inside the new module. Anyway, sure, for
some time both modules (the old REST-based one that supports v2-7
and the new one that supports v5-7) will coexist, and the old one will
recei

Re: Upcoming Apache Beam meetups in Warsaw

2020-03-06 Thread Michał Walenia
Hi,
I'd also like to bump the meetup: I had great fun and
interesting conversations at the previous one, and I'm looking forward to
having more.
See you in Warsaw!

If you'd like to present at the meetup (or at another one), please let us
know! :)

Cheers!
Michal

On Tue, Mar 3, 2020 at 11:17 AM Kamil Wasilewski <
kamil.wasilew...@polidea.com> wrote:

> Bumping this up. The previous Beam meetup in Warsaw was successful, and we'd
> like to repeat it. It's also a great opportunity to meet and talk with us
> in our office.
>
> On Mon, Mar 2, 2020 at 4:20 PM Karolina Rosół 
> wrote:
>
>> Hi everyone,
>>
>> I'm a Project Manager at Polidea and work closely with three Apache Beam
>> committers (Katarzyna Kucharczyk, Kamil Wasilewski and Michał Walenia).
>>
>> Together with folks from Polidea, we'd like to announce our plans for
>> the upcoming Apache Beam meetups in Warsaw. The next date for the Beam
>> meetup we're considering is March 26th (Thursday).
>>
>> Do any of you want to be a speaker? If so, let me know.
>>
>> The event has not been announced yet because we're struggling a
>> little bit with finding the right people to carry out the talks :-)
>>
>> Thanks,
>>
>> Karolina Rosół
>> Polidea  | Project Manager
>>
>> M: +48 606 630 236
>> E: karolina.ro...@polidea.com
>>
>> Check out our projects!
>>
>

-- 

Michał Walenia
Polidea  | Software Engineer

M: +48 791 432 002
E: michal.wale...@polidea.com

Unique Tech
Check out our projects!