Re: Try Beam Katas Today

2020-05-14 Thread Henry Suryawirawan
The guide for sharing the course (publishing to Stepik) can be found here:
https://www.jetbrains.com/help/education/educator-start-guide.html#share_course
It is well integrated into the IDE and takes only a few clicks.

Yeah, the publication process is currently distinct.
At the moment, when the PR for the course changes is approved, we update
the Stepik version, add & commit the auto-generated metadata YAML to the
PR, and then merge & close the PR.
Open to suggestions if there is a better way to publish.



On Thu, May 14, 2020 at 10:21 PM Austin Bennett 
wrote:

> It looks like there are instructions online for writing exercises/Katas:
> https://www.jetbrains.com/help/education/educator-start-guide.html
>
> Do we have a guide for contributing, and for how publication/releases
> occur (publishing to Stepik)? Although the code lives in the main repo
> (and is therefore subject to those contribution guidelines), I think the
> release/publication schedule is distinct?
>
> This hopefully will help illustrate that we are able to contribute to
> Katas (PRs welcome?), and not just consume them!
>
>
>
> On Thu, May 14, 2020 at 1:41 AM Henry Suryawirawan <
> hsuryawira...@google.com> wrote:
>
>> Yeah, certainly we can expand it further.
>> There are definitely more lessons that can be added.
>>
>> >E.g., more of the write-side windowing interactions?
>> Are you referring to Write IOs?
>>
>>
>>
>> On Wed, May 13, 2020 at 11:56 PM Nathan Fisher 
>> wrote:
>>
>>> I went through them earlier this week! Definitely helpful.
>>>
>>> Is it possible to expand the katas available in the IO section? E.g.,
>>> more of the write-side windowing interactions?
>>>
>>> On Wed, May 13, 2020 at 11:36, Luke Cwik  wrote:
>>>
 These are an excellent learning tool.

 On Tue, May 12, 2020 at 11:02 PM Pablo Estrada 
 wrote:

> Sharing Damon's email with the user@ list as well. Thanks Damon!
>
> On Tue, May 12, 2020 at 9:02 PM Damon Douglas 
> wrote:
>
>> Hello Everyone,
>>
>> If you don't already know, there are helpful instructional tools for
>> learning the Apache Beam SDKs called Beam Katas, hosted on
>> https://stepik.org. Similar to traditional kata, they are meant to be
>> repeated as practice. Before practicing the katas myself, I found myself
>> copy/pasting code (please accept my confession). Now I find myself
>> actually composing pipelines. Just like kata forms, you find them
>> becoming part of you. If you are interested, the currently available
>> katas are listed below:
>>
>> 1.  Java - https://stepik.org/course/54530
>>
>> 2.  Python -  https://stepik.org/course/54532
>>
>> 3.  Go (in development) - https://stepik.org/course/70387
>>
>> If you are absolutely brand new to Beam and it scares you like it
>> scared me, come talk to me.
>>
>> Best,
>>
>> Damon
>>
> --
>>> Nathan Fisher
>>>  w: http://junctionbox.ca/
>>>
>>


Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics

2020-05-14 Thread Alex Amato
Thanks to all who have spent their time on this; there were many great
suggestions. Just another reminder that tomorrow I will be finalizing the
documents, unless there are any major objections left. Please take a look
if you are interested.

I will still welcome feedback at any time :).

But I believe we have gathered enough information to produce a good design,
which I will start to work on soon.
I will begin building the necessary subset of the proposed new features to
support the BigQueryIO metrics use case.
I will likely start with the Python SDK.

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics


On Wed, May 13, 2020 at 3:07 PM Alex Amato  wrote:

> Thanks again for more feedback :). I have iterated on things again. I'll
> report back at the end of the week. If there are still no major
> disagreements, I'll close the discussion, as I believe it will be in a good
> enough state to start some implementation. But feedback is still welcome.
>
> The latest changes: the exponential format now allows denser buckets, only
> two MonitoringInfoSpecs are now used for all of the IOs, and some labels
> are required, while optional ones allow specific IOs to provide more
> context.
>
> https://s.apache.org/beam-gcp-debuggability
> https://s.apache.org/beam-histogram-metrics
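As an illustration only: the histogram design linked above defines its own bucket format, which is not reproduced here. The minimal sketch below assumes buckets whose boundaries grow geometrically from a hypothetical `start` value by a `growth` factor, and shows why a growth factor closer to 1 gives denser buckets over the same range. `exponential_boundaries` and `bucket_index` are hypothetical names, not part of any Beam API.

```python
import bisect
from typing import List


def exponential_boundaries(start: float, growth: float, num_buckets: int) -> List[float]:
    """Return num_buckets + 1 boundaries: start, start*growth, start*growth**2, ..."""
    if start <= 0 or growth <= 1 or num_buckets < 1:
        raise ValueError("need start > 0, growth > 1 and num_buckets >= 1")
    return [start * growth ** i for i in range(num_buckets + 1)]


def bucket_index(value: float, boundaries: List[float]) -> int:
    """Map a value to a bucket index.

    0 is the underflow bucket (value < boundaries[0]);
    i in 1..len(boundaries)-1 covers [boundaries[i-1], boundaries[i]);
    len(boundaries) is the overflow bucket (value >= boundaries[-1]).
    """
    return bisect.bisect_right(boundaries, value)


# Hypothetical request-latency boundaries in milliseconds.
coarse = exponential_boundaries(start=1.0, growth=2.0, num_buckets=4)
# coarse == [1.0, 2.0, 4.0, 8.0, 16.0]
dense = exponential_boundaries(start=1.0, growth=2.0 ** 0.5, num_buckets=8)
# dense spans roughly the same range with twice as many buckets.
```

Choosing `growth` closer to 1 trades more buckets (and more data to report per metric) for finer latency resolution.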
>
> On Mon, May 11, 2020 at 4:24 PM Alex Amato  wrote:
>
>> Thanks for the great feedback so far :). I've included many new ideas,
>> and made some revisions. Both docs have changed a fair bit since the
>> initial mail out.
>>
>> https://s.apache.org/beam-gcp-debuggability
>> https://s.apache.org/beam-histogram-metrics
>>
>> PTAL and let me know what you think, and hopefully we can resolve major
>> issues by the end of the week. I'll try to finalize things by then, but of
>> course always stay open to your great ideas. :)
>>
>> On Wed, May 6, 2020 at 6:19 PM Alex Amato  wrote:
>>
>>> Thanks, everyone, for taking a look so far :).
>>>
>>> I am hoping to finalize the two reviews by the end of next
>>> week, May 15th.
>>>
>>> I'll continue to follow up on feedback and make changes, and I will add
>>> some more mentions to the documents to draw attention.
>>>
>>> https://s.apache.org/beam-gcp-debuggability
>>> https://s.apache.org/beam-histogram-metrics
>>>
>>> On Wed, May 6, 2020 at 10:00 AM Luke Cwik  wrote:
>>>
 Thanks, also took a look and left some comments.

 On Tue, May 5, 2020 at 6:24 PM Alex Amato  wrote:

> Hello,
>
> I created another design document, this time for GCP IO Debuggability
> Metrics, which defines some new metrics to collect in the GCP IO
> libraries. These are for monitoring request counts and request latencies.
>
> Please take a look and let me know what you think:
> https://s.apache.org/beam-gcp-debuggability
>
> I also sent out a separate design yesterday
> (https://s.apache.org/beam-histogram-metrics), which is related, as this
> document uses a Histogram-style metric :).
>
> I would love some feedback to make this feature the best possible :D,
> Alex
>



Re: Try Beam Katas Today

2020-05-14 Thread Rion Williams
+1 on the contributions front. My team and I have been working with Beam,
primarily in Kotlin, and I recently added the appropriate dependencies to
Gradle, converted some of the code, and have it working as expected
against the existing Java course.

I don’t know how many others are actively working with Kotlin and Beam, but I’d 
love to work on transitioning that into a proper course (assuming there’s 
interest in it).

> On May 14, 2020, at 10:32 AM, Nathan Fisher  wrote:
> 
> 
> Yes, write IO.
> 
>> On Thu, May 14, 2020 at 05:41, Henry Suryawirawan  
>> wrote:
>> Yeah, certainly we can expand it further.
>> There are definitely more lessons that can be added.
>> 
>> >E.g., more of the write-side windowing interactions?
>> Are you referring to Write IOs?
>> 
>> 
>> 
>>> On Wed, May 13, 2020 at 11:56 PM Nathan Fisher  
>>> wrote:
>>> I went through them earlier this week! Definitely helpful.
>>> 
>>> Is it possible to expand the katas available in the IO section? E.g.,
>>> more of the write-side windowing interactions?
>>> 
 On Wed, May 13, 2020 at 11:36, Luke Cwik  wrote:
 These are an excellent learning tool.
 
> On Tue, May 12, 2020 at 11:02 PM Pablo Estrada  wrote:
> Sharing Damon's email with the user@ list as well. Thanks Damon!
> 
>> On Tue, May 12, 2020 at 9:02 PM Damon Douglas  
>> wrote:
>> Hello Everyone,
>> 
>> If you don't already know, there are helpful instructional tools for
>> learning the Apache Beam SDKs called Beam Katas, hosted on
>> https://stepik.org. Similar to traditional kata, they are meant to be
>> repeated as practice. Before practicing the katas myself, I found
>> myself copy/pasting code (please accept my confession). Now I find
>> myself actually composing pipelines. Just like kata forms, you find
>> them becoming part of you. If you are interested, the currently
>> available katas are listed below:
>> 
>> 1.  Java - https://stepik.org/course/54530
>> 
>> 2.  Python -  https://stepik.org/course/54532
>> 
>> 3.  Go (in development) - https://stepik.org/course/70387
>> 
>> If you are absolutely brand new to Beam and it scares you like it scared 
>> me, come talk to me.
>> 
>> Best,
>> 
>> Damon
>>> -- 
>>> Nathan Fisher
>>>  w: http://junctionbox.ca/
> -- 
> Nathan Fisher
>  w: http://junctionbox.ca/


Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-14 Thread Aizhamal Nurmamat kyzy
Thank you Ahmet, Brian, Robert and everyone else who spent time working on
this. The pull requests are now merged and the website seems to be working
as expected [1].

One minor issue I have noticed is that the code blocks have a grey
background, which makes them less accessible than before. I created a Jira
issue for this [2] and will follow up to get it fixed. If you notice any
other issues, please file Jira issues and let me know.

Hope you are all safe,
Aizhamal

[1] https://beam.apache.org/
[2] https://issues.apache.org/jira/browse/BEAM-10001

On Thu, May 14, 2020 at 11:25 AM Pablo Estrada  wrote:

> Here's a zipped-up tree from a staged sample of the website:
> https://drive.google.com/file/d/1LKL936tBJ79jpjvlL5vC5uYYwTHsWXiJ/view?usp=sharing
>
> I'd also suggest tagging the commit, so we can find the first commit later
> on for reference. I can push the tag after the PR is merged.
>
> On Thu, May 14, 2020 at 10:43 AM Ahmet Altay  wrote:
>
>>
>>
>> On Thu, May 14, 2020 at 9:16 AM Aizhamal Nurmamat kyzy <
>> aizha...@apache.org> wrote:
>>
>>> Thank you all for reviewing and validating this pull request. I see that
>>> all tests are passing now, should we merge it?
>>>
>>
>> +1 to merging now.
>>
>> Before the merge, please share a link to an archive copy of the old
>> website. After the merge, please try out the live website to see if it is
>> working as expected.
>>
>>
>>>
>>> On Wed, May 13, 2020, 5:41 PM Ahmet Altay  wrote:
>>>
 Thank you! Let's merge it once tests are done.

 On Wed, May 13, 2020 at 5:23 PM Robert Bradshaw 
 wrote:

> I took a (non-comprehensive) look at these as well, and didn't see any
> issues, so am happy to sign off on this. Thanks Nam, Brian, Ahmet, and
> everyone else.
>
> On Wed, May 13, 2020 at 7:58 AM Nam Bui  wrote:
>
>> Hi Ahmet,
>> "Does this mean the internal links (e.g. contribute/team) will
>> disappear?"
>> Yes, I'd like to get rid of them. And to make sure they don't confuse
>> people, I replaced every spot using "contribute/team" with the external
>> link. Currently, we only have 2 "redirect_to" links, which are
>> "contribute/team" & "contribute/project/team", so this change won't have
>> any side effects.
>> Also, based on your question, I just added a section to the
>> documentation (CONTRIBUTE.md) that describes the Jekyll features that were
>> replaced or removed, for anyone writing a new blog post or documentation
>> in Hugo.
>>
>
 Got it. The main effect is that anyone who has a bookmark/link to these
 pages will find that those links no longer work. That is fine if it is
 limited to these 2 URLs.


>
>>
>> On Wed, May 13, 2020 at 4:17 AM Ahmet Altay  wrote:
>>
>>> - I reviewed the diff output with Nam's explanations. The change looks
>>> minimal. Large diffs are primarily coming from index and redirect files.
>>> Code blocks have differences, but the content is seemingly preserved.
>>> IIUC, the source of truth is the snippet files anyway. (It would be good
>>> to get one more set of eyes on this.)
>>> - Brian and I reviewed the infrastructure changes. They look
>>> reasonable.
>>>
>>> I think the PR is very close to a mergeable state. Especially if we can
>>> get an archive copy of the current website, I will be comfortable with
>>> the merge.
>>>
>>> And, thank you Nam for your work so far.
>>>
>>> On Tue, May 12, 2020 at 4:13 PM Nam Bui  wrote:
>>>
 Hi,

 A new commit covering Robert's script has been pushed [1], and the
 script output is attached to this email.

 Based on the diff output of the script, my strategy is to look at the
 sections with large amounts of removed text, to make sure that no content
 or files were lost. Below are all of the links with a large amount of
 removed content.

 - Detection:
 These pages had lost some of their content. Fixed!
 + documentation/runners/jstorm/index.html
 + documentation/dsls/sql/calcite/lexical-structure/index.html
 + documentation/dsls/sql/zetasql/data-types/index.html
 + documentation/dsls/sql/zetasql/query-syntax/index.html

 - Aliases:
 These links are redirected links. So in Hugo, these HTML files only
 include redirected URLs. I also took a look at them to ensure the content
 was there.
 + documentation/dsls/sql/calcite/lexical/index.html
 + old URLs of blog posts

 - Ignore:
 Hugo and Jekyll render code highlighting into HTML with different
 structures. Ahmet & Pablo agree with me that it's fair to ignore them for
 now.
 + codeblocks

 - Missing files:
 The script returns some of “missing files” 

Re: New Grafana dashboards

2020-05-14 Thread Pablo Estrada
I noticed that the postcommit status dashboard shows 0/1 values. I remember
it used to show green/red for passed/failed tests. Maybe something changed?
Best
-P.

On Wed, May 13, 2020 at 3:23 PM Pablo Estrada  wrote:

> Thanks Kamil! These dashboards are super useful. I'm happy to see our perf
> tests there as well now.
> Thanks!
> -P.
>
> On Wed, May 13, 2020 at 8:43 AM Tyson Hamilton  wrote:
>
>> The dashboards look great! Thank you.
>>
>> It would be nice if the 'Useful Links' section included links to Apache
>> Beam related material like the cwiki documentation.
>>
>> On Wed, May 13, 2020 at 4:50 AM Kamil Wasilewski <
>> kamil.wasilew...@polidea.com> wrote:
>>
>>> Hello everyone,
>>>
>>> I'm pleased to announce that we've just moved the dashboards tracking
>>> performance test execution times from Perfkit Explorer to Grafana. Here's a
>>> link to the new dashboards: http://metrics.beam.apache.org
>>>
>>> *Why Grafana?*
>>> Grafana is an open source visualization tool. It offers a better user
>>> experience and more flexibility than the tool we've been using until now.
>>> It also changes the way new charts are developed: all Grafana dashboards
>>> are stored as JSON files in Beam's repository, and therefore require a
>>> full code review process.
>>>
>>> *What's next?*
>>> There is still some work to be done. This includes moving even more
>>> charts to Grafana and simplifying the process of updating and creating new
>>> charts. We are also working on improving the docs [1].
>>>
>>> As always, I'd be happy to hear any feedback from you.
>>>
>>> Cheers,
>>> Kamil
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/BEAM/Test+Results+Monitoring
>>>
>>


Interested in applying to GSoD project

2020-05-14 Thread Deepak Vohra
Aizhamal,

I am interested in applying as a technical writer to the Apache Beam project in
Google Season of Docs. In the project exploration phase, I would like to
introduce myself as a potential applicant (when the application opens). I have
experience using several data processing frameworks and have published dozens
of articles and a few books on them. Some books on similar topics:
1. Practical Hadoop Ecosystem: https://www.amazon.com/gp/product/B01M0NAHU3/ref=dbs_a_def_rwt_hsch_vapi_tkin_p1_i5
2. Apache HBase Primer: https://www.amazon.com/gp/product/B01MTOSTAB/ref=dbs_a_def_rwt_bibl_vppi_i1
I have also published 3 other books on Kubernetes, a commonly used
deployment platform for Apache Beam.
Please let me know about any potential topics other than those already listed,
and what I could do to get started.
Regards,
Deepak




Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-14 Thread Pablo Estrada
Here's a zipped-up tree from a staged sample of the website:
https://drive.google.com/file/d/1LKL936tBJ79jpjvlL5vC5uYYwTHsWXiJ/view?usp=sharing

I'd also suggest tagging the commit, so we can find the first commit later
on for reference. I can push the tag after the PR is merged.

On Thu, May 14, 2020 at 10:43 AM Ahmet Altay  wrote:

>
>
> On Thu, May 14, 2020 at 9:16 AM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
>> Thank you all for reviewing and validating this pull request. I see that
>> all tests are passing now, should we merge it?
>>
>
> +1 to merging now.
>
> Before the merge, please share a link to an archive copy of the old
> website. After the merge, please try out the live website to see if it is
> working as expected.
>
>
>>
>> On Wed, May 13, 2020, 5:41 PM Ahmet Altay  wrote:
>>
>>> Thank you! Let's merge it once tests are done.
>>>
>>> On Wed, May 13, 2020 at 5:23 PM Robert Bradshaw 
>>> wrote:
>>>
 I took a (non-comprehensive) look at these as well, and didn't see any
 issues, so am happy to sign off on this. Thanks Nam, Brian, Ahmet, and
 everyone else.

 On Wed, May 13, 2020 at 7:58 AM Nam Bui  wrote:

> Hi Ahmet,
> "Does this mean the internal links (e.g. contribute/team) will
> disappear?"
> Yes, I'd like to get rid of them. And to make sure they don't confuse
> people, I replaced every spot using "contribute/team" with the external
> link. Currently, we only have 2 "redirect_to" links, which are
> "contribute/team" & "contribute/project/team", so this change won't have
> any side effects.
> Also, based on your question, I just added a section to the
> documentation (CONTRIBUTE.md) that describes the Jekyll features that were
> replaced or removed, for anyone writing a new blog post or documentation
> in Hugo.
>

>>> Got it. The main effect is that anyone who has a bookmark/link to these
>>> pages will find that those links no longer work. That is fine if it is
>>> limited to these 2 URLs.
>>>
>>>

>
> On Wed, May 13, 2020 at 4:17 AM Ahmet Altay  wrote:
>
>> - I reviewed the diff output with Nam's explanations. The change looks
>> minimal. Large diffs are primarily coming from index and redirect files.
>> Code blocks have differences, but the content is seemingly preserved.
>> IIUC, the source of truth is the snippet files anyway. (It would be good
>> to get one more set of eyes on this.)
>> - Brian and I reviewed the infrastructure changes. They look
>> reasonable.
>>
>> I think the PR is very close to a mergeable state. Especially if we can
>> get an archive copy of the current website, I will be comfortable with
>> the merge.
>>
>> And, thank you Nam for your work so far.
>>
>> On Tue, May 12, 2020 at 4:13 PM Nam Bui  wrote:
>>
>>> Hi,
>>>
>>> A new commit covering Robert's script has been pushed [1], and the
>>> script output is attached to this email.
>>>
>>> Based on the diff output of the script, my strategy is to look at the
>>> sections with large amounts of removed text, to make sure that no content
>>> or files were lost. Below are all of the links with a large amount of
>>> removed content.
>>>
>>> - Detection:
>>> These pages had lost some of their content. Fixed!
>>> + documentation/runners/jstorm/index.html
>>> + documentation/dsls/sql/calcite/lexical-structure/index.html
>>> + documentation/dsls/sql/zetasql/data-types/index.html
>>> + documentation/dsls/sql/zetasql/query-syntax/index.html
>>>
>>> - Aliases:
>>> These links are redirected links. So in Hugo, these HTML files only
>>> include redirected URLs. I also took a look at them to ensure the content
>>> was there.
>>> + documentation/dsls/sql/calcite/lexical/index.html
>>> + old URLs of blog posts
>>>
>>> - Ignore:
>>> Hugo and Jekyll render code highlighting into HTML with different
>>> structures. Ahmet & Pablo agree with me that it's fair to ignore them for
>>> now.
>>> + codeblocks
>>>
>>> - Missing files:
>>> The script reports a “missing file” status for some files:
>>> + coming-soon.html (this file was used nowhere in Jekyll, so I
>>> didn’t migrate it to Hugo)
>>> + documentation/dsls/sql/statements/select/index.html (aliases)
>>> + blog/2019/04/25/beam-2.12.0.html (fixed!)
>>> + blog/2020/05/08/beam-summit-digital-2020.html (new blog post,
>>> added!)
>>> + v2/index.html (this file was used nowhere in Jekyll, so I didn’t
>>> migrate it to Hugo)
>>> + contribute/team/index.html (mentioned in “redirect_to” below)
>>> + contribute/project/team/index.html (mentioned in “redirect_to”
>>> below)
>>>
>>> - “redirect_to”:
>>> In Jekyll, there is a feature called “redirect_to”. For instance,
>>> you click on an internal link “contribute/team/” 

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-14 Thread Ahmet Altay
On Thu, May 14, 2020 at 9:16 AM Aizhamal Nurmamat kyzy 
wrote:

> Thank you all for reviewing and validating this pull request. I see that
> all tests are passing now, should we merge it?
>

+1 to merging now.

Before the merge, please share a link to an archive copy of the old
website. After the merge, please try out the live website to see if it is
working as expected.


>
> On Wed, May 13, 2020, 5:41 PM Ahmet Altay  wrote:
>
>> Thank you! Let's merge it once tests are done.
>>
>> On Wed, May 13, 2020 at 5:23 PM Robert Bradshaw 
>> wrote:
>>
>>> I took a (non-comprehensive) look at these as well, and didn't see any
>>> issues, so am happy to sign off on this. Thanks Nam, Brian, Ahmet, and
>>> everyone else.
>>>
>>> On Wed, May 13, 2020 at 7:58 AM Nam Bui  wrote:
>>>
 Hi Ahmet,
 "Does this mean the internal links (e.g. contribute/team) will
 disappear?"
 Yes, I'd like to get rid of them. And to make sure they don't confuse
 people, I replaced every spot using "contribute/team" with the external
 link. Currently, we only have 2 "redirect_to" links, which are
 "contribute/team" & "contribute/project/team", so this change won't have
 any side effects.
 Also, based on your question, I just added a section to the
 documentation (CONTRIBUTE.md) that describes the Jekyll features that were
 replaced or removed, for anyone writing a new blog post or documentation
 in Hugo.

>>>
>> Got it. The main effect is that anyone who has a bookmark/link to these
>> pages will find that those links no longer work. That is fine if it is
>> limited to these 2 URLs.
>>
>>
>>>

 On Wed, May 13, 2020 at 4:17 AM Ahmet Altay  wrote:

> - I reviewed the diff output with Nam's explanations. The change looks
> minimal. Large diffs are primarily coming from index and redirect files.
> Code blocks have differences, but the content is seemingly preserved.
> IIUC, the source of truth is the snippet files anyway. (It would be good
> to get one more set of eyes on this.)
> - Brian and I reviewed the infrastructure changes. They look
> reasonable.
>
> I think the PR is very close to a mergeable state. Especially if we can
> get an archive copy of the current website, I will be comfortable with the
> merge.
>
> And, thank you Nam for your work so far.
>
> On Tue, May 12, 2020 at 4:13 PM Nam Bui  wrote:
>
>> Hi,
>>
>> A new commit covering Robert's script has been pushed [1], and the
>> script output is attached to this email.
>>
>> Based on the diff output of the script, my strategy is to look at the
>> sections with large amounts of removed text, to make sure that no content
>> or files were lost. Below are all of the links with a large amount of
>> removed content.
>>
>> - Detection:
>> These pages had lost some of their content. Fixed!
>> + documentation/runners/jstorm/index.html
>> + documentation/dsls/sql/calcite/lexical-structure/index.html
>> + documentation/dsls/sql/zetasql/data-types/index.html
>> + documentation/dsls/sql/zetasql/query-syntax/index.html
>>
>> - Aliases:
>> These links are redirected links. So in Hugo, these HTML files only
>> include redirected URLs. I also took a look at them to ensure the content
>> was there.
>> + documentation/dsls/sql/calcite/lexical/index.html
>> + old URLs of blog posts
>>
>> - Ignore:
>> Hugo and Jekyll render code highlighting into HTML with different
>> structures. Ahmet & Pablo agree with me that it's fair to ignore them for
>> now.
>> + codeblocks
>>
>> - Missing files:
>> The script reports a “missing file” status for some files:
>> + coming-soon.html (this file was used nowhere in Jekyll, so I didn’t
>> migrate it to Hugo)
>> + documentation/dsls/sql/statements/select/index.html (aliases)
>> + blog/2019/04/25/beam-2.12.0.html (fixed!)
>> + blog/2020/05/08/beam-summit-digital-2020.html (new blog post,
>> added!)
>> + v2/index.html (this file was used nowhere in Jekyll, so I didn’t
>> migrate it to Hugo)
>> + contribute/team/index.html (mentioned in “redirect_to” below)
>> + contribute/project/team/index.html (mentioned in “redirect_to”
>> below)
>>
>> - “redirect_to”:
>> In Jekyll, there is a feature called “redirect_to”. For instance, when you
>> click on an internal link “contribute/team/”, you reach the markdown file
>> “team.md”, which then redirects you to the external URL
>> “https://example.com”.
>> However, there is no such feature in Hugo. My solution is to directly
>> replace “contribute/team/” with “https://example.com”.
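For illustration, Nam's approach of replacing each internal “redirect_to” link with its external target directly in the Markdown sources could be sketched as below. This is a sketch under assumptions, not the script used in the PR: it only handles inline Markdown links of the form `[text](target)`, and the hypothetical `REDIRECTS` mapping reuses the placeholder URL from the message above.

```python
import re
from typing import Dict

# Inline Markdown links: [text](target). Reference-style links are not handled.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")

# Hypothetical mapping from internal Jekyll paths to their redirect_to targets.
REDIRECTS = {
    "contribute/team/": "https://example.com",
    "contribute/project/team/": "https://example.com",
}


def _normalize(path: str) -> str:
    """Treat '/contribute/team', 'contribute/team/', etc. as the same path."""
    return path.strip("/") + "/"


def replace_redirect_links(markdown: str, redirects: Dict[str, str]) -> str:
    """Rewrite links that point at a redirect-only page to the external URL."""
    table = {_normalize(k): v for k, v in redirects.items()}

    def substitute(match):
        text, target = match.group(1), match.group(2)
        external = table.get(_normalize(target))
        if external is None:
            return match.group(0)  # not a redirect-only page; keep the link
        return f"[{text}]({external})"

    return LINK_RE.sub(substitute, markdown)


page = "Meet [the team](/contribute/team/) or read the [docs](/documentation/)."
print(replace_redirect_links(page, REDIRECTS))
```

Because the old internal paths disappear once the links are rewritten, any external bookmark to them stops working, which matches the trade-off discussed in the replies.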
>>
>
> Does this mean the internal links (e.g. contribute/team) will
> disappear?
>
>
>>
>> [1] https://github.com/apache/beam/pull/11554
>>
>> On Mon, May 11, 2020 at 7:34 PM Nam Bui  wrote:
>>
>>> 
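Nam's review strategy in this thread (compare the old Jekyll output with the new Hugo output, then focus on pages with a large amount of removed text and on files missing from the new build) can be sketched with Python's standard `difflib`. This is a minimal sketch, assuming both builds have been loaded into dictionaries mapping page paths to HTML text; it is not Robert's script, whose details are not shown here, and `threshold` is a hypothetical tuning knob.

```python
import difflib
from typing import Dict, List, Tuple


def removed_line_count(old_text: str, new_text: str) -> int:
    """Count lines of old_text that were deleted or replaced in new_text."""
    matcher = difflib.SequenceMatcher(
        None, old_text.splitlines(), new_text.splitlines(), autojunk=False
    )
    return sum(
        i2 - i1
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag in ("delete", "replace")
    )


def pages_to_review(
    old_pages: Dict[str, str], new_pages: Dict[str, str], threshold: int
) -> Tuple[List[str], List[str]]:
    """Return (pages missing from the new build, pages with >= threshold removed lines)."""
    missing = sorted(path for path in old_pages if path not in new_pages)
    large_removals = sorted(
        path
        for path, old_text in old_pages.items()
        if path in new_pages
        and removed_line_count(old_text, new_pages[path]) >= threshold
    )
    return missing, large_removals


old_build = {"a.html": "1\n2\n3\n4", "b.html": "x\ny", "coming-soon.html": "soon"}
new_build = {"a.html": "1", "b.html": "x\ny"}
print(pages_to_review(old_build, new_build, threshold=2))
```

Note that "replace" opcodes also count lines whose markup merely changed, so pages with restructured code blocks would show up as noise, consistent with the thread's decision to ignore code-block differences.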

Regarding Google Season of Docs

2020-05-14 Thread Arun Prakash
Hi,

Greetings!

I am an MSc Computer Science student at Chalmers University of
Technology, Sweden! I would love to be a participant in Google Season of
Docs 2020. I have explored Apache Beam a lot. I was part of the GSoC
community, working to implement Hazelcast portable runner support for Apache
Beam, but the project was called off at the last minute. Still, I never felt
that the invested hours were in vain; I loved the product! Now I want to
contribute by writing docs for this awesome community. Could you please guide
me further?

Following is the proposal I submitted for GSoC 2020:

https://drive.google.com/file/d/1TJF6j3LoN5kB7rKB3yakBSpTozEajwoI/view?usp=sharing

Thank you!


Regards,
Arun Prakash Jothimani,
MSc Computer Science Student,
Chalmers University of Technology, Sweden.
http://in.linkedin.com/pub/arunprakash-jothimani/24/aa2/924


Google Season of Docs

2020-05-14 Thread Shalini Mukhopadhyay
Respected Sir/Madam,
I want to apply with your organization for Google Season of Docs. My CV is attached.
Can we talk about your ideas, please? Thanks.





ShaliniMukhopadhyayCV.docx
Description: MS-Word 2007 document


Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-05-14 Thread Yoshiki Obata
Thank you, Kyle and Valentyn.

I'll update the test code to treat Python 3.5 and 3.7 as high-priority
versions for now.
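The rule agreed on in this thread (treat the lowest and highest supported Python 3 versions as high priority), together with Valentyn's suggestion that a suite can be configured to run across all versions, the high-priority versions, or just one, can be sketched as follows. Beam's real test configuration lives in Gradle (e.g. PythonNature), so this Python version is only an illustration; `SuiteScope`, `high_priority_versions`, and `versions_for_suite` are hypothetical names.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class SuiteScope(Enum):
    ALL = "all"                      # run on every supported version
    HIGH_PRIORITY = "high_priority"  # run on the lowest and highest versions
    LOWEST = "lowest"                # run only on the lowest high-priority version


def _version_key(version: str) -> Tuple[int, ...]:
    # Compare numerically so that "3.10" sorts after "3.9".
    return tuple(int(part) for part in version.split("."))


def high_priority_versions(supported: Iterable[str]) -> List[str]:
    ordered = sorted(set(supported), key=_version_key)
    if not ordered:
        raise ValueError("no supported versions given")
    # The lowest and highest versions are expected to give the most useful
    # breakage signal.
    return ordered[:1] if len(ordered) == 1 else [ordered[0], ordered[-1]]


def versions_for_suite(supported: Iterable[str], scope: SuiteScope) -> List[str]:
    versions = list(supported)  # materialize in case an iterator is passed
    if scope is SuiteScope.ALL:
        return sorted(set(versions), key=_version_key)
    high = high_priority_versions(versions)
    return high if scope is SuiteScope.HIGH_PRIORITY else high[:1]


beam_py3_versions = ["3.5", "3.6", "3.7"]
print(versions_for_suite(beam_py3_versions, SuiteScope.HIGH_PRIORITY))  # ['3.5', '3.7']
```

When Beam drops 3.5 support, removing it from the supported list moves the lower bound automatically (to 3.6 in this example).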

On Tue, May 12, 2020 at 2:10 Valentyn Tymofieiev wrote:
>
> I agree with the point echoed earlier that the lowest and the highest of the
> supported versions will probably give the most useful test signal for
> possible breakages. So 3.5 and 3.7 as high-priority versions SGTM.
>
> This can change later once Beam drops 3.5 support.
>
> On Mon, May 11, 2020 at 10:05 AM Yoshiki Obata  
> wrote:
>>
>> Hello again,
>>
>> The test infrastructure update is ongoing, so now we should determine
>> which Python versions are high-priority.
>>
>> According to PyPI download stats [1], the download share of Python
>> 3.5 is almost always greater than that of 3.6 and 3.7.
>> This has not changed since Robert told us that Python 3.x
>> accounts for nearly 40% of downloads [2].
>>
>> On the other hand, according to Docker Hub [3], the most downloaded
>> apachebeam/python3.x_sdk image is the Python 3.7 one, as Kyle
>> pointed out [4].
>>
>> Considering these stats, I think high-priority versions are 3.5 and 3.7.
>>
>> Is this assumption appropriate?
>> I would like to hear your thoughts about this.
>>
>> [1] https://pypistats.org/packages/apache-beam
>> [2] 
>> https://lists.apache.org/thread.html/r208c0d11639e790453a17249e511dbfe00a09f91bef8fcd361b4b74a%40%3Cdev.beam.apache.org%3E
>> [3] https://hub.docker.com/search?q=apachebeam%2Fpython=image
>> [4] 
>> https://lists.apache.org/thread.html/r9ca9ad316dae3d60a3bf298eedbe4aeecab2b2664454cc352648abc9%40%3Cdev.beam.apache.org%3E
>>
>> On Wed, May 6, 2020 at 12:48 Yoshiki Obata wrote:
>> >
>> > > Not sure how run_pylint.sh is related here - we should run linter on the 
>> > > entire codebase.
>> > ah, I mistyped... I meant run_pytest.sh
>> >
>> > > I am familiar with beam_PostCommit_PythonXX suites. Is there something 
>> > > specific about these suites that you wanted to know?
>> > Test suite runtime will depend on the number of tests in the suite,
>> > how many tests we run in parallel, how long they take to run. To
>> > understand the load on test infrastructure we can monitor Beam test
>> > health metrics [1]. In particular, if time in queue[2] is high, it is
>> > a sign that there are not enough Jenkins slots available to start the
>> > test suite earlier.
>> > Sorry for the ambiguous question. I wanted to know how to see the load on
>> > the test infrastructure.
>> > The Grafana links you showed serve my purpose. Thank you.
>> >
>> > On Wed, May 6, 2020 at 2:35 Valentyn Tymofieiev wrote:
>> > >
>> > > On Mon, May 4, 2020 at 7:06 PM Yoshiki Obata  
>> > > wrote:
>> > >>
>> > >> Thank you for your comments, Valentyn.
>> > >>
>> > >> > 1) We can seed the smoke test suite with typehints tests, and add 
>> > >> > more tests later if there is a need. We can identify them by the file 
>> > >> > path or by special attributes in test files. Identifying them using 
>> > >> > filepath seems simpler and independent of test runner.
>> > >>
>> > >> Yes, making run_pylint.sh accept target test file paths as arguments
>> > >> would be a good approach, if possible.
>> > >
>> > >
>> > > Not sure how run_pylint.sh is related here - we should run linter on the 
>> > > entire codebase.
>> > >
>> > >>
>> > >> > 3) We should reduce the code duplication across
>> > >> > beam/sdks/python/test-suites/$runner/py3*. I think we could move the 
>> > >> > suite definition into a common file like 
>> > >> > beam/sdks/python/test-suites/$runner/build.gradle perhaps, and 
>> > >> > populate individual suites 
>> > >> > (beam/sdks/python/test-suites/$runner/py38/build.gradle) including 
>> > >> > the common file and/or logic from PythonNature [1].
>> > >>
>> > >> Exactly. I'll check it out.
>> > >>
>> > >> > 4) We have some tests that we run only under specific Python 3
>> > >> > versions. For example, the FlinkValidatesRunner test runs using
>> > >> > Python 3.5 [2], HDFS Python 3 tests run only with Python 3.7 [3],
>> > >> > and cross-language Py3 tests for Spark run under Python 3.5 [4].
>> > >> > There may be more test suites that selectively use particular
>> > >> > versions.
>> > >> > We need to correct such suites so that we do not tie them to a
>> > >> > specific Python version. I see several options here: such tests
>> > >> > should run either for all high-priority versions, or only under
>> > >> > the lowest of the high-priority versions. We don't have
>> > >> > to fix them all at the same time. In general, we should make it
>> > >> > as easy as possible to configure whether a suite runs across all
>> > >> > versions, all high-priority versions, or just one version.
>> > >>
>> > >> A high-priority/low-priority configuration would be useful for
>> > >> this.
>> > >> And which versions to test may be related to 5).
>> > >>
>> > >> > 5) If postcommit suites (that need to run against all versions) still 
>> > >> > constitute too much load on the infrastructure, we may need to 
>> 

[BEAM-9615][Go SDK] Beam Schemas

2020-05-14 Thread Robert Burke
https://s.apache.org/beam-go-schemas
The tracking JIRA is BEAM-9615


I've written up the plan and the overall description of the design of
schemas in the Go SDK, for your comment. Due to the approach, there's not
alot of code to look at for the user experience, but there are one or two
Go Playground links for what some of the code should look like.

I'll start to implement while the approach is being reviewed with the
non-contentious portions of design (eg. Adding implementations of standard
beam coders that are missing from the SDK).

Thank you kindly for your time.


Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-14 Thread Aizhamal Nurmamat kyzy
Thank you all for reviewing and validating this pull request. I see that
all tests are passing now, should we merge it?

On Wed, May 13, 2020, 5:41 PM Ahmet Altay  wrote:

> Thank you! Let's merge it once tests are done.
>
> On Wed, May 13, 2020 at 5:23 PM Robert Bradshaw 
> wrote:
>
>> I took a (non-comprehensive) look at these as well, and didn't see any
>> issues, so am happy to sign off on this. Thanks Nam, Brian, Ahmet, and
>> everyone else.
>>
>> On Wed, May 13, 2020 at 7:58 AM Nam Bui  wrote:
>>
>>> Hi Ahmet,
>>> "Does this mean the internal links (e.g. contribute/team) will
>>> disappear?"
>>> Yes, I'd like to get rid of them. To make sure they don't confuse
>>> people, I replaced every place that used "contribute/team" with the
>>> external link. Currently, we only have 2 "redirect_to" links,
>>> "contribute/team" and "contribute/project/team", so this change won't
>>> have any other effects.
>>> Also, based on your question, I just added a section to the
>>> documentation (CONTRIBUTE.md) that lists the Jekyll features that were
>>> replaced or removed, as they affect writing a new blog post or
>>> documentation page in Hugo.
>>>
>>
> Got it. The main effect is that anyone who has a bookmark or link to these
> pages will find that those links no longer work. That is fine if it is
> limited to these 2 URLs.
>
>
>>
>>>
>>> On Wed, May 13, 2020 at 4:17 AM Ahmet Altay  wrote:
>>>
 - I reviewed the diff output with Nam's explanations. The change looks
 minimal. Large diffs come primarily from index and redirect files. Code
 blocks differ, but the content seems to be preserved. IIUC, the source
 of truth is the snippet files anyway. (It would be good to get one more
 set of eyes on this.)
 - Brian and I reviewed the infrastructure changes. They look reasonable.

 I think the PR is very close to a mergeable state. If we can also get
 an archive copy of the current website, I will be comfortable with the
 merge.

 And, thank you Nam for your work so far.

 On Tue, May 12, 2020 at 4:13 PM Nam Bui  wrote:

> Hi,
>
> A new commit that covers Robert's script has been pushed [1], and the
> script output is attached to this email.
>
> Based on the script's diff output, my strategy was to look at the
> sections with large amounts of removed text, to make sure that no content
> or files were lost. Below are all of the links with large amounts of
> removed content.
>
> - Detection:
> These links lost some of their content. Fixed!
> + documentation/runners/jstorm/index.html
> + documentation/dsls/sql/calcite/lexical-structure/index.html
> + documentation/dsls/sql/zetasql/data-types/index.html
> + documentation/dsls/sql/zetasql/query-syntax/index.html
>
> - Aliases:
> These links are redirects, so in Hugo these HTML files only contain the
> redirect URLs. I also checked them to ensure the content was there.
> + documentation/dsls/sql/calcite/lexical/index.html
> + old URLs of blog posts
>
> - Ignore:
> Hugo and Jekyll render highlighted code blocks with different HTML
> structures. Ahmet & Pablo agree with me that it's fair to ignore these
> differences for now.
> + codeblocks
>
> - Missing files:
> The script reports a “missing files” status for some files:
> + coming-soon.html (this file was not used anywhere in Jekyll, so I didn’t
> migrate it to Hugo)
> + documentation/dsls/sql/statements/select/index.html (aliases)
> + blog/2019/04/25/beam-2.12.0.html (fixed!)
> + blog/2020/05/08/beam-summit-digital-2020.html (new blog post, added!)
> + v2/index.html (this file was not used anywhere in Jekyll, so I didn’t
> migrate it to Hugo)
> + contribute/team/index.html (mentioned in “redirect_to” below)
> + contribute/project/team/index.html (mentioned in “redirect_to” below)
>
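The “missing files” category above amounts to a set difference between the old (Jekyll) and new (Hugo) build outputs. Robert's script itself is not shown in the thread, so the following is a hypothetical stand-in, assuming both sites have been built into local directories; `SiteDiff`, `relativeFiles`, and `missingFiles` are illustrative names only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SiteDiff {
  // Lists every regular file under a built site as a '/'-separated relative path.
  static Set<String> relativeFiles(Path siteRoot) throws IOException {
    try (Stream<Path> paths = Files.walk(siteRoot)) {
      return paths
          .filter(Files::isRegularFile)
          .map(p -> siteRoot.relativize(p).toString().replace('\\', '/'))
          .collect(Collectors.toCollection(TreeSet::new));
    }
  }

  // Files produced by the old build that the new build did not produce, sorted.
  static SortedSet<String> missingFiles(Set<String> oldBuild, Set<String> newBuild) {
    SortedSet<String> missing = new TreeSet<>(oldBuild);
    missing.removeAll(newBuild);
    return missing;
  }
}
```

Calling `missingFiles(relativeFiles(oldRoot), relativeFiles(newRoot))` only produces candidates; as the list above shows, some entries (unused pages, aliases, redirects) are intentional and still need a human decision.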
> - “redirect_to”:
> In Jekyll, there is a feature called “redirect_to”. For instance, when
> you click the internal link “contribute/team/”, you reach the markdown
> file “team.md”, which then redirects you to the external URL
> “https://example.com”.
> However, Hugo has no such feature. My solution is to directly replace
> “contribute/team/” with “https://example.com”.
>

 Does this mean the internal links (e.g. contribute/team) will disappear?


>
> [1] https://github.com/apache/beam/pull/11554
>
> On Mon, May 11, 2020 at 7:34 PM Nam Bui  wrote:
>
>> Updates for today:
>> - Thanks Brian & Ahmet for your reviews. I replied to some of the
>> questions and applied new changes based on the reviews [1].
>> - I see that the new blog post was merged yesterday, so I added it to
>> the PR as well.
>>
>> I briefly tried Robert's script with the build files from the old and new
>> websites as input. It 

Re: Try Beam Katas Today

2020-05-14 Thread Nathan Fisher
Yes, write IOs.

On Thu, May 14, 2020 at 05:41, Henry Suryawirawan 
wrote:

> Yeah, we can certainly expand it further.
> There are definitely more lessons that can be added.
>
> >E.g., more on the write-side windowing interactions?
> Are you referring to Write IOs?
>
>
>
> On Wed, May 13, 2020 at 11:56 PM Nathan Fisher 
> wrote:
>
>> I went through them earlier this week! Definitely helpful.
>>
>> Is it possible to expand the katas available in the IO section? E.g.,
>> more on the write-side windowing interactions?
>>
>> On Wed, May 13, 2020 at 11:36, Luke Cwik  wrote:
>>
>>> These are an excellent learning tool.
>>>
>>> On Tue, May 12, 2020 at 11:02 PM Pablo Estrada 
>>> wrote:
>>>
 Sharing Damon's email with the user@ list as well. Thanks Damon!

 On Tue, May 12, 2020 at 9:02 PM Damon Douglas 
 wrote:

> Hello Everyone,
>
> If you don't already know, there are helpful instructional tools for
> learning the Apache Beam SDKs called Beam Katas hosted on
> https://stepik.org.  Similar to traditional Kata, they are meant to be
> repeated as practice.  Before practicing the katas myself, I found myself
> copy/pasting code (please accept my confession).  Now I find myself
> actually composing pipelines.  Just like kata forms, you find them becoming
> part of you.  If you are interested, the currently available katas are
> listed below:
>
> 1.  Java - https://stepik.org/course/54530
>
> 2.  Python -  https://stepik.org/course/54532
>
> 3.  Go (in development) - https://stepik.org/course/70387
>
> If you are absolutely brand new to Beam and it scares you like it
> scared me, come talk to me.
>
> Best,
>
> Damon
>
 --
>> Nathan Fisher
>>  w: http://junctionbox.ca/
>>
> --
Nathan Fisher
 w: http://junctionbox.ca/


Re: Try Beam Katas Today

2020-05-14 Thread Austin Bennett
It looks like there are instructions online for writing exercises/Katas:
https://www.jetbrains.com/help/education/educator-start-guide.html

Do we have a guide for contributing, and for how publication/releases occur
(publishing to Stepik)? Although the code lives in the main repo (and is
therefore subject to those contribution guidelines), I think the
release/publication schedule is distinct?

This will hopefully help illustrate that we are able to contribute to Katas
(PRs welcome?), and not just consume them!



On Thu, May 14, 2020 at 1:41 AM Henry Suryawirawan 
wrote:

> Yeah, we can certainly expand it further.
> There are definitely more lessons that can be added.
>
> >E.g., more on the write-side windowing interactions?
> Are you referring to Write IOs?
>
>
>
> On Wed, May 13, 2020 at 11:56 PM Nathan Fisher 
> wrote:
>
>> I went through them earlier this week! Definitely helpful.
>>
>> Is it possible to expand the katas available in the IO section? E.g.,
>> more on the write-side windowing interactions?
>>
>> On Wed, May 13, 2020 at 11:36, Luke Cwik  wrote:
>>
>>> These are an excellent learning tool.
>>>
>>> On Tue, May 12, 2020 at 11:02 PM Pablo Estrada 
>>> wrote:
>>>
 Sharing Damon's email with the user@ list as well. Thanks Damon!

 On Tue, May 12, 2020 at 9:02 PM Damon Douglas 
 wrote:

> Hello Everyone,
>
> If you don't already know, there are helpful instructional tools for
> learning the Apache Beam SDKs called Beam Katas hosted on
> https://stepik.org.  Similar to traditional Kata, they are meant to be
> repeated as practice.  Before practicing the katas myself, I found myself
> copy/pasting code (please accept my confession).  Now I find myself
> actually composing pipelines.  Just like kata forms, you find them becoming
> part of you.  If you are interested, the currently available katas are
> listed below:
>
> 1.  Java - https://stepik.org/course/54530
>
> 2.  Python -  https://stepik.org/course/54532
>
> 3.  Go (in development) - https://stepik.org/course/70387
>
> If you are absolutely brand new to Beam and it scares you like it
> scared me, come talk to me.
>
> Best,
>
> Damon
>
 --
>> Nathan Fisher
>>  w: http://junctionbox.ca/
>>
>


Re: Add options to CassandraIO

2020-05-14 Thread Etienne Chauchot

Hi Nathan,

Thanks for raising this, and thanks for the PR proposal.

I would recommend the third solution (as was done in other IOs such as
ElasticsearchIO): add a method called withConnectTimeout(Integer) to
both the Read and Write builders of the IO (this IO has no common
configuration object). Beam has something of a "no knobs" philosophy,
which keeps the configuration parameters available to users to a
minimum; hence the encapsulation.
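A minimal sketch of the encapsulation described above (option 3), assuming a simplified, immutable builder: users set a single IO-level timeout, and the IO translates it into driver settings internally. `ReadSpec` and `SocketSettings` are hypothetical classes, not CassandraIO or the DataStax driver API, and a real change would need the same method on the Write builder as well.

```java
// Hypothetical stand-in for driver-level socket settings; this is NOT the
// DataStax driver's SocketOptions class.
final class SocketSettings {
  final int connectTimeoutMillis;

  SocketSettings(int connectTimeoutMillis) {
    this.connectTimeoutMillis = connectTimeoutMillis;
  }
}

// Hypothetical, simplified Read specification: the timeout is exposed as one
// IO-level method, and the driver objects stay internal to the IO.
final class ReadSpec {
  // null means "keep the driver's default".
  private final Integer connectTimeoutMillis;

  private ReadSpec(Integer connectTimeoutMillis) {
    this.connectTimeoutMillis = connectTimeoutMillis;
  }

  static ReadSpec create() {
    return new ReadSpec(null);
  }

  // Returns a new immutable spec, in the style of Beam's with* builder methods.
  ReadSpec withConnectTimeout(Integer millis) {
    if (millis == null || millis <= 0) {
      throw new IllegalArgumentException("connect timeout must be a positive number of ms");
    }
    return new ReadSpec(millis);
  }

  // Internal translation step: only the IO ever builds the driver-level settings.
  SocketSettings toSocketSettings(int driverDefaultConnectMillis) {
    int effective = connectTimeoutMillis != null ? connectTimeoutMillis : driverDefaultConnectMillis;
    return new SocketSettings(effective);
  }
}
```

The trade-off is one extra method per exposed setting, in exchange for not leaking the driver's Cluster.Builder into the IO's public API.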


Best

Etienne

On 14/05/2020 00:34, Nathan Fisher wrote:

Hi all,

I frequently test pipelines over a VPN link. As a result, the default
SocketOptions configuration leads to timeout exceptions. I would like to be
able to tweak the timeouts, which requires access to Cassandra's
Cluster.Builder to set a custom socket option:


If I were to raise an issue/PR, which of these options would be preferred?

- expose only SocketOptions as a setter on the builder.
- allow setting the Cassandra Cluster.Builder on the IO builder.
- encapsulate the SocketOptions behind additional methods on the Beam
IO builder.


Regards,
Nathan
--
Nathan Fisher
 w: http://junctionbox.ca/


Re: TextIO. Writing late files

2020-05-14 Thread Jose Manuel
Hi again,

I have simplified the example that reproduces the data loss. The scenario is
as follows:

- TextIO writes files.
- getPerDestinationOutputFilenames emits the file names.
- The file names are processed by an aggregator (Combine, Distinct,
GroupByKey...) with a window **without allowed lateness**.
- The file names are discarded as late.

You can see the data loss in the picture at
https://github.com/kiuby88/windowing-textio/blob/master/README.md#showing-data-loss

Please follow the README to run the pipeline; you will find log traces
saying that data are dropped as late.
Remember that you can run the pipeline with other window lateness values
(see README.md).
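The drop decision in the LateDataFilter trace quoted below can be modeled with a simplified rule: an element is dropped once the input watermark passes its window's max timestamp plus the allowed lateness. This is a sketch assuming that rule, not Beam's actual LateDataFilter code; `FixedWindow` and `LateDataCheck` are hypothetical names, and the timestamps are taken from that log (the file name carries 14:05:14.999Z, the max timestamp of its window).

```java
import java.time.Duration;
import java.time.Instant;

// Simplified fixed event-time window [start, end).
final class FixedWindow {
  final Instant start;
  final Instant end;

  FixedWindow(Instant start, Instant end) {
    this.start = start;
    this.end = end;
  }

  // Latest timestamp inside the window: one millisecond before the end.
  Instant maxTimestamp() {
    return end.minusMillis(1);
  }
}

final class LateDataCheck {
  // Assumed rule: the window expires once the input watermark is past
  // maxTimestamp + allowedLateness; elements for an expired window are dropped.
  static boolean isDroppedAsLate(FixedWindow window, Duration allowedLateness, Instant inputWatermark) {
    Instant expiry = window.maxTimestamp().plus(allowedLateness);
    return expiry.isBefore(inputWatermark);
  }
}
```

With the window [14:05:10, 14:05:15) and the input watermark 14:05:19.799Z from the log, zero lateness returns true (dropped), while 60 seconds of lateness returns false, which is consistent with the LateDataFilter traces disappearing when a one-minute lateness is configured.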

Kby.

On Tue, May 12, 2020 at 5:16 PM Jose Manuel wrote:

> Hi,
>
> I would like to clarify that, while TextIO is writing, all the data are in
> the files (shards). The loss happens when the file names emitted by
> getPerDestinationOutputFilenames are processed by a window.
>
> I have created a pipeline to reproduce the scenario in which some
> file names are lost after getPerDestinationOutputFilenames. Please note
> that I tried to simplify the code as much as possible, but the scenario is
> not easy to reproduce.
>
> Please check this project: https://github.com/kiuby88/windowing-textio
> Check the README to build and run it (
> https://github.com/kiuby88/windowing-textio#build-and-run)
> The project contains only one class with the pipeline, PipelineWithTextIo,
> a log4j2.xml file in the resources, and the pom.
>
> The pipeline in PipelineWithTextIo generates unbounded data using a
> sequence. It adds a small delay (10s) per data entry, uses a distinct
> (just to apply the window), and then writes the data using TextIO.
> The window for the distinct is fixed (5 seconds) and does not use
> lateness.
> Generated files can be found in
> windowing-textio/pipe_with_lateness_0s/files. To write files, the
> FileNamePolicy uses window + timing + shards (see
> https://github.com/kiuby88/windowing-textio/blob/master/src/main/java/org/kby/PipelineWithTextIo.java#L135
> ).
> File names are emitted using getPerDestinationOutputFilenames()
> (see the code here:
> https://github.com/kiuby88/windowing-textio/blob/master/src/main/java/org/kby/PipelineWithTextIo.java#L71-L78
> ).
>
> Then, the file names in the PCollection are extracted and logged. Please
> note that the file names do not have pane information at that point.
>
> To apply a window, a distinct is used again. Here, several files are
> discarded as late and are not processed by this second distinct.
> Please see
>
> https://github.com/kiuby88/windowing-textio/blob/master/src/main/java/org/kby/PipelineWithTextIo.java#L80-L83
>
> Debug logging is enabled for WindowTracing, so you will find several
> messages like the following in the terminal:
> DEBUG org.apache.beam.sdk.util.WindowTracing - LateDataFilter: Dropping
> element at 2020-05-12T14:05:14.999Z for
> key:path/pipe_with_lateness_0s/files/[2020-05-12T14:05:10.000Z..2020-05-12T14:05:15.000Z)-ON_TIME-0-of-1.txt;
> window:[2020-05-12T14:05:10.000Z..2020-05-12T14:05:15.000Z) since too far
> behind inputWatermark:2020-05-12T14:05:19.799Z;
> outputWatermark:2020-05-12T14:05:19.799Z`
>
> What happens here? I think messages are generated every second and a
> 5-second window groups them. Then a delay is added, and finally the data
> are written to a file.
> Meanwhile, the pipeline reads more data, advancing the watermark.
> Then the file names are emitted without pane information (see "Emitted
> File" in the logs). The window in the second distinct compares the file
> names' timestamps with the pipeline watermark and discards them as late.
>
>
> Bonus
> -
> You can add allowed lateness to the pipeline. See
> https://github.com/kiuby88/windowing-textio/blob/master/README.md#run-with-lateness
>
> If one minute of allowed lateness is added to the window, the file names
> are processed as late data rather than dropped. As a result, the
> LateDataFilter traces disappear.
>
> Moreover, to better illustrate that the file names are emitted as late
> for the second distinct, I added a second TextIO that writes the file
> names to other files, using the same FileNamePolicy as before (window +
> timing + shards). You can find the files containing the original file
> names in windowing-textio/pipe_with_lateness_60s/files-after-distinct.
> This is the interesting part: you will find several files with LATE in
> their names.
>
> Please let me know if you need more information or if the example is not
> enough to check the expected scenarios.
>
> Kby.
>
> On Sun, May 10, 2020 at 5:04 PM Reuven Lax wrote:
>
>> Pane info is supposed to be preserved across transforms. If the Flink
>> runner does not preserve it, then I believe that is a bug.
>>
>> On Sat, May 9, 2020 at 11:22 PM Jozef Vilcek 
>> wrote:
>>
>>> I am using FileIO and I also observe pane info being dropped on the
>>> Flink runner. It was mentioned in this thread:
>>> https://www.mail-archive.com/dev@beam.apache.org/msg20186.html
>>>
>>> It is a result of different 

Re: Try Beam Katas Today

2020-05-14 Thread Henry Suryawirawan
Yeah, we can certainly expand it further.
There are definitely more lessons that can be added.

>E.g., more on the write-side windowing interactions?
Are you referring to Write IOs?



On Wed, May 13, 2020 at 11:56 PM Nathan Fisher 
wrote:

> I went through them earlier this week! Definitely helpful.
>
> Is it possible to expand the katas available in the IO section? E.g.,
> more on the write-side windowing interactions?
>
> On Wed, May 13, 2020 at 11:36, Luke Cwik  wrote:
>
>> These are an excellent learning tool.
>>
>> On Tue, May 12, 2020 at 11:02 PM Pablo Estrada 
>> wrote:
>>
>>> Sharing Damon's email with the user@ list as well. Thanks Damon!
>>>
>>> On Tue, May 12, 2020 at 9:02 PM Damon Douglas 
>>> wrote:
>>>
 Hello Everyone,

 If you don't already know, there are helpful instructional tools for
 learning the Apache Beam SDKs called Beam Katas hosted on
 https://stepik.org.  Similar to traditional Kata, they are meant to be
 repeated as practice.  Before practicing the katas myself, I found myself
 copy/pasting code (please accept my confession).  Now I find myself
 actually composing pipelines.  Just like kata forms, you find them becoming
 part of you.  If you are interested, the currently available katas are
 listed below:

 1.  Java - https://stepik.org/course/54530

 2.  Python -  https://stepik.org/course/54532

 3.  Go (in development) - https://stepik.org/course/70387

 If you are absolutely brand new to Beam and it scares you like it
 scared me, come talk to me.

 Best,

 Damon

>>> --
> Nathan Fisher
>  w: http://junctionbox.ca/
>