Below is a project proposal from a technical writer (bcc'd) who wants to work with your organization on a Season of Docs project. Please assess the proposal and ensure that you have a mentor to work with the technical writer.
If you want to accept the proposal, please submit the technical writing project to the Season of Docs program administrators. The project selection form is at this link: <https://bit.ly/gsod-tw-projectselection>. The form is also available in the guide for organization administrators <https://developers.google.com/season-of-docs/docs/admin-guide#tech-writer-application-phase>.

The deadline for project selections is July 31, 2020 at 20:00 UTC. For other program deadlines, please see the full timeline <https://developers.google.com/season-of-docs/docs/timeline> on the Season of Docs website.

If you have any questions about the program, please email the Season of Docs team at [email protected].

Best,
The Google Season of Docs team

Title: Update of the runner comparison page / capability matrix
Project length: Standard length (3 months)

Writer information
*Name:* Sruthi Sree Kumar
*Email:* [email protected]
*Résumé/CV:* https://drive.google.com/file/d/12RtM7Obz2Fog-AcIJAX1kLCKPPytY2Hq/view?usp=sharing
*Sample:* https://medium.com/big-data-processing
*Additional information:* I, Sruthi Sree Kumar, am a dual-degree master's student in Cloud Computing and Services. I am currently writing my master's thesis on the Apache Flink state management API with the Continuous Deep Analytics research group at Research Institute of Sweden (RISE). Before my master's, I worked for 4 years as a backend developer. I would like to participate in Season of Docs because I have found projects that are related to my current work, area of interest, and future career path. I am an active user of open source projects such as Apache Beam and Apache Flink. I also started a technical blog earlier this year that focuses on algorithms and concepts in distributed systems and distributed processing systems.

Project Description

Apache Beam is a unified platform for defining both batch and stream processing pipelines.
Apache Beam lets you define a model to represent and transform datasets independently of any specific data processing platform. Once the pipeline is defined, you can run it on any of the supported runtime frameworks (runners), which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam also comes with different SDKs that let you write your pipeline in programming languages such as Java, Python, and Go.

I am submitting my application for the GSoD project “Update of the runner comparison page/capability matrix”. Because Apache Beam supports multiple runners and SDKs, a new user may find it confusing to choose between them. The current documentation gives only a very brief overview of each runner. My idea is to add more comprehensive details about each runner to that runner's documentation page. I would also like to update the description of the example word count project with a detailed explanation. For this, my plan is to try every word count example locally on my machine, find out whether any steps are missing, and add more explanation of the process. Another thing I have noticed is that the documentation for the runners does not follow a consistent pattern (a few have an overview section, while others start with usage instructions, prerequisites, or some other title). I will update all of them to follow a single simple pattern.

I plan to add a new page that describes each runner with a narrative description [BEAM-3220]. From this page, users can navigate to the detailed description page of each runner and to the capability matrix. I also plan to add a descriptive comparison of the runners here. Currently, I am using Beam NEXMark to benchmark Flink runners for my master's thesis. Because I am familiar with NEXMark benchmarking, I would like to include the benchmarking results of each runner in both batch and streaming mode here (BEAM-2944).
I would also update the NEXMark documentation if I find that any parameters or configuration options are missing or have been removed. When I first used the Flink runner, I was stuck initially because one of the parameters was missing from the documentation [https://lists.apache.org/thread.html/re71e8298e0c13180a4ab0ac6a65e808e1d82ce85e955778cf1089553%40%3Cuser.beam.apache.org%3E]. Now that I am more familiar with the NEXMark code base, it will be easier for me to benchmark the runners and add the metrics. On this same page, I would like to include a brief summary of the production readiness of each runner.

In the current documentation, support for the classic and portable runners is described on each runner's description page. I think it would be better to bring this information together in one place, either in the capability matrix or on the newly added description page. Also, the portability support is currently maintained in a separate Google Sheet, which I would like to merge into the compatibility matrix (https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0). As part of this task, I plan to include all the major and minor corrections mentioned in BEAM-2888.

I consider GSoD an opportunity to step into open source contributions. I will continue to contribute to open source projects, especially Beam, and would like to remain an active community member. Because Apache Beam has an active community with features continuously being developed, I think there is always scope to improve the documentation and keep it up to date. I would also like to contribute to the development work. Once I have sound knowledge of Beam, I can also help the user community, just as I always got help from the community when I started with Beam.

I believe that I am the right person for this project because:
1. I am a distributed systems enthusiast who is trying to understand the internals of data processing systems.
2. I have experience working with Apache Beam and Apache Flink as a user.
3. I am already familiar with the Apache Beam and Apache Flink code bases as a developer.
4. I have done a project comparing different Beam runners.
5. I have experience writing technical blog posts that explain concepts of big data processing and distributed systems.
6. Currently, I am working on my master's thesis to improve the performance of the Apache Flink state backend, for which I am using the Apache Beam NEXMark implementation for benchmarking, and I have contributed to updating the Apache Beam documentation.
7. In my 4 years of work experience as a software developer, I have written multiple technical design documents, product documentation, and README files (which I do not have access to right now).
8. I write documentation in such a way that anyone without previous knowledge will understand it at first glance.
