Very nice post, Till! We are starting to get much better with this...
On Sat, Mar 21, 2015 at 6:45 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> Awesome, thanks Till
>
> On Saturday, March 21, 2015, Till Rohrmann <trohrm...@apache.org> wrote:
>
> > I wrote some internal documentation for Akka and the distributed
> > communication [1].
> >
> > Cheers,
> >
> > Till
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors
> >
> > On Fri, Mar 20, 2015 at 7:31 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> >
> > > Ah, the Tweet infra bot just announced extended downtime for Confluence [1]
> > >
> > > - Henry
> > >
> > > [1] https://twitter.com/infrabot/status/578983473970475008
> > >
> > > On Fri, Mar 20, 2015 at 11:27 AM, Stephan Ewen <se...@apache.org> wrote:
> > > > For me as well. Earlier today it said "down for maintenance"
> > > >
> > > > On Fri, Mar 20, 2015 at 7:14 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
> > > >
> > > >> it's down for me as well
> > > >>
> > > >> On Fri, Mar 20, 2015 at 7:12 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> > > >>
> > > >> > Is the wiki down for any of you?
> > > >> >
> > > >> > I can't access
> > > >> > https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
> > > >> >
> > > >> > 404
> > > >> >
> > > >> > - Henry
> > > >> >
> > > >> > On Fri, Mar 20, 2015 at 4:46 AM, Kostas Tzoumas <ktzou...@apache.org> wrote:
> > > >> > > I added a document for data exchange between tasks:
> > > >> > >
> > > >> > > https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> > > >> > >
> > > >> > > Feel free to edit. I plan to link the class names to the class files in GitHub.
> > > >> > >
> > > >> > > On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <ktzou...@apache.org> wrote:
> > > >> > >
> > > >> > >> +1 for the Wiki.
> > > >> > >>
> > > >> > >> When these have been stabilized we can move them to the docs if we
> > > >> > >> decide to do so.
> > > >> > >>
> > > >> > >> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <se...@apache.org> wrote:
> > > >> > >>
> > > >> > >>> I have put my suggested version of an outline for the docs into the wiki.
> > > >> > >>> Regardless of where the docs end up (wiki or repository), we can use the
> > > >> > >>> wiki to outline the docs.
> > > >> > >>>
> > > >> > >>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
> > > >> > >>>
> > > >> > >>> Some pages contain some stub or outline, others are completely blank.
> > > >> > >>>
> > > >> > >>> Not a complete list. Additions are welcome.
> > > >> > >>>
> > > >> > >>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <se...@apache.org> wrote:
> > > >> > >>>
> > > >> > >>> > I think the Wiki has a much lower barrier of entry to fix docs, especially
> > > >> > >>> > for external people. The docs, with the Jekyll setup, are rather tricky.
> > > >> > >>> > I would very much like all kinds of people to contribute to the docs
> > > >> > >>> > about the internals, not just the usual three suspects that have done
> > > >> > >>> > this so far.
> > > >> > >>> >
> > > >> > >>> > Having a good landing page in the regular docs is exactly so that we do
> > > >> > >>> > not lose all the people that do not look into a wiki. The overview pages
> > > >> > >>> > for the internals need to be good and accessible and nicely link to the
> > > >> > >>> > wiki to "forward" people there.
> > > >> > >>> >
> > > >> > >>> > The overhead of deciding what goes where should not be terribly large, in
> > > >> > >>> > my opinion, since there is no really "wrong" place to put it.
> > > >> > >>> >
> > > >> > >>> > On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
> > > >> > >>> >
> > > >> > >>> >> Why do you want to split stuff between the docs in the repository and
> > > >> > >>> >> the wiki? I for one would always be too lazy to check stuff in a wiki
> > > >> > >>> >> when there is also documentation. Plus, this would lead to
> > > >> > >>> >> additional overhead in deciding what goes where and syncing between
> > > >> > >>> >> the two places for documentation.
> > > >> > >>> >>
> > > >> > >>> >> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <se...@apache.org> wrote:
> > > >> > >>> >> > Ah, I totally forgot to add to the internals:
> > > >> > >>> >> >
> > > >> > >>> >> >   - Fault tolerance in batch mode
> > > >> > >>> >> >
> > > >> > >>> >> >   - Fault tolerance in streaming mode, with state handling
> > > >> > >>> >> >
> > > >> > >>> >> > On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <se...@apache.org> wrote:
> > > >> > >>> >> >
> > > >> > >>> >> >> Hi all!
> > > >> > >>> >> >>
> > > >> > >>> >> >> I would like to kick off an effort to improve the documentation of the
> > > >> > >>> >> >> Flink architecture and internals. This also means making the streaming
> > > >> > >>> >> >> architecture more prominent in the docs.
> > > >> > >>> >> >>
> > > >> > >>> >> >> Since Flink is quite a sophisticated stack, we need to improve the
> > > >> > >>> >> >> presentation of how Flink works - to the extent necessary to use Flink
> > > >> > >>> >> >> (and to appreciate all the cool stuff that is happening). This should
> > > >> > >>> >> >> also come in handy with new contributors.
> > > >> > >>> >> >>
> > > >> > >>> >> >> As a general umbrella, we need to first decide where and how to organize
> > > >> > >>> >> >> the documentation.
> > > >> > >>> >> >>
> > > >> > >>> >> >> I would propose to put the bulk of the documentation into the Wiki: create
> > > >> > >>> >> >> a dedicated section on Flink Internals and sub-pages for each component /
> > > >> > >>> >> >> topic. To the docs, we add a general overview from which we link into the
> > > >> > >>> >> >> Wiki.
> > > >> > >>> >> >>
> > > >> > >>> >> >>
> > > >> > >>> >> >> == These sections would go into the DOCS in the git repository ==
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Overview of Program, pre-flight phase (type extraction, optimizer),
> > > >> > >>> >> >>     JobManager, TaskManager. Differences between streaming and batch. We
> > > >> > >>> >> >>     can realize this through one very nice picture with a few lines of text.
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - High level architecture stack, different program representations (API
> > > >> > >>> >> >>     operators, common API DAG, optimizer DAG, parallel data flow (JobGraph /
> > > >> > >>> >> >>     Execution Graph))
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - (maybe) Parallelism and scheduling.
> > > >> > >>> >> >>     This seems paramount for users to understand.
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)
> > > >> > >>> >> >>
> > > >> > >>> >> >>
> > > >> > >>> >> >> == These sections would go into the WIKI ==
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Project structure (Maven projects, what is where, dependencies between
> > > >> > >>> >> >>     projects)
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Component overview
> > > >> > >>> >> >>
> > > >> > >>> >> >>     -> JobManager (InstanceManager, Scheduler, BLOB server, Library Cache,
> > > >> > >>> >> >>        Archiving)
> > > >> > >>> >> >>
> > > >> > >>> >> >>     -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)
> > > >> > >>> >> >>
> > > >> > >>> >> >>     -> Involved Actor Systems / Actors / Messages
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Details about submitting a job (library upload, job graph submission,
> > > >> > >>> >> >>     execution graph setup, scheduling trigger)
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Memory Management
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Optimizer internals
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Akka Setup specifics
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Netty and pluggable data exchange strategies
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Testing: Flink test clusters and unit test utilities
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Developer How-To: Setting up Eclipse, IntelliJ, Travis
> > > >> > >>> >> >>
> > > >> > >>> >> >>   - Step-by-step guide to add a new operator
> > > >> > >>> >> >>
> > > >> > >>> >> >>
> > > >> > >>> >> >> I will go ahead and stub some sections in the Wiki.
> > > >> > >>> >> >>
> > > >> > >>> >> >> As we discuss and agree/disagree with the outline, we can evolve the Wiki.
> > > >> > >>> >> >>
> > > >> > >>> >> >> Greetings,
> > > >> > >>> >> >> Stephan