To summarize, I tend towards Option 2 "Language First", provided we can find a way to eliminate documentation duplication.

On Wed, Mar 23, 2022 at 5:02 PM Dian Fu <[email protected]> wrote:

> Hi Konstantin,
>
> Thanks a lot for bringing up this discussion.
>
> Currently, the Python documentation is more like a mixture of Option 1 and
> Option 2. It contains two parts:
> 1) The first part is the independent page [1] which could be seen as the
> main entrypoint for Python users.
> 2) The second part is the Python tabs which are among the DataStream API /
> Table API pages.
>
> The motivation to provide an independent page for Python documentation is
> as follows:
> 1) We are trying to create a Pythonic documentation for Python users (we
> are still far away from that and I have received much feedback saying that
> the Python documentation and API is too Java-like). However, to avoid
> duplication, it will link to the DataStream API / Table API pages when
> necessary instead of copying content. There are indeed exceptions, e.g. the
> window example given by Jark, that's because it only provides a very
> limited window support in Python DataStream API at present and to give
> Python users a complete picture of what they can do in Python DataStream
> API, we have added a dedicated page. We are trying to finalize the window
> support in 1.16 [2] and remove the duplicate documentation.
> 2) There are some kinds of documentations which are only applicable for
> Python language, e.g. dependency management[2], conversion between Table
> and Pandas DataFrame [3], etc. Providing an independent page helps to
> provide a place to hold all these kinds of documentation together.
>
> Regarding Option 1: "Language Tabs", this makes it hard to create Pythonic
> documentation for Python users.
> Regarding Option 2: "Language First", it may mean a lot of duplications.
> Currently, there are a lot of descriptions in the DataStream API / Table
> API pages which are shared between Java/Scala/Python.
>
> > In the rest of the documentation, Python is sometimes
> > included like in this Table API page [2] and sometimes ignored like on the
> > project setup pages [3].
> I agree that this is something that we need to improve.
>
> Regards,
> Dian
>
> [1] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/overview/
> [2] https://issues.apache.org/jira/browse/FLINK-26477
> [2] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/dependency_management/
> [3] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/table/conversion_of_pandas/
>
> On Wed, Mar 23, 2022 at 4:17 PM Jark Wu <[email protected]> wrote:
>
>> Hi Konstantin,
>>
>> Thanks for starting this discussion.
>>
>> From my perspective, I prefer the "Language Tabs" approach.
>> But maybe we can improve the tabs to move to the sidebar or top menu,
>> which allows users to first decide on their language and then the API.
>> IMO, programming languages are just like spoken languages which can be
>> picked in the sidebar.
>> What I want to avoid is the duplicate docs and incomplete features in a
>> specific language.
>> "Language First" may confuse users about what is and where to find the
>> complete features provided by Flink.
>>
>> For example, there are a lot of duplications in the "Window" pages[1] and
>> "Python Window" pages[2].
>> And users can't have a complete overview of Flink's window mechanism from
>> the Python API part.
>> Users have to go through the Java/Scala DataStream API first to build the
>> overall knowledge,
>> and then to read the Python API part.
>>
>> > * Second, most of the Flink Documentation currently is using a "Language
>> > Tabs" approach, but this might become obsolete in the long-term anyway as
>> > we move more and more in a Scala-free direction.
>>
>> The Scala-free direction means users can pick arbitrary Scala versions, not
>> drop the Scala API.
>> So the "Language Tabs" is still necessary and helpful for switching
>> languages.
>>
>> Best,
>> Jark
>>
>> [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/
>> [2]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/datastream/operators/windows/
>>
>> On Tue, 22 Mar 2022 at 21:40, Konstantin Knauf <[email protected]> wrote:
>>
>> > Hi everyone,
>> >
>> > I would like to discuss a particular aspect of our documentation: the
>> > top-level structure with respect to languages and APIs. The current
>> > structure is inconsistent and the direction is unclear to me, which makes
>> > it hard for me to contribute gradual improvements.
>> >
>> > Currently, the Python documentation has its own independent branch in the
>> > documentation [1]. In the rest of the documentation, Python is sometimes
>> > included like in this Table API page [2] and sometimes ignored like on the
>> > project setup pages [3]. Scala and Java on the other hand are always
>> > documented in parallel next to each other in tabs.
>> >
>> > The way I see it, most parts (application development, connectors, getting
>> > started, project setup) of our documentation have two primary dimensions:
>> > API (DataStream, Table API), Language (Python, Java, Scala)
>> >
>> > In addition, there is SQL, for which the language is only a minor factor
>> > (UDFs), but which generally requires a different structure (different
>> > audience, different tools). On the other hand, SQL and Table API have some
>> > conceptual overlap, whereas I doubt these concepts are of big interest
>> > to SQL users. So, to me SQL should be treated separately in any case with
>> > links to the Table API documentation for some concepts.
>> >
>> > I think, in general, both approaches can work:
>> >
>> >
>> > *Option 1: "Language Tabs"*
>> > Application Development
>> > > DataStream API (Java, Scala, Python)
>> > > Table API (Java, Scala, Python)
>> > > SQL
>> >
>> >
>> > *Option 2: "Language First"*
>> > Java Development Guide
>> > > Getting Started
>> > > DataStream API
>> > > Table API
>> > Python Development Guide
>> > > Getting Started
>> > > DataStream API
>> > > Table API
>> > SQL Development Guide
>> >
>> > I don't have a strong opinion on this, but tend towards "Language First".
>> >
>> > * First, I assume, users actually first decide on their language/tools of
>> > choice and then move on to the API.
>> >
>> > * Second, most of the Flink Documentation currently is using a "Language
>> > Tabs" approach, but this might become obsolete in the long-term anyway as
>> > we move more and more in a Scala-free direction.
>> >
>> > For the connectors, I think, there is a good argument for "Language & API
>> > Embedded", because documenting every connector for each API and language
>> > separately would result in a lot of duplication. Here, I would go one step
>> > further than what we have right now and target
>> >
>> > Connectors
>> > -> Kafka (All APIs incl. SQL, All Languages)
>> > -> Kinesis (same)
>> > -> ...
>> >
>> > This also results in a quick overview for users about which connectors
>> > exist and plays well with our plan of externalizing connectors.
>> >
>> > For completeness & scope of the discussion: there are two outdated FLIPs on
>> > documentation (42, 60), which both have not been implemented, are partially
>> > contradicting each other and are generally out-of-date. I specifically
>> > don't intend to add another FLIP to this graveyard, but still reach a
>> > consensus on the high-level direction.
>> >
>> > What do you think?
>> >
>> > Cheers,
>> >
>> > Konstantin
>> >
>> > --
>> >
>> > Konstantin Knauf
>> >
>> > https://twitter.com/snntrable
>> >
>> > https://github.com/knaufk
>> >
