To summarize, I lean towards Option 2, "Language First", provided we can find
a way to eliminate documentation duplication.

On Wed, Mar 23, 2022 at 5:02 PM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Konstantin,
>
> Thanks a lot for bringing up this discussion.
>
> Currently, the Python documentation is more like a mixture of Option 1 and
> Option 2. It contains two parts:
> 1) The first part is the independent page [1] which could be seen as the
> main entrypoint for Python users.
> 2) The second part is the Python tabs which are among the DataStream API /
> Table API pages.
>
> The motivation to provide an independent page for Python documentation is
> as follows:
> 1) We are trying to create Pythonic documentation for Python users (we
> are still far from that, and I have received a lot of feedback saying that
> the Python documentation and API are too Java-like). However, to avoid
> duplication, it links to the DataStream API / Table API pages when
> necessary instead of copying content. There are indeed exceptions, e.g. the
> window example given by Jark. That is because the Python DataStream API
> only provides very limited window support at present, so we added a
> dedicated page to give Python users a complete picture of what they can do
> in the Python DataStream API. We are trying to finalize the window
> support in 1.16 [2] and remove the duplicate documentation.
> 2) Some documentation is only applicable to Python, e.g. dependency
> management [3], conversion between Table and Pandas DataFrame [4], etc.
> An independent page provides a place to hold all of this documentation
> together.
>
> Regarding Option 1: "Language Tabs", this makes it hard to create Pythonic
> documentation for Python users.
> Regarding Option 2: "Language First", it may mean a lot of duplication.
> Currently, a lot of the descriptions in the DataStream API / Table
> API pages are shared between Java/Scala/Python.
>
> > In the rest of the documentation, Python is sometimes
> > included like in this Table API page [2] and sometimes ignored like on
> > the project setup pages [3].
> I agree that this is something that we need to improve.
>
> Regards,
> Dian
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/overview/
> [2] https://issues.apache.org/jira/browse/FLINK-26477
> [3]
> https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/dependency_management/
> [4]
> https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/table/conversion_of_pandas/
>
> On Wed, Mar 23, 2022 at 4:17 PM Jark Wu <imj...@gmail.com> wrote:
>
>> Hi Konstantin,
>>
>> Thanks for starting this discussion.
>>
>> From my perspective, I prefer the "Language Tabs" approach.
>> But maybe we can improve the tabs by moving them to the sidebar or top
>> menu, which allows users to first decide on their language and then the API.
>> IMO, programming languages are just like spoken languages, which can be
>> picked in the sidebar.
>> What I want to avoid is duplicate docs and incomplete features in a
>> specific language.
>> "Language First" may confuse users about what the complete set of features
>> provided by Flink is and where to find it.
>>
>> For example, there is a lot of duplication between the "Window" pages [1]
>> and the "Python Window" pages [2].
>> And users can't get a complete overview of Flink's window mechanism from
>> the Python API part.
>> Users have to go through the Java/Scala DataStream API first to build
>> overall knowledge, and then read the Python API part.
>>
>> > * Second, most of the Flink Documentation currently is using a "Language
>> > Tabs" approach, but this might become obsolete in the long-term anyway as
>> > we move more and more in a Scala-free direction.
>>
>> The Scala-free direction means users can pick arbitrary Scala versions,
>> not that the Scala API will be dropped.
>> So "Language Tabs" are still necessary and helpful for switching
>> languages.
>>
>> Best,
>> Jark
>>
>> [1]:
>>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/
>> [2]:
>>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/datastream/operators/windows/
>>
>> On Tue, 22 Mar 2022 at 21:40, Konstantin Knauf <kna...@apache.org> wrote:
>>
>> > Hi everyone,
>> >
>> > I would like to discuss a particular aspect of our documentation: the
>> > top-level structure with respect to languages and APIs. The current
>> > structure is inconsistent and the direction is unclear to me, which
>> > makes it hard for me to contribute gradual improvements.
>> >
>> > Currently, the Python documentation has its own independent branch in
>> > the documentation [1]. In the rest of the documentation, Python is
>> > sometimes included like in this Table API page [2] and sometimes ignored
>> > like on the project setup pages [3]. Scala and Java on the other hand are
>> > always documented in parallel next to each other in tabs.
>> >
>> > The way I see it, most parts (application development, connectors,
>> > getting started, project setup) of our documentation have two primary
>> > dimensions: API (DataStream, Table API) and Language (Python, Java, Scala).
>> >
>> > In addition, there is SQL, for which the language is only a minor factor
>> > (UDFs), but which generally requires a different structure (different
>> > audience, different tools). SQL and Table API do have some conceptual
>> > overlap, but I doubt these concepts are of much interest to SQL users.
>> > So, to me, SQL should be treated separately in any case, with links to
>> > the Table API documentation for some concepts.
>> >
>> > I think, in general, both approaches can work:
>> >
>> >
>> > *Option 1: "Language Tabs"*
>> > Application Development
>> > > DataStream API  (Java, Scala, Python)
>> > > Table API (Java, Scala, Python)
>> > > SQL
>> >
>> >
>> > *Option 2: "Language First"*
>> > Java Development Guide
>> > > Getting Started
>> > > DataStream API
>> > > Table API
>> > Python Development Guide
>> > > Getting Started
>> > > DataStream API
>> > > Table API
>> > SQL Development Guide
>> >
>> > I don't have a strong opinion on this, but tend towards "Language
>> > First".
>> >
>> > * First, I assume, users actually first decide on their language/tools
>> > of choice and then move on to the API.
>> >
>> > * Second, most of the Flink Documentation currently is using a "Language
>> > Tabs" approach, but this might become obsolete in the long-term anyway
>> > as we move more and more in a Scala-free direction.
>> >
>> > For the connectors, I think, there is a good argument for "Language &
>> > API Embedded", because documenting every connector for each API and
>> > language separately would result in a lot of duplication. Here, I would
>> > go one step further than what we have right now and target
>> >
>> > Connectors
>> > -> Kafka (All APIs incl. SQL, All Languages)
>> > -> Kinesis (same)
>> > -> ...
>> >
>> > This also results in a quick overview for users about which connectors
>> > exist and plays well with our plan of externalizing connectors.
>> >
>> > For completeness & scope of the discussion: there are two outdated
>> > FLIPs on documentation (42, 60), which both have not been implemented,
>> > partially contradict each other and are generally out-of-date. I
>> > specifically don't intend to add another FLIP to this graveyard, but
>> > still want to reach a consensus on the high-level direction.
>> >
>> > What do you think?
>> >
>> > Cheers,
>> >
>> > Konstantin
>> >
>> > --
>> >
>> > Konstantin Knauf
>> >
>> > https://twitter.com/snntrable
>> >
>> > https://github.com/knaufk
>> >
>>
>
