*First of all, why ASF ownership?*

For a project of this size, maintaining high-quality annotations independently of the actual codebase is far from trivial (it is not hard to use stubgen or monkeytype, but the resulting annotations are rather simplistic). For starters, changes which are mostly transparent to the end user (like the pyspark.ml changes in 3.0 / 3.1) might require significant changes in the annotations. Additionally, some signature changes are rather hard to track, and such separation can easily lead to divergence.
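To make the "rather simplistic" point concrete, here is a minimal sketch. It assumes a hypothetical function `read_payload` (not part of PySpark) whose return type depends on one of its arguments. The commented-out signature is only an approximation of what an automated tool might emit for unannotated code; the overloads are what a maintainer who knows the intent could write by hand.

```python
from typing import Literal, Union, overload

# Hypothetical example, not taken from PySpark.
#
# Roughly what an automated tool might produce for the unannotated function
# (an assumption, shown as a comment so it does not clash with the code below):
#
#     def read_payload(data, as_text: bool = ...): ...
#
# Hand-written overloads can record that the return type follows `as_text`.


@overload
def read_payload(data: bytes, as_text: Literal[True]) -> str: ...
@overload
def read_payload(data: bytes, as_text: Literal[False] = ...) -> bytes: ...


def read_payload(data: bytes, as_text: bool = False) -> Union[str, bytes]:
    """Return the payload decoded as UTF-8 text, or unchanged as bytes."""
    return data.decode("utf-8") if as_text else data
```

With the overloads in place, a type checker can tell that `read_payload(b"abc", True)` is a `str` and `read_payload(b"abc")` is `bytes`. Keeping that information correct requires following every change to the real signature, which is where divergence can creep in.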
Additionally, annotations are as much about describing facts as about showing intended usage (the simplest use case is documenting argument dependencies). This makes the process of annotation rather subjective and requires a good understanding of the author's intention. Finally, annotation-friendly signatures require conscious decisions (see for example https://github.com/python/mypy/issues/5621).

Overall, ASF ownership is probably the best way to ensure long-term sustainability and quality of annotations.

*Now, why separate repo?*

Based on the discussion so far, it is clear that there is no consensus about using inline annotations. There are three other options:

  * Stub files packaged alongside actual code.
  * Separate project within root, packaged separately.
  * Separate repository, packaged separately.

As already pointed out here and in the comments to https://github.com/apache/spark/pull/29180, annotations are still somewhat unstable. The ecosystem evolves quickly, and new features keep appearing, some with the potential to fundamentally change the way we annotate code.

Therefore, it might be beneficial to maintain a subproject (for lack of a better word) that can evolve faster than the code that is annotated.

While I have no strong opinion about this part, it is definitely a relatively unobtrusive way of bringing code and annotations closer together.

On 8/4/20 7:44 PM, Sean Owen wrote:
> Maybe more specifically, why an ASF repo?
>
> On Tue, Aug 4, 2020 at 11:45 AM Felix Cheung <felixcheun...@hotmail.com>
> wrote:
>> What would be the reason for separate git repo?
>>
>> ________________________________
>> From: Hyukjin Kwon <gurwls...@gmail.com>
>> Sent: Monday, August 3, 2020 1:58:55 AM
>> To: Maciej Szymkiewicz <mszymkiew...@gmail.com>
>> Cc: Driesprong, Fokko <fo...@driesprong.frl>; Holden Karau
>> <hol...@pigscanfly.ca>; Spark Dev List <dev@spark.apache.org>
>> Subject: Re: [PySpark] Revisiting PySpark type annotations
>>
>> Okay, seems like we can create a separate repo as apache/spark? e.g.)
>> https://issues.apache.org/jira/browse/INFRA-20470
>> We can also think about porting the files as are.
>> I will try to have a short sync with the author Maciej, and share what we
>> discussed offline.
>>

-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: A30CEF0C31A501EC