Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
On 06/30/2014 04:19 PM, Richard Eisenberg wrote: Thanks, Johan, for starting this discussion. I mostly agree with the proposal. However, one (at times, serious) drawback to using Haddock is that it means that editing comments can cause parse failures. The way the GHC build works, these failures may not be detected until the end of a hacking session (if I'm using, say, `make 2`, as I tend to do) and then can be hard to diagnose. I've actually been bitten by this when working on GHC. You should never get a parse failure anymore, as of 2.14.x versions. At worst, the comment will render in an ugly way but it will no longer cause failures due to typos in comment syntax. What might cause failures is if you put a Haddock comment where GHC doesn't expect it to. This should be fixed in GHC parser. If you don't want to mess around with pretty comments, all you have to do is to turn your function comments from ‘-- ’ to ’-- |’ and suddenly everyone else can benefit without source-diving. So, I have to ask: why use Haddock? Do folks read the Haddock docs for GHC? (I don't, but perhaps that's because the docs aren't so good right now.) Would it be acceptable to change this proposal not to require Haddock docs? I read the generated Haddock docs. The advantage of Haddock over something else here is that you can look at class instances, have clickable links and so on. Having to navigate on Haddock pages and reading the function docs elsewhere would be cluttered. Even if we decide to keep this proposal about Haddock docs specifically, I would strongly request that correct rendering of the Haddock docs not be scrutinized. At the end of a hacking session (which is hard enough to find time for, as is), I don't want to be expected to look through the generated HTML to make sure that my typewriter font and italics are rendering correctly. This is something of a corollary to Simon's comment about wanting to refer to Notes from Haddock comments -- I would want the Haddock output to be quite secondary to the proper documentation in the source code. (Note that this demotion of the role of Haddock is certainly not my practice in released libraries! But, Haddock is much less useful in an application like GHC than in a library.) GHC is also a library, I think it's unreasonable to expect people to jump into source and manually tracking down all the comments (if any) when they want to use GHC API. All that said, I do agree with the intent of this proposal and am happy to take on my part of the burden of documenting new (and perhaps some old) functions as I work. I have been very guilty of the broken window effect in not documenting new code. Thanks, Richard -- Mateusz K. ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
Agreed, let's do it. Thanks for the well-argued proposal. Next up: consistent style :-) Cheers, Simon On 27/06/2014 10:51, Johan Tibell wrote: Hi! I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments. GHC code is sometimes commented with notes, which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why. Lack of documentation, in GHC and other software projects, usually has (at least) two causes: * Programmers comment code they think is complex enough to warrant a comment. The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. * Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google. Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost. *Proposal: *I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence. I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong broken window effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices. We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that. To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped. *First example* In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage. The first question that I asked upon seeing uses of that function was what does RUR stand for? Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean read in the enclosing function and updating is related to reading so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for: -- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful. Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case. *Second example* In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above. I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful: -- | Returns which registers are read and written by this -- instruction, as a (read, written) pair. This info is used -- by the register allocator. x86_regUsageOfInstr :: Platform - Instr - RegUsage The reason mentioning that the return value is essentially a (read, written) pair is helpful is because the body of the function a big case statement full of lines like this one: GCMP _ src1 src2 - mkRUR [src1,src2] ... FDIV _ src dst - usageRM src dst It's not immediately clear that all the various helper functions used here just end up computing a pair of the above form.
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
That has a high chance of backfiring and requiring everyone to use do { ...; ... } with explicit braces and semis. ;) -Edward On Wed, Jul 2, 2014 at 4:08 AM, Simon Marlow marlo...@gmail.com wrote: Agreed, let's do it. Thanks for the well-argued proposal. Next up: consistent style :-) Cheers, Simon On 27/06/2014 10:51, Johan Tibell wrote: Hi! I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments. GHC code is sometimes commented with notes, which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why. Lack of documentation, in GHC and other software projects, usually has (at least) two causes: * Programmers comment code they think is complex enough to warrant a comment. The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. * Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google. Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost. *Proposal: *I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence. I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong broken window effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices. We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that. To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped. *First example* In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage. The first question that I asked upon seeing uses of that function was what does RUR stand for? Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean read in the enclosing function and updating is related to reading so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for: -- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful. Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case. *Second example* In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above. I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful: -- | Returns which registers are read and written by this -- instruction, as a (read, written) pair. This info is used -- by the register allocator. x86_regUsageOfInstr :: Platform - Instr - RegUsage The reason mentioning that the return value is essentially a (read, written) pair is helpful is because
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
I also support this proposal (I actually tripped up on these exact functions as well!) As stated elsewhere, I think this should be the case for all top-level functions, not just exported ones. Of course, I'd also like it if this rule explicitly extended to top-level data types, type classes, etc as well. I believe that was the intention but I'm just making sure. :) (Finally, I actually would like some kind of mechanical enforcement of this, but I don't think it has to be a hard rule - we shouldn't reject things on that basis alone. I'm not sure how we would do that anyway, though.) On Fri, Jun 27, 2014 at 4:51 AM, Johan Tibell johan.tib...@gmail.com wrote: Hi! I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments. GHC code is sometimes commented with notes, which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why. Lack of documentation, in GHC and other software projects, usually has (at least) two causes: Programmers comment code they think is complex enough to warrant a comment. The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google. Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost. Proposal: I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence. I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong broken window effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices. We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that. To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped. First example In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage. The first question that I asked upon seeing uses of that function was what does RUR stand for? Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean read in the enclosing function and updating is related to reading so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for: -- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful. Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case. Second example In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above. I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful: -- | Returns which registers are read and written by this
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
On Wed, Jul 2, 2014 at 6:04 PM, Austin Seipp aus...@well-typed.com wrote: Of course, I'd also like it if this rule explicitly extended to top-level data types, type classes, etc as well. I believe that was the intention but I'm just making sure. :) That was the intention. (Finally, I actually would like some kind of mechanical enforcement of this, but I don't think it has to be a hard rule - we shouldn't reject things on that basis alone. I'm not sure how we would do that anyway, though.) The way I suggest we do this, if we do this, is to add a linter to Phabricator that adds a note to the code review that the new code lacks the appropriate docs. That way we encourage users to add them, without e.g. making validate fail or something similar. This is what we do at Google (and FB too I presume). ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
Yes, Phabricator/arcanist is probably the best way to do it, I agree. I was more wondering technically how we'd enforce it (just regex? parse/lex the code for top level definitions? etc). There are also other opportunities for linting here, so we should think about that a bit. Anyway, good stuff! On Wed, Jul 2, 2014 at 11:33 AM, Johan Tibell johan.tib...@gmail.com wrote: On Wed, Jul 2, 2014 at 6:04 PM, Austin Seipp aus...@well-typed.com wrote: Of course, I'd also like it if this rule explicitly extended to top-level data types, type classes, etc as well. I believe that was the intention but I'm just making sure. :) That was the intention. (Finally, I actually would like some kind of mechanical enforcement of this, but I don't think it has to be a hard rule - we shouldn't reject things on that basis alone. I'm not sure how we would do that anyway, though.) The way I suggest we do this, if we do this, is to add a linter to Phabricator that adds a note to the code review that the new code lacks the appropriate docs. That way we encourage users to add them, without e.g. making validate fail or something similar. This is what we do at Google (and FB too I presume). -- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/ ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
Which makes a lot of GHC code more readable — I’m serious! Manuel PS: I have resisted it for a while, but after slogging through GHC for extended periods, I’ve come to appreciate the additional clarity in large and tricky functions (e.g., in the type checker renamer). Edward Kmett ekm...@gmail.com: That has a high chance of backfiring and requiring everyone to use do { ...; ... } with explicit braces and semis. ;) -Edward On Wed, Jul 2, 2014 at 4:08 AM, Simon Marlow marlo...@gmail.com wrote: Agreed, let's do it. Thanks for the well-argued proposal. Next up: consistent style :-) Cheers, Simon On 27/06/2014 10:51, Johan Tibell wrote: Hi! I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments. GHC code is sometimes commented with notes, which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why. Lack of documentation, in GHC and other software projects, usually has (at least) two causes: * Programmers comment code they think is complex enough to warrant a comment. The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. * Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google. Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost. *Proposal: *I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence. I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong broken window effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices. We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that. To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped. *First example* In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage. The first question that I asked upon seeing uses of that function was what does RUR stand for? Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean read in the enclosing function and updating is related to reading so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for: -- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful. Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case. *Second example* In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above. I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
At the risk of sounding redundantly redundant, I'd like to third this. My workflow for finding stuff in the GHC codebase is a mixture of grep and Hoogle. Searching Hoogle for +ghc :: [TyVar] - Type - Type is a huge timesaver, and Hoogle sends me to the generated haddock comments. Usually the haddocks themselves aren't there, but the Source link is a handy way to jump to the code and explore. So having actual haddock documentation would only help in this regard. The Notes are *great* for subtle issues with the implementation of some function, but it'd be nice to have some commentary on how to _use_ that function without having to understand how it works. On Mon, Jun 30, 2014 at 3:42 PM, Ben Gamari bgamari.f...@gmail.com wrote: David Luposchainsky dluposchain...@googlemail.com writes: Hey list, I am strongly in favour of the proposal. As a pedestrian-level GHC contributor, the *vast* majority of my time is spent trying to figure out what certain things do, when the answer could be found in a one- or two-line comment above a definition. I'd like to second this. As an occassional contributor, I find myself wading through a lot of code to deduce functions' purpose. While I'm often pleasantly surprised by the quality of the notes scattered about the code, per-definition Haddocks would fill in the many remaining gaps and provide a nice overview of each module. I agree that enforcing the quality of the rendered Haddocks is unnecessary. Once the language has been written there are many contributors (such as myself) who can further clean up the formatting. Cheers, - Ben ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
Thanks, Johan, for starting this discussion. I mostly agree with the proposal. However, one (at times, serious) drawback to using Haddock is that it means that editing comments can cause parse failures. The way the GHC build works, these failures may not be detected until the end of a hacking session (if I'm using, say, `make 2`, as I tend to do) and then can be hard to diagnose. I've actually been bitten by this when working on GHC. So, I have to ask: why use Haddock? Do folks read the Haddock docs for GHC? (I don't, but perhaps that's because the docs aren't so good right now.) Would it be acceptable to change this proposal not to require Haddock docs? Even if we decide to keep this proposal about Haddock docs specifically, I would strongly request that correct rendering of the Haddock docs not be scrutinized. At the end of a hacking session (which is hard enough to find time for, as is), I don't want to be expected to look through the generated HTML to make sure that my typewriter font and italics are rendering correctly. This is something of a corollary to Simon's comment about wanting to refer to Notes from Haddock comments -- I would want the Haddock output to be quite secondary to the proper documentation in the source code. (Note that this demotion of the role of Haddock is certainly not my practice in released libraries! But, Haddock is much less useful in an application like GHC than in a library.) All that said, I do agree with the intent of this proposal and am happy to take on my part of the burden of documenting new (and perhaps some old) functions as I work. I have been very guilty of the broken window effect in not documenting new code. Thanks, Richard On Jun 27, 2014, at 5:51 AM, Johan Tibell johan.tib...@gmail.com wrote: Hi! I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments. GHC code is sometimes commented with notes, which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why. Lack of documentation, in GHC and other software projects, usually has (at least) two causes: Programmers comment code they think is complex enough to warrant a comment. The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google. Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost. Proposal: I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence. I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong broken window effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices. We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that. To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped. First example In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage. The first question that I asked upon seeing uses of that function was what does RUR stand for? Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean read in the enclosing function and updating is related to reading so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
On Fri, Jun 27, 2014 at 11:11 AM, Mateusz Kowalczyk fuuze...@fuuzetsu.co.uk wrote: On 06/27/2014 03:26 PM, David Fox wrote: I would counter propose a place on hackage for people to type in or modify the documentation for functions, designed in such a way that the documentation would easily find its way back into the project's source code (with developer approval.) This way the documentation can be generated by people who only recently came to understand the function, so the questions a newcomer has are fresh in their mind. Are you asking for a wiki-like thing for documentation? There were a few times where this has been proposed such as https://github.com/haskell/haddock/issues/72 but in general it turns out that there's not enough interest for anyone to sit down and implement it and make sure it all works properly. Patches should be going straight to upstream rather than lingering on Hackage until someone notices them (even with automated tools, it's a pain). I doubt many people would use it for anything but typos because if you have enough knowledge about a function to document it, you're likely to already be involved with the project in some way and have means to report it properly. My thought was that it would end up in the library's source code, not that it would reside in a wiki. The question is whether anyone has the motivation to write a sufficiently smooth mechanism to achieve this. If I was editing a package that I normally upload to hackage and I could look at a nice presentation of alternative documentation strings people have suggested for the different functions in my library, I would be happy to cut and paste them into the source code. ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hey list, I am strongly in favour of the proposal. As a pedestrian-level GHC contributor, the *vast* majority of my time is spent trying to figure out what certain things do, when the answer could be found in a one- or two-line comment above a definition. As for Richard's remark, So, I have to ask: why use Haddock? Do folks read the Haddock docs for GHC? (I don't, but perhaps that's because the docs aren't so good right now.) I find Haddock very useful for one major reason: it allows me an HTML overview over all exported definitions (in the index), which is a big plus when you're searching for things. Greetings, David/quchen -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJTsYVUAAoJELrQsaT5WQUsKawIAIMHt9Ha4qTWtJO6qwOjN5RD JOx1MnuPlDLosbyE9+BlkEV1tRnnG/snyxwFTgmtFSO9fAV2FPZEbtzZ2AZd4xbb VgORhTAeL1n1aBitGNaAzT1T60tS2JNict2S0pUWa0Qt3nYWwoRw1B+OOaZRuuaR cHkOFKMbzU5knmeD/RyDIE+oRxZvjAKdAaaQ0vJ70ovNUptjtfDeX6Nxto65qFis sKsWjsL++TgeOscejw7DNLeCei/cwrzjOSNOB6xFGAxPHUHZFvSkbuVAMNWIgbic 55tbDIog/l9P/N8RoUQh4PLjh3TG3xT3vsM5iiTKl3UZ7eTMpzmzAKhvikoHGOU= =w6S7 -END PGP SIGNATURE- ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
Richard, Not requiring contributors to check that the Haddock render well is OK. As long as the comments are there someone with free time on their hands can always tidy up the rendering. I'm looking forward to the day I can browse the GHC Haddocks and make sense of them. I tried in the past but the generated docs aren't accessible enough without any actual comments. ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
Left hold off one more week to give more contributors a change to voice their thoughts. If no one protests I will announce the new policy next Monday. Sounds good? ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
I think that's a good idea, as long as we do enforce it as social policy rather than a script. In other words, we should encourage folks to add comments documenting functions, but not necessarily require them *everywhere*, and leave it I also share Richard's concern about using Haddock notation, and the possibility of introducing build-failures due to typos in the documentation. -Iavor On Mon, Jun 30, 2014 at 1:18 PM, Johan Tibell johan.tib...@gmail.com wrote: Left hold off one more week to give more contributors a change to voice their thoughts. If no one protests I will announce the new policy next Monday. Sounds good? ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
David Luposchainsky dluposchain...@googlemail.com writes: Hey list, I am strongly in favour of the proposal. As a pedestrian-level GHC contributor, the *vast* majority of my time is spent trying to figure out what certain things do, when the answer could be found in a one- or two-line comment above a definition. I'd like to second this. As an occassional contributor, I find myself wading through a lot of code to deduce functions' purpose. While I'm often pleasantly surprised by the quality of the notes scattered about the code, per-definition Haddocks would fill in the many remaining gaps and provide a nice overview of each module. I agree that enforcing the quality of the rendered Haddocks is unnecessary. Once the language has been written there are many contributors (such as myself) who can further clean up the formatting. Cheers, - Ben pgpnP8AxgwTf_.pgp Description: PGP signature ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
Given the examples provided with this proposal it looks like this change is targeted mostly at compiler hackers, and not at library/package developers. For the latter community it would be nice to establish a policy for libraries and packages that encourages (nudges) developers to include comments and examples, something more useful than type signatures or pointers to the literature on type theory. For example, the R stats system encourages package developers to include examples, and these examples are run as part of the package checking process, thus encouraging developers to keep the examples up-to-date. This kind of change might help Haskell move beyond its current role as a research and teaching tool, and an incubator for new programming ideas that are eventually implemented and popularized in other programming languages... Cheers, Dominick On Fri, Jun 27, 2014 at 6:17 AM, Simon Peyton Jones simo...@microsoft.com wrote: I'd be OK with this, (it's a bit like requiring signatures on all top level functions) but I don't know how we'd enforce it. Do you think the requirement should be for all top-level functions or just exported ones? I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won't be able to make much sense of it. Simon From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Johan Tibell Sent: 27 June 2014 10:51 To: ghc-devs@haskell.org Subject: Proposal: require Haddock comment for every new top-level function and type in GHC source code Hi! I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments. GHC code is sometimes commented with notes, which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why. Lack of documentation, in GHC and other software projects, usually has (at least) two causes: Programmers comment code they think is complex enough to warrant a comment. The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google. Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost. Proposal: I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence. I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong broken window effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices. We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that. To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped. First example In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage. The first question that I asked upon seeing uses of that function was what does RUR stand for? Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean read in the enclosing function and updating is related to reading so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for: -- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
On Mon, Jun 30, 2014 at 11:22 PM, Dominick Samperi djsamp...@gmail.com wrote: Given the examples provided with this proposal it looks like this change is targeted mostly at compiler hackers, and not at library/package developers. Yes, this discussion is only about documenting the GHC modules. -- Johan ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
RE: Proposal: require Haddock comment for every new top-level function and type in GHC source code
I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it. Do you think the requirement should be for all top-level functions or just exported ones? I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it. Simon From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Johan Tibell Sent: 27 June 2014 10:51 To: ghc-devs@haskell.org Subject: Proposal: require Haddock comment for every new top-level function and type in GHC source code Hi! I found myself exploring new parts of the GHC code base the last few weeks (exciting!), which again reminded me of my biggest frustration when working on GHC: the lack of per-function/type (Haddock) comments. GHC code is sometimes commented with notes, which are great but tend to (1) mostly cover the exceptional cases and (2) talk about the implementation of a function, not how a caller might use it or why. Lack of documentation, in GHC and other software projects, usually has (at least) two causes: * Programmers comment code they think is complex enough to warrant a comment. The problem is that the author is usually a poor judge of what's complex enough, because he/she is too familiar with the code and tends to under-document code when following this principle. * Documenting is boring and tends to have little benefit the person writing to documentation. Given lack of incentives we tend to document less than we ought to. I've only seen one successful way to combat the lack of documentation that stems from the above: have the project's style guide mandate that top-level functions and types (or at least those that are exported) have documentation. This works well at Google. Anecdote: we have one code base inside Google that was until recently exempt from this rule and documentation is almost completely absent in that code base, even though hundreds of engineers work on and need to understand it every day. This breeds institutional knowledge problems i.e. if the author of a core piece of code leaves, lots of knowledge is lost. Proposal: I propose that we require that new top-level functions and types have Haddock comments, even if they start out as a single, humble sentence. I've found that putting even that one sentence (1) helps new users and (2) establishes a place for improvements to be made. There's a strong broken window effect to lack of comments, in that lack of comments breeds more lack of comments as developers follow established practices. We should add this requirement to the style guide. Having it as a written down policy tends to prevent having to re-hash the whole argument about documentation over and over again. This has also helped us a lot at Google, because programmers can spend endless amount of time arguing about comments, placement of curly braces, etc. and having a written policy helps cut down on that. To give an idea of how to write good comments, here are two examples of undocumented code I ran into in GHC and how better comments would have helped. First example In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, which is a helper function use when computing instruction register usage. The first question that I asked upon seeing uses of that function was what does RUR stand for? Given the context the function is in, I guessed it stands for read-update-read, because R is used to mean read in the enclosing function and updating is related to reading so that must be what U stands for. It turns out that it stands for RegUsageReadonly. Here's a comment that would have captured, in a single sentence, what this function is for: -- | Create register usage info for instruction that only -- reads registers. mkRUR src = src' `seq` RU src' [] where src' = filter (interesting platform) src That already a big improvement. A note about the register filtering, which means that not all registers you pass to the function will be recorded as being read in the end, could also be useful. Aside: providing a type signature, which would have made it clear that the return type is RU, might also have helped in this particular case. Second example In the same file there a function called x86_regUsageOfInstr. It's the function that encloses the local function mkRUR above. I could figure out that this function has something to do with register usage, of the instruction passed as an argument, and that register usage is important for the register allocator. However, trying to understand in more detail what that meant was more of challenge than it needed to be. First, a comment more clearly explaining what computing register usage means in practice would be helpful: -- | Returns which registers are read and written by this -- instruction, as a (read,
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
On Fri, Jun 27, 2014 at 12:17 PM, Simon Peyton Jones simo...@microsoft.com wrote: I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it. I think social enforcement is enough. If we agree that this is something we want to do and communicate that to ghc-devs@, put a note in our style guide, and kindly remind people to add comments when we do code reviews, we'll eventually end up with a culture of writing Haddocks. I think the most important part right now is that current contributors agree that this is something we want to do. Aside: people usually say that they find it hard to know what to document in their own code, because they don't know what others will find difficult. My advice is this: add a sentence or two about what the function does and why it exists, no matter how obvious you think that statement is. Do you think the requirement should be for all top-level functions or just exported ones? I take what I can get, but I think documenting all top-level functions makes sense in the case of GHC, as there's so much that goes on in our modules but we often only export a handful of functions. For example, compiler/codeGen/StgCmmPrim.hs is 2,000 lines long but only exports 3 functions. For someone that wants to work on that module for the first time only have docs on those three functions is helpful, but likely not enough. FWIW I document all top-level functions in my projects (and when I don't I often regret it later). I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it. Sure. Personally I would refer to the note from the function body if it talks mostly about the implementation, as opposed to how to use the function. -- Johan ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
I would counter propose a place on hackage for people to type in or modify the documentation for functions, designed in such a way that the documentation would easily find its way back into the project's source code (with developer approval.) This way the documentation can be generated by people who only recently came to understand the function, so the questions a newcomer has are fresh in their mind. On Fri, Jun 27, 2014 at 4:19 AM, Johan Tibell johan.tib...@gmail.com wrote: On Fri, Jun 27, 2014 at 12:17 PM, Simon Peyton Jones simo...@microsoft.com wrote: I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it. I think social enforcement is enough. If we agree that this is something we want to do and communicate that to ghc-devs@, put a note in our style guide, and kindly remind people to add comments when we do code reviews, we'll eventually end up with a culture of writing Haddocks. I think the most important part right now is that current contributors agree that this is something we want to do. Aside: people usually say that they find it hard to know what to document in their own code, because they don't know what others will find difficult. My advice is this: add a sentence or two about what the function does and why it exists, no matter how obvious you think that statement is. Do you think the requirement should be for all top-level functions or just exported ones? I take what I can get, but I think documenting all top-level functions makes sense in the case of GHC, as there's so much that goes on in our modules but we often only export a handful of functions. For example, compiler/codeGen/StgCmmPrim.hs is 2,000 lines long but only exports 3 functions. For someone that wants to work on that module for the first time only have docs on those three functions is helpful, but likely not enough. FWIW I document all top-level functions in my projects (and when I don't I often regret it later). I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it. Sure. Personally I would refer to the note from the function body if it talks mostly about the implementation, as opposed to how to use the function. -- Johan ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
On 06/27/2014 01:19 PM, Johan Tibell wrote: On Fri, Jun 27, 2014 at 12:17 PM, Simon Peyton Jones simo...@microsoft.com wrote: I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it. I think social enforcement is enough. If we agree that this is something we want to do and communicate that to ghc-devs@, put a note in our style guide, and kindly remind people to add comments when we do code reviews, we'll eventually end up with a culture of writing Haddocks. I think the most important part right now is that current contributors agree that this is something we want to do. Aside: people usually say that they find it hard to know what to document in their own code, because they don't know what others will find difficult. My advice is this: add a sentence or two about what the function does and why it exists, no matter how obvious you think that statement is. Do you think the requirement should be for all top-level functions or just exported ones? I take what I can get, but I think documenting all top-level functions makes sense in the case of GHC, as there's so much that goes on in our modules but we often only export a handful of functions. For example, compiler/codeGen/StgCmmPrim.hs is 2,000 lines long but only exports 3 functions. For someone that wants to work on that module for the first time only have docs on those three functions is helpful, but likely not enough. FWIW I document all top-level functions in my projects (and when I don't I often regret it later). I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it. Sure. Personally I would refer to the note from the function body if it talks mostly about the implementation, as opposed to how to use the function. Notes could be moved into the module header if necessary so that they are rendered by Haddock. Alternatively, one function can contain the note and other places can refer to it by means of an anchor. -- Johan ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs My personal gripe of trying to read docs for GHC modules/types is that some older modules are in .lhs format which means we have to dive out of nicely-rendered HTML and go into source. Was there ever talk of converting Literate Haskell files into more Haddock-friendly format? -- Mateusz K. ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
I don't think this is mutually exclusive with Johan's proposal. Let me suggest an amendment: Developers (both new and old) would be encouraged to submit patches adding or improving documentation in the source code. Documentation patches would be vetted as any others would be. Discouraging source code comments is the worst legacy of Unix. IMO (contradicting KR) even incorrect comments are better than none. Howard From: David Fox d...@seereason.com To: Johan Tibell johan.tib...@gmail.com Cc: ghc-devs@haskell.org ghc-devs@haskell.org Sent: Friday, June 27, 2014 6:26 AM Subject: Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code I would counter propose a place on hackage for people to type in or modify the documentation for functions, designed in such a way that the documentation would easily find its way back into the project's source code (with developer approval.) This way the documentation can be generated by people who only recently came to understand the function, so the questions a newcomer has are fresh in their mind. ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
Re: Proposal: require Haddock comment for every new top-level function and type in GHC source code
On 06/27/2014 03:26 PM, David Fox wrote: I would counter propose a place on hackage for people to type in or modify the documentation for functions, designed in such a way that the documentation would easily find its way back into the project's source code (with developer approval.) This way the documentation can be generated by people who only recently came to understand the function, so the questions a newcomer has are fresh in their mind. Are you asking for a wiki-like thing for documentation? There were a few times where this has been proposed such as https://github.com/haskell/haddock/issues/72 but in general it turns out that there's not enough interest for anyone to sit down and implement it and make sure it all works properly. Patches should be going straight to upstream rather than lingering on Hackage until someone notices them (even with automated tools, it's a pain). I doubt many people would use it for anything but typos because if you have enough knowledge about a function to document it, you're likely to already be involved with the project in some way and have means to report it properly. On Fri, Jun 27, 2014 at 4:19 AM, Johan Tibell johan.tib...@gmail.com wrote: On Fri, Jun 27, 2014 at 12:17 PM, Simon Peyton Jones simo...@microsoft.com wrote: I’d be OK with this, (it’s a bit like requiring signatures on all top level functions) but I don’t know how we’d enforce it. I think social enforcement is enough. If we agree that this is something we want to do and communicate that to ghc-devs@, put a note in our style guide, and kindly remind people to add comments when we do code reviews, we'll eventually end up with a culture of writing Haddocks. I think the most important part right now is that current contributors agree that this is something we want to do. Aside: people usually say that they find it hard to know what to document in their own code, because they don't know what others will find difficult. My advice is this: add a sentence or two about what the function does and why it exists, no matter how obvious you think that statement is. Do you think the requirement should be for all top-level functions or just exported ones? I take what I can get, but I think documenting all top-level functions makes sense in the case of GHC, as there's so much that goes on in our modules but we often only export a handful of functions. For example, compiler/codeGen/StgCmmPrim.hs is 2,000 lines long but only exports 3 functions. For someone that wants to work on that module for the first time only have docs on those three functions is helpful, but likely not enough. FWIW I document all top-level functions in my projects (and when I don't I often regret it later). I agree that Notes have a different purpose. But it should be OK style to refer to a Note from a top-level function comment, even though Haddock won’t be able to make much sense of it. Sure. Personally I would refer to the note from the function body if it talks mostly about the implementation, as opposed to how to use the function. -- Johan ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs -- Mateusz K. ___ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs