Re: Design docs: consolidation and discoverability
I like the idea of having design docs be kept up to date and tracked in git. If the Apache repo isn't a good fit, perhaps we can have a separate repo just for design docs? Maybe something like github.com/spark-docs/spark-docs/ ?

If there's other stuff we want to track but haven't, perhaps we can generalize the purpose of the repo a bit and rename it accordingly (e.g. spark-misc/spark-misc).

Nick

On Mon, Apr 27, 2015 at 1:21 PM Sandy Ryza sandy.r...@cloudera.com wrote:

My only issue with Google Docs is that they're mutable, so it's difficult to follow a design's history through its revisions and link up JIRA comments with the relevant version.

-Sandy
Re: Design docs: consolidation and discoverability
Nick, I like your idea of keeping it in a separate git repository. It seems to combine the advantages of the present Google Docs approach with the crisper history, discoverability, and text format simplicity of GitHub wikis.

Punya

On Mon, Apr 27, 2015 at 1:30 PM Nicholas Chammas nicholas.cham...@gmail.com wrote:

I like the idea of having design docs be kept up to date and tracked in git. If the Apache repo isn't a good fit, perhaps we can have a separate repo just for design docs? Maybe something like github.com/spark-docs/spark-docs/ ?

If there's other stuff we want to track but haven't, perhaps we can generalize the purpose of the repo a bit and rename it accordingly (e.g. spark-misc/spark-misc).

Nick
Re: Design docs: consolidation and discoverability
My only issue with Google Docs is that they're mutable, so it's difficult to follow a design's history through its revisions and link up JIRA comments with the relevant version.

-Sandy

On Mon, Apr 27, 2015 at 7:54 AM, Steve Loughran ste...@hortonworks.com wrote:

One thing to consider is that while docs as PDFs in JIRAs do document the original proposal, that's not the place to keep living specifications. That stuff needs to live in SCM, in a format which can be easily maintained, can generate readable documents, and, in an unrealistically ideal world, even be used by machines to validate compliance with the design. Test suites tend to be the implicit machine-readable part of the specification, though they aren't usually viewed as such.

PDFs of Word docs in JIRAs are not the place for ongoing work, even if the early drafts can contain them. Given it's just as easy to point to markdown docs in GitHub by commit ID, that could be an alternative way to publish docs, with the document itself being viewed as one of the deliverables. When the time comes to update a document, it's there in the source tree to edit.

If there's a flaw here, it's that design docs are just that: the design. The implementation may not match, and ongoing work will certainly diverge. If the design docs aren't kept in sync, they can mislead people. Accordingly, once the design docs are incorporated into the source tree, keeping them in sync with changes has to be viewed as being as essential as keeping tests up to date.
Re: Design docs: consolidation and discoverability
GitHub's wiki is just another Git repo. If we use a separate repo, it's probably easiest to use the wiki git repo rather than the primary git repo.

Punya

On Mon, Apr 27, 2015 at 1:50 PM Nicholas Chammas nicholas.cham...@gmail.com wrote:

Oh, a GitHub wiki (which is separate from having docs in a repo) is yet another approach we could take, though if we want to do that on the main Spark repo we'd need permission from Apache, which may be tough to get...
Re: Design docs: consolidation and discoverability
I actually don't totally see why we can't use Google Docs, provided it is clearly discoverable from the JIRA. It was my understanding that many projects do this. Maybe not (?).

If it's a matter of maintaining a public record on ASF infrastructure, perhaps we can just automate it: when an issue is closed, we capture the doc content and attach it to the JIRA as a PDF.

My sense is that in general the ASF infrastructure policy is becoming more and more lenient with regard to using third-party services, provided they are broadly accessible (such as a public Google Doc) and can be definitively archived on ASF-controlled storage.

- Patrick

On Fri, Apr 24, 2015 at 4:57 PM, Sean Owen so...@cloudera.com wrote:

I know I recently used Google Docs from a JIRA, so am guilty as charged. I don't think there are a lot of design docs in general, but the ones I've seen have simply pushed docs to a JIRA. (I did the same, mirroring PDFs of the Google Doc.) I don't think this is hard to follow.

I think you can do what you like: make a JIRA and attach files. Make a WIP PR and attach your notes. Make a Google Doc if you're feeling transgressive.

I don't see much of a problem to solve here. In practice there are plenty of workable options, all of which are mainstream, and so I do not see an argument that somehow this is solved by letting people make wikis.
Re: Design docs: consolidation and discoverability
That would require giving wiki access to everyone or manually adding people any time they make a doc. I don't see how this helps though. They're still docs on the internet and they're still linked from the central project JIRA, which is what you should follow.

On Apr 24, 2015 8:14 AM, Punyashloka Biswal punya.bis...@gmail.com wrote:

Dear Spark devs,

Right now, design docs are stored on Google docs and linked from tickets. For someone new to the project, it's hard to figure out what subjects are being discussed, what organization to follow for new feature proposals, etc.

Would it make sense to consolidate future design docs in either a designated area on the Apache Confluence Wiki, or on GitHub's Wiki pages? If people have a strong preference to keep the design docs on Google Docs, then could we have a top-level page on the confluence wiki that lists all active and archived design docs?

Punya
Re: Design docs: consolidation and discoverability
My 2 cents - I'd rather see design docs in GitHub pull requests (using plain text / markdown). That doesn't require changing access or adding people, and GitHub PRs already allow for conversation / email notifications.

Conversation is already split between JIRA and GitHub PRs. Having a third stream of conversation in Google Docs just leads to things being ignored.

On Fri, Apr 24, 2015 at 7:21 AM, Sean Owen so...@cloudera.com wrote:

That would require giving wiki access to everyone or manually adding people any time they make a doc. I don't see how this helps though. They're still docs on the internet and they're still linked from the central project JIRA, which is what you should follow.
Re: Design docs: consolidation and discoverability
I'd love to see more design discussions consolidated in a single place as well. That said, there are many practical challenges to overcome. Some of them are out of our control:

1. For large features, it is fairly common to open a PR for discussion, close the PR after taking some feedback into account, and then open another one. You sort of lose the discussions that way.

2. With the way Jenkins is set up currently, Jenkins testing introduces a lot of noise to GitHub pull requests, making it hard to differentiate legitimate comments from noise. This is unfortunately due to the fact that ASF won't allow our Jenkins bot to have API privileges to post messages.

3. The Apache Way is that all development discussions need to happen on ASF property, i.e. dev lists and JIRA. As a result, technically we are not allowed to have development discussions on GitHub.

On Fri, Apr 24, 2015 at 7:09 AM, Cody Koeninger c...@koeninger.org wrote:

My 2 cents - I'd rather see design docs in GitHub pull requests (using plain text / markdown). That doesn't require changing access or adding people, and GitHub PRs already allow for conversation / email notifications.

Conversation is already split between JIRA and GitHub PRs. Having a third stream of conversation in Google Docs just leads to things being ignored.
Re: Design docs: consolidation and discoverability
Okay, I can understand wanting to keep Git history clean, and avoid bottlenecking on committers. Is it reasonable to establish a convention of having a label, component or (best of all) an issue type for issues that are associated with design docs? For example, if we used the existing Brainstorming issue type, and people put their design doc in the description of the ticket, it would be relatively easy to figure out what designs are in progress.

Given the push-back against design docs in Git or on the wiki and the strong preference for keeping docs on ASF property, I'm a bit surprised that all the existing design docs are on Google Docs. Perhaps Apache should consider opening up parts of the wiki to a larger group, to better serve this use case.

Punya

On Fri, Apr 24, 2015 at 5:01 PM Patrick Wendell pwend...@gmail.com wrote:

Using our ASF git repository as a working area for design docs seems potentially concerning to me. It's difficult process-wise because all commits need to go through committers, and we'd also pollute our git history a lot with random incremental design updates. The git history is used a lot by downstream packagers, by us during our QA process, etc. We really try to keep it oriented around code patches:

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog

Committing a polished design doc along with a feature is maybe something we could consider. But I still think JIRA is the best location for these docs, consistent with what most other ASF projects that I know of do.

On Fri, Apr 24, 2015 at 1:19 PM, Cody Koeninger c...@koeninger.org wrote:

Why can't pull requests be used for design docs in Git if people who aren't committers want to contribute changes (as opposed to just comments)?

On Fri, Apr 24, 2015 at 2:57 PM, Sean Owen so...@cloudera.com wrote:

Only catch there is it requires commit access to the repo. We need a way for people who aren't committers to write and collaborate (for point #1).

On Fri, Apr 24, 2015 at 3:56 PM, Punyashloka Biswal punya.bis...@gmail.com wrote:

Sandy, doesn't keeping (in-progress) design docs in Git satisfy the history requirement? Referring back to my Gradle example, it seems that https://github.com/gradle/gradle/commits/master/design-docs/build-comparison.md is a really good way to see why the design doc evolved the way it did. When keeping the doc in JIRA (presumably as an attachment) it's not easy to see what changed between successive versions of the doc.

Punya

On Fri, Apr 24, 2015 at 3:53 PM Sandy Ryza sandy.r...@cloudera.com wrote:

I think there are maybe two separate things we're talking about?

1. Design discussions and in-progress design docs. My two cents are that JIRA is the best place for this. It allows tracking the progression of a design across multiple PRs and contributors. A piece of useful feedback that I've gotten in the past is to make design docs immutable. When updating them in response to feedback, post a new version rather than editing the existing one. This enables tracking the history of a design and makes it possible to read comments about previous designs in context. Otherwise it's really difficult to understand why particular approaches were chosen or abandoned.

2. Completed design docs for features that we've implemented. Perhaps less essential to project progress, but it would be really lovely to have a central repository for all of the project's design docs. If anyone wants to step up to maintain it, it would be cool to have a wiki page with links to all the final design docs posted on JIRA.
Re: Design docs: consolidation and discoverability
I know I recently used Google Docs from a JIRA, so am guilty as charged. I don't think there are a lot of design docs in general, but the ones I've seen have simply pushed docs to a JIRA. (I did the same, mirroring PDFs of the Google Doc.) I don't think this is hard to follow. I think you can do what you like: make a JIRA and attach files. Make a WIP PR and attach your notes. Make a Google Doc if you're feeling transgressive. I don't see much of a problem to solve here. In practice there are plenty of workable options, all of which are mainstream, and so I do not see an argument that somehow this is solved by letting people make wikis. On Fri, Apr 24, 2015 at 7:42 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: Okay, I can understand wanting to keep Git history clean, and avoid bottlenecking on committers. Is it reasonable to establish a convention of having a label, component or (best of all) an issue type for issues that are associated with design docs? For example, if we used the existing Brainstorming issue type, and people put their design doc in the description of the ticket, it would be relatively easy to figure out what designs are in progress. Given the push-back against design docs in Git or on the wiki and the strong preference for keeping docs on ASF property, I'm a bit surprised that all the existing design docs are on Google Docs. Perhaps Apache should consider opening up parts of the wiki to a larger group, to better serve this use case. Punya On Fri, Apr 24, 2015 at 5:01 PM Patrick Wendell pwend...@gmail.com wrote: Using our ASF git repository as a working area for design docs, it seems potentially concerning to me. It's difficult process wise because all commits need to go through committers and also, we'd pollute our git history a lot with random incremental design updates. The git history is used a lot by downstream packagers, us during our QA process, etc... 
we really try to keep it oriented around code patches: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog Committing a polished design doc along with a feature, maybe that's something we could consider. But I still think JIRA is the best location for these docs, consistent with what most other ASF projects do that I know. On Fri, Apr 24, 2015 at 1:19 PM, Cody Koeninger c...@koeninger.org wrote: Why can't pull requests be used for design docs in Git if people who aren't committers want to contribute changes (as opposed to just comments)? On Fri, Apr 24, 2015 at 2:57 PM, Sean Owen so...@cloudera.com wrote: Only catch there is it requires commit access to the repo. We need a way for people who aren't committers to write and collaborate (for point #1) On Fri, Apr 24, 2015 at 3:56 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: Sandy, doesn't keeping (in-progress) design docs in Git satisfy the history requirement? Referring back to my Gradle example, it seems that https://github.com/gradle/gradle/commits/master/design-docs/build-comparison.md is a really good way to see why the design doc evolved the way it did. When keeping the doc in Jira (presumably as an attachment) it's not easy to see what changed between successive versions of the doc. Punya On Fri, Apr 24, 2015 at 3:53 PM Sandy Ryza sandy.r...@cloudera.com wrote: I think there are maybe two separate things we're talking about? 1. Design discussions and in-progress design docs. My two cents are that JIRA is the best place for this. It allows tracking the progression of a design across multiple PRs and contributors. A piece of useful feedback that I've gotten in the past is to make design docs immutable. When updating them in response to feedback, post a new version rather than editing the existing one. This enables tracking the history of a design and makes it possible to read comments about previous designs in context. 
Otherwise it's really difficult to understand why particular approaches were chosen or abandoned. 2. Completed design docs for features that we've implemented. Perhaps less essential to project progress, but it would be really lovely to have a central repository to all the projects design doc. If anyone wants to step up to maintain it, it would be cool to have a wiki page with links to all the final design docs posted on JIRA. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Design docs: consolidation and discoverability
I think it's OK to have design discussions on github, as emails go to ASF lists. After all, loads of PR discussions happen there. It's easy for anyone to follow. I also would rather just discuss on Github, except for all that noise. It's not great to put discussions in something like Google Docs actually; the resulting doc needs to be pasted back to JIRA promptly if so. I suppose it's still better than a private conversation or not talking at all, but the principle is that one should be able to access any substantive decision or conversation by being tuned in to only the project systems of record -- mailing list, JIRA. On Fri, Apr 24, 2015 at 2:30 PM, Reynold Xin r...@databricks.com wrote: I'd love to see more design discussions consolidated in a single place as well. That said, there are many practical challenges to overcome. Some of them are out of our control: 1. For large features, it is fairly common to open a PR for discussion, close the PR taking some feedback into account, and reopen another one. You sort of lose the discussions that way. 2. With the way Jenkins is setup currently, Jenkins testing introduces a lot of noise to GitHub pull requests, making it hard to differentiate legitimate comments from noise. This is unfortunately due to the fact that ASF won't allow our Jenkins bot to have API privilege to post messages. 3. The Apache Way is that all development discussions need to happen on ASF property, i.e. dev lists and JIRA. As a result, technically we are not allowed to have development discussions on GitHub. On Fri, Apr 24, 2015 at 7:09 AM, Cody Koeninger c...@koeninger.org wrote: My 2 cents - I'd rather see design docs in github pull requests (using plain text / markdown). That doesn't require changing access or adding people, and github PRs already allow for conversation / email notifications. Conversation is already split between jira and github PRs. Having a third stream of conversation in Google Docs just leads to things being ignored. 
On Fri, Apr 24, 2015 at 7:21 AM, Sean Owen so...@cloudera.com wrote: That would require giving wiki access to everyone or manually adding people any time they make a doc. I don't see how this helps though. They're still docs on the internet and they're still linked from the central project JIRA, which is what you should follow. On Apr 24, 2015 8:14 AM, Punyashloka Biswal punya.bis...@gmail.com wrote: Dear Spark devs, Right now, design docs are stored on Google docs and linked from tickets. For someone new to the project, it's hard to figure out what subjects are being discussed, what organization to follow for new feature proposals, etc. Would it make sense to consolidate future design docs in either a designated area on the Apache Confluence Wiki, or on GitHub's Wiki pages? If people have a strong preference to keep the design docs on Google Docs, then could we have a top-level page on the confluence wiki that lists all active and archived design docs? Punya
Re: Design docs: consolidation and discoverability
The Gradle dev team keep their design documents *checked into* their Git repository -- see https://github.com/gradle/gradle/blob/master/design-docs/build-comparison.md for example. The advantages I see to their approach are: - design docs stay on ASF property (since Github is synced to the Apache-run Git repository) - design docs have a lifetime across PRs, but can still be modified and commented on through the mechanism of PRs - keeping a central location helps people to find good role models and converge on conventions Sean, I find it hard to use the central Jira as a jumping-off point for understanding ongoing design work because a tiny fraction of the tickets actually relate to design docs, and it's not easy from the outside to figure out which ones are relevant. Punya On Fri, Apr 24, 2015 at 2:49 PM Sean Owen so...@cloudera.com wrote: I think it's OK to have design discussions on github, as emails go to ASF lists. After all, loads of PR discussions happen there. It's easy for anyone to follow. I also would rather just discuss on Github, except for all that noise. It's not great to put discussions in something like Google Docs actually; the resulting doc needs to be pasted back to JIRA promptly if so. I suppose it's still better than a private conversation or not talking at all, but the principle is that one should be able to access any substantive decision or conversation by being tuned in to only the project systems of record -- mailing list, JIRA. On Fri, Apr 24, 2015 at 2:30 PM, Reynold Xin r...@databricks.com wrote: I'd love to see more design discussions consolidated in a single place as well. That said, there are many practical challenges to overcome. Some of them are out of our control: 1. For large features, it is fairly common to open a PR for discussion, close the PR taking some feedback into account, and reopen another one. You sort of lose the discussions that way. 2. 
With the way Jenkins is setup currently, Jenkins testing introduces a lot of noise to GitHub pull requests, making it hard to differentiate legitimate comments from noise. This is unfortunately due to the fact that ASF won't allow our Jenkins bot to have API privilege to post messages. 3. The Apache Way is that all development discussions need to happen on ASF property, i.e. dev lists and JIRA. As a result, technically we are not allowed to have development discussions on GitHub. On Fri, Apr 24, 2015 at 7:09 AM, Cody Koeninger c...@koeninger.org wrote: My 2 cents - I'd rather see design docs in github pull requests (using plain text / markdown). That doesn't require changing access or adding people, and github PRs already allow for conversation / email notifications. Conversation is already split between jira and github PRs. Having a third stream of conversation in Google Docs just leads to things being ignored. On Fri, Apr 24, 2015 at 7:21 AM, Sean Owen so...@cloudera.com wrote: That would require giving wiki access to everyone or manually adding people any time they make a doc. I don't see how this helps though. They're still docs on the internet and they're still linked from the central project JIRA, which is what you should follow. On Apr 24, 2015 8:14 AM, Punyashloka Biswal punya.bis...@gmail.com wrote: Dear Spark devs, Right now, design docs are stored on Google docs and linked from tickets. For someone new to the project, it's hard to figure out what subjects are being discussed, what organization to follow for new feature proposals, etc. Would it make sense to consolidate future design docs in either a designated area on the Apache Confluence Wiki, or on GitHub's Wiki pages? If people have a strong preference to keep the design docs on Google Docs, then could we have a top-level page on the confluence wiki that lists all active and archived design docs? Punya
Re: Design docs: consolidation and discoverability
Using our ASF git repository as a working area for design docs, it seems potentially concerning to me. It's difficult process wise because all commits need to go through committers and also, we'd pollute our git history a lot with random incremental design updates. The git history is used a lot by downstream packagers, us during our QA process, etc... we really try to keep it oriented around code patches: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog Committing a polished design doc along with a feature, maybe that's something we could consider. But I still think JIRA is the best location for these docs, consistent with what most other ASF projects do that I know. On Fri, Apr 24, 2015 at 1:19 PM, Cody Koeninger c...@koeninger.org wrote: Why can't pull requests be used for design docs in Git if people who aren't committers want to contribute changes (as opposed to just comments)? On Fri, Apr 24, 2015 at 2:57 PM, Sean Owen so...@cloudera.com wrote: Only catch there is it requires commit access to the repo. We need a way for people who aren't committers to write and collaborate (for point #1) On Fri, Apr 24, 2015 at 3:56 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: Sandy, doesn't keeping (in-progress) design docs in Git satisfy the history requirement? Referring back to my Gradle example, it seems that https://github.com/gradle/gradle/commits/master/design-docs/build-comparison.md is a really good way to see why the design doc evolved the way it did. When keeping the doc in Jira (presumably as an attachment) it's not easy to see what changed between successive versions of the doc. Punya On Fri, Apr 24, 2015 at 3:53 PM Sandy Ryza sandy.r...@cloudera.com wrote: I think there are maybe two separate things we're talking about? 1. Design discussions and in-progress design docs. My two cents are that JIRA is the best place for this. It allows tracking the progression of a design across multiple PRs and contributors. 
A piece of useful feedback that I've gotten in the past is to make design docs immutable. When updating them in response to feedback, post a new version rather than editing the existing one. This enables tracking the history of a design and makes it possible to read comments about previous designs in context. Otherwise it's really difficult to understand why particular approaches were chosen or abandoned. 2. Completed design docs for features that we've implemented. Perhaps less essential to project progress, but it would be really lovely to have a central repository to all the projects design doc. If anyone wants to step up to maintain it, it would be cool to have a wiki page with links to all the final design docs posted on JIRA.
Re: Design docs: consolidation and discoverability
I think there are maybe two separate things we're talking about? 1. Design discussions and in-progress design docs. My two cents are that JIRA is the best place for this. It allows tracking the progression of a design across multiple PRs and contributors. A piece of useful feedback that I've gotten in the past is to make design docs immutable. When updating them in response to feedback, post a new version rather than editing the existing one. This enables tracking the history of a design and makes it possible to read comments about previous designs in context. Otherwise it's really difficult to understand why particular approaches were chosen or abandoned. 2. Completed design docs for features that we've implemented. Perhaps less essential to project progress, but it would be really lovely to have a central repository to all the projects design doc. If anyone wants to step up to maintain it, it would be cool to have a wiki page with links to all the final design docs posted on JIRA. -Sandy On Fri, Apr 24, 2015 at 12:01 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: The Gradle dev team keep their design documents *checked into* their Git repository -- see https://github.com/gradle/gradle/blob/master/design-docs/build-comparison.md for example. The advantages I see to their approach are: - design docs stay on ASF property (since Github is synced to the Apache-run Git repository) - design docs have a lifetime across PRs, but can still be modified and commented on through the mechanism of PRs - keeping a central location helps people to find good role models and converge on conventions Sean, I find it hard to use the central Jira as a jumping-off point for understanding ongoing design work because a tiny fraction of the tickets actually relate to design docs, and it's not easy from the outside to figure out which ones are relevant. 
Punya On Fri, Apr 24, 2015 at 2:49 PM Sean Owen so...@cloudera.com wrote: I think it's OK to have design discussions on github, as emails go to ASF lists. After all, loads of PR discussions happen there. It's easy for anyone to follow. I also would rather just discuss on Github, except for all that noise. It's not great to put discussions in something like Google Docs actually; the resulting doc needs to be pasted back to JIRA promptly if so. I suppose it's still better than a private conversation or not talking at all, but the principle is that one should be able to access any substantive decision or conversation by being tuned in to only the project systems of record -- mailing list, JIRA. On Fri, Apr 24, 2015 at 2:30 PM, Reynold Xin r...@databricks.com wrote: I'd love to see more design discussions consolidated in a single place as well. That said, there are many practical challenges to overcome. Some of them are out of our control: 1. For large features, it is fairly common to open a PR for discussion, close the PR taking some feedback into account, and reopen another one. You sort of lose the discussions that way. 2. With the way Jenkins is setup currently, Jenkins testing introduces a lot of noise to GitHub pull requests, making it hard to differentiate legitimate comments from noise. This is unfortunately due to the fact that ASF won't allow our Jenkins bot to have API privilege to post messages. 3. The Apache Way is that all development discussions need to happen on ASF property, i.e. dev lists and JIRA. As a result, technically we are not allowed to have development discussions on GitHub. On Fri, Apr 24, 2015 at 7:09 AM, Cody Koeninger c...@koeninger.org wrote: My 2 cents - I'd rather see design docs in github pull requests (using plain text / markdown). That doesn't require changing access or adding people, and github PRs already allow for conversation / email notifications. Conversation is already split between jira and github PRs. 
Having a third stream of conversation in Google Docs just leads to things being ignored. On Fri, Apr 24, 2015 at 7:21 AM, Sean Owen so...@cloudera.com wrote: That would require giving wiki access to everyone or manually adding people any time they make a doc. I don't see how this helps though. They're still docs on the internet and they're still linked from the central project JIRA, which is what you should follow. On Apr 24, 2015 8:14 AM, Punyashloka Biswal punya.bis...@gmail.com wrote: Dear Spark devs, Right now, design docs are stored on Google docs and linked from tickets. For someone new to the project, it's hard to figure out what subjects are being discussed, what organization to follow for new feature proposals, etc. Would it make sense to consolidate future design docs in either a designated area on the Apache Confluence
Re: Design docs: consolidation and discoverability
Sandy, doesn't keeping (in-progress) design docs in Git satisfy the history requirement? Referring back to my Gradle example, it seems that https://github.com/gradle/gradle/commits/master/design-docs/build-comparison.md is a really good way to see why the design doc evolved the way it did. When keeping the doc in Jira (presumably as an attachment) it's not easy to see what changed between successive versions of the doc. Punya On Fri, Apr 24, 2015 at 3:53 PM Sandy Ryza sandy.r...@cloudera.com wrote: I think there are maybe two separate things we're talking about? 1. Design discussions and in-progress design docs. My two cents are that JIRA is the best place for this. It allows tracking the progression of a design across multiple PRs and contributors. A piece of useful feedback that I've gotten in the past is to make design docs immutable. When updating them in response to feedback, post a new version rather than editing the existing one. This enables tracking the history of a design and makes it possible to read comments about previous designs in context. Otherwise it's really difficult to understand why particular approaches were chosen or abandoned. 2. Completed design docs for features that we've implemented. Perhaps less essential to project progress, but it would be really lovely to have a central repository to all the projects design doc. If anyone wants to step up to maintain it, it would be cool to have a wiki page with links to all the final design docs posted on JIRA. -Sandy On Fri, Apr 24, 2015 at 12:01 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: The Gradle dev team keep their design documents *checked into* their Git repository -- see https://github.com/gradle/gradle/blob/master/design-docs/build-comparison.md for example. 
The advantages I see to their approach are: - design docs stay on ASF property (since Github is synced to the Apache-run Git repository) - design docs have a lifetime across PRs, but can still be modified and commented on through the mechanism of PRs - keeping a central location helps people to find good role models and converge on conventions Sean, I find it hard to use the central Jira as a jumping-off point for understanding ongoing design work because a tiny fraction of the tickets actually relate to design docs, and it's not easy from the outside to figure out which ones are relevant. Punya On Fri, Apr 24, 2015 at 2:49 PM Sean Owen so...@cloudera.com wrote: I think it's OK to have design discussions on github, as emails go to ASF lists. After all, loads of PR discussions happen there. It's easy for anyone to follow. I also would rather just discuss on Github, except for all that noise. It's not great to put discussions in something like Google Docs actually; the resulting doc needs to be pasted back to JIRA promptly if so. I suppose it's still better than a private conversation or not talking at all, but the principle is that one should be able to access any substantive decision or conversation by being tuned in to only the project systems of record -- mailing list, JIRA. On Fri, Apr 24, 2015 at 2:30 PM, Reynold Xin r...@databricks.com wrote: I'd love to see more design discussions consolidated in a single place as well. That said, there are many practical challenges to overcome. Some of them are out of our control: 1. For large features, it is fairly common to open a PR for discussion, close the PR taking some feedback into account, and reopen another one. You sort of lose the discussions that way. 2. With the way Jenkins is setup currently, Jenkins testing introduces a lot of noise to GitHub pull requests, making it hard to differentiate legitimate comments from noise. 
This is unfortunately due to the fact that ASF won't allow our Jenkins bot to have API privilege to post messages. 3. The Apache Way is that all development discussions need to happen on ASF property, i.e. dev lists and JIRA. As a result, technically we are not allowed to have development discussions on GitHub. On Fri, Apr 24, 2015 at 7:09 AM, Cody Koeninger c...@koeninger.org wrote: My 2 cents - I'd rather see design docs in github pull requests (using plain text / markdown). That doesn't require changing access or adding people, and github PRs already allow for conversation / email notifications. Conversation is already split between jira and github PRs. Having a third stream of conversation in Google Docs just leads to things being ignored. On Fri, Apr 24, 2015 at 7:21 AM, Sean Owen so...@cloudera.com wrote: That would require giving wiki access to everyone or manually adding people any time they make a doc. I don't see how this helps though. They're still docs on the internet and they're still linked from the central project JIRA,
Re: Design docs: consolidation and discoverability
Only catch there is it requires commit access to the repo. We need a way for people who aren't committers to write and collaborate (for point #1) On Fri, Apr 24, 2015 at 3:56 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: Sandy, doesn't keeping (in-progress) design docs in Git satisfy the history requirement? Referring back to my Gradle example, it seems that https://github.com/gradle/gradle/commits/master/design-docs/build-comparison.md is a really good way to see why the design doc evolved the way it did. When keeping the doc in Jira (presumably as an attachment) it's not easy to see what changed between successive versions of the doc. Punya On Fri, Apr 24, 2015 at 3:53 PM Sandy Ryza sandy.r...@cloudera.com wrote: I think there are maybe two separate things we're talking about? 1. Design discussions and in-progress design docs. My two cents are that JIRA is the best place for this. It allows tracking the progression of a design across multiple PRs and contributors. A piece of useful feedback that I've gotten in the past is to make design docs immutable. When updating them in response to feedback, post a new version rather than editing the existing one. This enables tracking the history of a design and makes it possible to read comments about previous designs in context. Otherwise it's really difficult to understand why particular approaches were chosen or abandoned. 2. Completed design docs for features that we've implemented. Perhaps less essential to project progress, but it would be really lovely to have a central repository to all the projects design doc. If anyone wants to step up to maintain it, it would be cool to have a wiki page with links to all the final design docs posted on JIRA.
Re: Design docs: consolidation and discoverability
Why can't pull requests be used for design docs in Git if people who aren't committers want to contribute changes (as opposed to just comments)? On Fri, Apr 24, 2015 at 2:57 PM, Sean Owen so...@cloudera.com wrote: Only catch there is it requires commit access to the repo. We need a way for people who aren't committers to write and collaborate (for point #1) On Fri, Apr 24, 2015 at 3:56 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: Sandy, doesn't keeping (in-progress) design docs in Git satisfy the history requirement? Referring back to my Gradle example, it seems that https://github.com/gradle/gradle/commits/master/design-docs/build-comparison.md is a really good way to see why the design doc evolved the way it did. When keeping the doc in Jira (presumably as an attachment) it's not easy to see what changed between successive versions of the doc. Punya On Fri, Apr 24, 2015 at 3:53 PM Sandy Ryza sandy.r...@cloudera.com wrote: I think there are maybe two separate things we're talking about? 1. Design discussions and in-progress design docs. My two cents are that JIRA is the best place for this. It allows tracking the progression of a design across multiple PRs and contributors. A piece of useful feedback that I've gotten in the past is to make design docs immutable. When updating them in response to feedback, post a new version rather than editing the existing one. This enables tracking the history of a design and makes it possible to read comments about previous designs in context. Otherwise it's really difficult to understand why particular approaches were chosen or abandoned. 2. Completed design docs for features that we've implemented. Perhaps less essential to project progress, but it would be really lovely to have a central repository to all the projects design doc. If anyone wants to step up to maintain it, it would be cool to have a wiki page with links to all the final design docs posted on JIRA.