I started https://github.com/apache/arrow/pull/5015 for the removal last week; will finish that up today or tomorrow.
Neal

On Sun, Aug 11, 2019 at 8:23 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> It looks like the git pruning is done. So we can remove the site/
> directory from the main repository at some point soon.

On Thu, Aug 8, 2019 at 2:29 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
>
> I need a committer to make a master branch on arrow-site so that I can
> PR to it. I thought it could be just an empty orphan branch, but that
> proved not to work, so a committer will need to do the following:
>
> ```
> git clone g...@github.com:$YOURGITHUB/arrow.git arrow-copy
> cd arrow-copy
> git filter-branch --prune-empty --subdirectory-filter site master
> # Point remote "origin" at the arrow-site repository instead of the fork
> git remote set-url origin g...@github.com:apache/arrow-site.git
> git push -f origin master
> ```

On Thu, Aug 8, 2019 at 12:07 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> Yes, I think we have adequate lazy consensus. Can you spell out what
> the next steps are?

On Thu, Aug 8, 2019 at 2:01 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
>
> Have we reached "lazy consensus" here? There have been no further
> comments in the last three days.
>
> Thanks,
> Neal

On Mon, Aug 5, 2019 at 1:46 PM Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:
>
> This sounds like a good proposal to me (at least as long as we have
> separate docs and main site).
> I agree that documentation should indeed stay with the code, as you
> want to update those together in PRs. But the website is something you
> typically update separately, and might also want to update
> independently of code releases. And certainly if this proposal makes it
> easier to work on the site, all the better.
>
> Joris

On Mon, Aug 5, 2019 at 20:30 Wes McKinney <wesmck...@gmail.com> wrote:
>
> Let's wait a little while to collect any additional opinions about
> this.
>
> There's pretty good evidence from other Apache projects that this
> isn't a bad idea:
>
> Apache Calcite: https://github.com/apache/calcite-site
> Apache Kafka: https://github.com/apache/kafka-site
> Apache Spark: https://github.com/apache/spark-website
>
> The Apache projects I've seen where the same repository is used for
> $FOO.apache.org tend to be ones where the documentation _is_ the
> website. I think we would need to commission a significant web design
> overhaul to be able to make our documentation page adequate as the
> landing point for visitors to https://arrow.apache.org.

On Sat, Aug 3, 2019 at 3:46 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
>
> Given the status quo, it would be difficult for this to make the Arrow
> website less maintained. In fact, arrow-site is currently missing the
> two most recent patches that modified the site directory in
> apache/arrow. Having multiple manual deploy steps increases the
> likelihood that the website stays stale.
>
> As someone who has been working on the Arrow site lately, I find that
> this proposal makes it easier for me to make changes to the website,
> because I can automatically deploy my changes to a test site, and that
> lets others in the community, who perhaps don't touch the website
> much, verify that they're good.
>
> I agree that the documentation situation needs attention, but as I
> said initially, that's orthogonal to this static site generation. I'd
> like to work on that next, and I think these changes will make it
> easier to do. I would not propose moving doc generation out of
> apache/arrow; that belongs with the code.
>
> Neal

On Sat, Aug 3, 2019 at 9:49 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> I think that the project website and the project documentation are
> currently distinct entities. The current Jekyll website is independent
> of the Sphinx documentation project, aside from a link to the
> documentation from the website.
>
> I am guessing that we would want to maintain some amount of separation
> between the main site at arrow.apache.org and the code / format
> documentation, at minimum because we may want to make documentation
> available for multiple versions of the project (this has already been
> cited as an issue: when we release, we're overwriting the previous
> version of the docs).

On Sat, Aug 3, 2019 at 11:33 AM Antoine Pitrou <anto...@python.org> wrote:
>
> I am concerned about this. What happens if we move part of the current
> site to, e.g., the Sphinx docs in the Arrow repository (we already did
> that, so it's not theoretical)?
>
> More generally, I also think that any move towards further separating
> the website and the code repo will lead to an even less maintained
> website.
>
> Regards
>
> Antoine.

On 2 Aug 2019 at 22:39, Wes McKinney wrote:
>
> hi Neal,
>
> In general the improvements to the site sound good, and I agree with
> moving the site into the apache/arrow-site repository.
>
> It sounds like a committer will have to volunteer a GitHub personal
> access token (PAT) for the Travis CI settings in
>
> https://travis-ci.org/apache/arrow-site/settings
>
> Even though such an environment variable can't be read back from there
> once it's set, it could still technically be compromised. Personally I
> wouldn't be comfortable having a token with "repo" scope out there.
> We might need to think about this some more, but I'm totally on board
> with the general idea of making it easier to deploy the website.
>
> - Wes

On Fri, Aug 2, 2019 at 1:35 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
>
> Hi all,
> https://issues.apache.org/jira/browse/ARROW-5746 requested moving the
> source for https://arrow.apache.org out of `apache/arrow` due to the
> growing number of binary files (mostly images) there.
>
> https://issues.apache.org/jira/browse/ARROW-4473 requested
> improvements to the ability to make a test deploy of the website and
> noted challenges/bugs in trying to do this when the site `baseurl` is
> a subdirectory.
>
> On my fork of `arrow-site` [1] I have a solution to both. I created a
> `master` branch and copied the contents of the `site/` directory in
> `apache/arrow` to it, using `git filter-branch --prune-empty
> --subdirectory-filter site master` to preserve the commit history [2].
> Then I added a build script [3] that gets executed by Travis-CI [4].
>
> The script builds the Jekyll site and pushes it to a branch that gets
> published.
> On `apache/arrow-site`, commits to the `master` branch
> trigger a build of the Jekyll site and push the result to the
> `asf-site` branch. On forks, commits to `master` build the site and
> publish to the `gh-pages` branch, which can deploy to GitHub Pages.
>
> ## Features
>
> * Automatic building of the arrow.apache.org site whenever changes are
>   made to the Jekyll source; no manual build step is required.
> * Automatic building of a test site from your fork, which will enable
>   reviewers to verify your changes without having to build and serve
>   the site locally, and without having to trust that what works
>   locally will also work when deployed.
> * Relative URL problems are fixed: links work regardless of whether
>   the "base URL" is top level or a subdirectory.
> * Reduced size of the core `apache/arrow` repository.
> * Documentation publishing is not affected. Updating the contents of
>   the `docs/` directory in the published `asf-site` branch can continue
>   to happen by whatever other process. The automatic building and
>   publishing of the Jekyll site does not overwrite the `docs/`
>   directory.
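The publishing flow described above (build on `master`, push to `asf-site` on `apache/arrow-site` or to `gh-pages` on a fork, leave `docs/` alone) can be sketched in shell as follows. This is not the contents of `build-and-deploy.sh` [3]: the helper names `target_branch`, `site_baseurl` and `sync_built_site` are hypothetical, and the `/<repo>` base URL for forks assumes GitHub Pages' default project-site address `https://<owner>.github.io/<repo>/`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the publish logic; not taken from
# build-and-deploy.sh.

# Branch that receives the built site: apache/arrow-site publishes to
# asf-site, any fork publishes to gh-pages.
target_branch() {
  local slug="$1"   # "owner/repo", e.g. the value of TRAVIS_REPO_SLUG
  if [ "$slug" = "apache/arrow-site" ]; then
    printf '%s\n' "asf-site"
  else
    printf '%s\n' "gh-pages"
  fi
}

# Jekyll baseurl: the ASF site is served from the domain root, while a
# GitHub Pages project site is assumed to live under /<repo>.
site_baseurl() {
  local slug="$1"
  if [ "$slug" = "apache/arrow-site" ]; then
    printf '%s\n' ""
  else
    printf '/%s\n' "${slug#*/}"
  fi
}

# Replace a checkout of the publish branch with freshly built files while
# keeping .git and docs/, so separately published docs are not overwritten.
# Assumes the Jekyll output has no top-level docs/ directory of its own.
sync_built_site() {
  local built="$1" dest="$2"
  find "$dest" -mindepth 1 -maxdepth 1 ! -name .git ! -name docs \
    -exec rm -rf {} +
  cp -R "$built"/. "$dest"/
}
```

In a CI job these could be combined by running `bundle exec jekyll build --baseurl "$(site_baseurl "$TRAVIS_REPO_SLUG")"` and then calling `sync_built_site _site` on a checkout of the branch named by `target_branch` (`_site` is Jekyll's default output directory). Computing the base URL from the repository slug is one way to get links that work both at the domain root and in a subdirectory.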
>
> ## Usage
>
> Local development and serving of the Jekyll site is not affected by
> this build process; it works exactly the same as before, just located
> in the `arrow-site` repository instead of the `site/` directory of
> `apache/arrow`.
>
> To enable the automatic building on your fork, there are a couple of
> quick setup steps to enable GitHub Pages and Travis-CI, described here
> [5].
>
> In order to set up the automatic deploy on `apache/arrow-site`, a
> committer will need to set a GITHUB_PAT there. I imagine there could
> be some hesitation about doing this, but it is safe because:
>
> 1. Builds only happen on the master branch, and only committers can
>    modify the master branch, so by accepting a patch to `master`,
>    they're implicitly accepting a patch to `asf-site`.
> 2. Malicious actors can't modify the build script in a pull request
>    and use the token, because Travis does "not provide
>    [repository-setting environment variables] to untrusted builds,
>    triggered by pull requests from another repository" [6].
> 3. Non-committers cannot access the Travis-CI settings to alter the
>    GITHUB_PAT (and even committers cannot view the value of the token
>    once it is set).
> 4. IIUC there is still a manual action required to get the ASF to
>    update arrow.apache.org with the contents of the `asf-site` branch.
>
> While it would be useful, it is not required that we enable automatic
> deploy on `apache/arrow-site` in order to benefit from this proposal,
> because it lets contributors opt in to deploying test sites from their
> forks, and those test sites will actually work.
>
> Let me know if you have any questions or concerns. If there are no
> objections, then to proceed I'll need a committer to create an orphan
> `master` branch on `apache/arrow-site`, and then I can make a pull
> request to that, which we'd want to merge without squashing in order
> to preserve the git history of the site from `apache/arrow`.
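Safety points 1 and 2 above amount to a guard at the top of the deploy step. The sketch below assumes Travis CI's standard `TRAVIS_PULL_REQUEST` and `TRAVIS_BRANCH` variables plus the `GITHUB_PAT` name from the proposal; `should_deploy` is a hypothetical helper, not part of the actual script. Checking `TRAVIS_PULL_REQUEST` matters because, for pull-request builds, Travis sets `TRAVIS_BRANCH` to the branch the PR targets, so a PR against `master` would otherwise pass the branch check.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only for a push build of master that has a token.
should_deploy() {
  # TRAVIS_PULL_REQUEST is "false" for push builds and the PR number otherwise.
  [ "${TRAVIS_PULL_REQUEST:-false}" = "false" ] || return 1
  # Only pushes to master are published.
  [ "${TRAVIS_BRANCH:-}" = "master" ] || return 1
  # Repository-setting variables are withheld from untrusted PR builds,
  # so an empty token is one more reason not to push.
  [ -n "${GITHUB_PAT:-}" ] || return 1
}
```

A build script would then wrap the push step in `if should_deploy; then ...; fi` and skip publishing otherwise.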
>
> Thanks,
> Neal
>
> [1] https://github.com/nealrichardson/arrow-site/
> [2] https://github.com/nealrichardson/arrow-site/commits/master
> [3] https://github.com/nealrichardson/arrow-site/blob/master/build-and-deploy.sh
> [4] https://github.com/nealrichardson/arrow-site/blob/master/.travis.yml
> [5] https://github.com/nealrichardson/arrow-site/tree/master#previewing-the-site
> [6] https://docs.travis-ci.com/user/environment-variables/#defining-variables-in-repository-settings
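Before a committer runs the `git filter-branch --prune-empty --subdirectory-filter site master` step from the Aug 8 message on a real clone, its effect can be rehearsed on a scratch repository. This sketch builds a throwaway repo with invented `site/` and `cpp/` contents, runs the same command, and prints the new root listing and the surviving commit count. It assumes `git` is installed; recent git releases warn that `filter-branch` is discouraged in favor of `git filter-repo`, and `FILTER_BRANCH_SQUELCH_WARNING=1` only suppresses that notice. `split_demo` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Rehearse the subdirectory split on a throwaway repository.
# File names and commits are invented for illustration.
split_demo() {
  local repo
  repo="$(mktemp -d)"
  (
    cd "$repo" || exit 1
    # Fixed identity so commits work on a machine without git config.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
    # Silence the filter-branch deprecation notice in newer git releases.
    export FILTER_BRANCH_SQUELCH_WARNING=1

    git init -q
    git symbolic-ref HEAD refs/heads/master  # independent of init.defaultBranch

    mkdir site cpp
    echo '<h1>Arrow</h1>' > site/index.html
    echo 'int main() { return 0; }' > cpp/main.cc
    git add . && git commit -q -m "Add site and C++ code"

    echo '// code-only change' >> cpp/main.cc
    git commit -q -am "Touch only cpp/"

    echo 'body {}' > site/style.css
    git add site && git commit -q -m "Add stylesheet"

    # Same invocation as in the thread: keep only site/ as the new root
    # and drop commits that end up empty.
    git filter-branch --prune-empty --subdirectory-filter site master >/dev/null 2>&1

    # New root listing on one line, then the number of surviving commits.
    git ls-tree --name-only master | paste -s -d ' ' -
    git rev-list --count master
  )
  rm -rf "$repo"
}
```

Running `split_demo` prints `index.html style.css` followed by `2`: the commit that touched only `cpp/` disappears and the former `site/` directory becomes the repository root, which is why the rewritten `master` can be pushed to `arrow-site` with the site's history intact.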