That looks very nice, thanks for the reactivity. Do you plan to cherry-pick the format-preserving-printer on top of some 1.2 branch? I'm fine with porting my own script back to trunk, but I wonder whether I will have the same incompatible-formatting issues as last time (where those clear bugs that you fixed, or just a change in expected format?).
On Wed, Nov 25, 2015 at 3:37 AM, Louis Gesbert <[email protected]> wrote: > I took a couple hours to implement the by-field format preserving printer in > opam, and it works quite well in practice. > > See a first PR here: https://github.com/ocaml/opam-repository/pull/5176/files > > I think this makes bulk updates much more acceptable. > I'll push the new opam-admin.top code shortly. > > > Le mardi 24 novembre 2015, 10:57:00 Louis Gesbert a écrit : >> Hopping in a bit late... :) >> >> Indeed, my first approach for bulk-updating opam files was to first >> normalise everything, so that diff could help me see what happened >> thereafter. This loses a lot of history, and is difficult to maintain >> afterwards -- but opam- fmt could help there. >> >> Another option would be to implement a format-preserving reprinter for bulk >> updates, that would keep the original fields order, and check every field >> for semantic changes before reprinting (we have precise file positions >> already, so we could extract the raw string to reprint when there are no >> changes). It shouldn't be too difficult to implement, doesn't require >> changes to the parser or format, and using this for bulk updates would >> already make them much more tolerable. Attaching comments to the AST would >> be nice and allow to keep them even in rewritten fields, but require much >> more dev time. >> >> Note 1: Gabriel, just making sure you are using opam trunk ? I rewrote big >> parts of OpamFormat/OpamFile recently (merged like one or two months ago). >> >> Note 2: I think we can now go ahead and push bulk changes to the repo. It >> should only break `opam init` on 1.1, and I seriously doubt anyone will >> complain about it anymore. If the rewriting script breaks, we'll just have >> to freeze the 1.1 updates and update the documentation pages accordingly. >> >> Best, >> Louis >> >> Le lundi 23 novembre 2015, 13:15:29 David Allsopp a écrit : >> > Tempting as it would be to spend the next couple of hours hacking it, my 2 >> > euro-cents… why not just make the lexer in opam itself lossless, and >> > expose >> > that in the API? So every single token emitted by the lexer becomes a pair >> > including the “whitespace” (= all whitespace, newlines, comments, etc.) >> > which follow that token and any tokens which may be ambiguously lexed >> > (e.g. >> > integers where arbitrary zero prefixes or where bases are permitted) have >> > to be deferred. All that’s needed then is a function which transforms >> > these >> > lossless tokens to the original lexer’s tokens (so dropping the >> > whitespace/comment parts, possibly performing a few int_of_string and >> > other >> > related functions – but all aided by the type system) for plugging into >> > the >> > parsing stage. It’s even relatively transparent to have the “debugging” >> > version of opam verify that the stream is actually reversible to the >> > original file. >> > >> > A task like this then uses the full-parsed power of opam-lib to identify >> > opam files which need updating (as you are) and then a fairly simple state >> > machine over the lossless lexer stream to update it. >> > >> > I think that having a rigid format used for opam-repository is a good >> > idea, >> > but given that presumably that wouldn’t want to become mandatory for all >> > repositories, being able to do formatting (and even version) preserving >> > updates seems useful. Having effectively a richer lexer also means no fork >> > required, so better maintainability, and also puts the onus of supporting >> > the updating of an older version opam file on the script author rather >> > than >> > the opam dev team (as presumably OPAM will always remain capable of >> > reading >> > older format files??). Making the update be an FSM over raw lexer tokens >> > is >> > also slightly nicer and less error-prone than a regex approach. >> > >> > >> > David >> > >> > From: opam-devel [mailto:[email protected]] On Behalf Of >> > Gabriel Scherer >> >> Sent: 23 November 2015 12:19 >> >> > To: Thomas Gazagnaire >> > Cc: opam-devel >> > Subject: Re: [opam-devel] [ANN] opam-fmt 1.0 >> > >> > One idea I just had when re-reading the scavenged comments is to just >> > consider comment lines as independent items. If you add ordering >> > information to the internal opam representation to keep the items in the >> > order, then I suspect that all the examples above would be reformatted in >> > a >> > way that is acceptable to the original author. One final bit of >> > sophistication would be to store (in the comment items) a boolean for >> > whether they started on their own line, or are to be inserted at the end >> > of >> > the previous line. >> >> Of course this fake structure is less true to the real >> >> > document structure than properly placed attribute nodes would be. It seems >> > easier to implement (and thus a decent compromise to make), but then maybe >> > not: keeping an ordering may be non-trivial for data whose current >> > internal >> > representation is not a set of things (for example dependency formulas are >> > actually full logic formulas; but then if we keep to Conjunctive Normal >> > Forms they can have a list-like representation). >> > On Mon, Nov 23, 2015 at 1:11 PM, Thomas Gazagnaire >> >> > <[email protected]<mailto:[email protected]>> wrote: >> Thanks >> >> > Gabriel, that's very useful! >> > >> > For the various parsers, it might make sense to simply fork (or >> > copy/paste) >> > bits of opam parsers from various versions of opam and try to fix these >> > parsers to: >> > >> > - keep comments >> > - support some kind of alignments? >> > >> > Thomas >> > >> > > On 22 Nov 2015, at 20:55, Gabriel Scherer >> > > >> > > <[email protected]<mailto:[email protected]>> wrote: >> > >> - Any chance you could use cmdliner instead of Arg? Fairly minor, but >> > >> >> > >> all the other plugins use it and it's nice to have the same behaviours >> > >> for OPAM plugins where possible. >> > > >> > > I'm fond of Daniel's design work, so I would gladly move to >> > > Cmdliner -- Arg was just what I could easily use "in anger" for >> > > a first try. >> > > >> > >> - How does this behave on pre-1.2 files? I think it's about time that >> > >> >> > >> we deprecate pre-1.2 so that we can have clean metadata standards >> > >> about the new fields such as dev-repo. >> > > >> > > opam-fmt updates older opam files to its own support version -- and >> > > refuses to work on newer files. Two things: >> > > >> > > >> > > >> > > I think it would be a better design to have a family of scripts >> > > opam-fmt-1.0, opam-fmt-1.1, opam-fmt-1.2 etc. and a "mother script" >> > > opam-fmt that calls the right reformatter according to the file >> > > version¹. However, that requires changes in the packaging of opam-lib, >> > > so that the package for distinct versions can be installed >> > > simultaneously (they would be separate packages >> > > opam-lib-VER, and ocamlfind packages as well). >> > > >> > > >> > > >> > > ¹: another option would be for opam-lib to also support pairs of parsing >> > > and printing functions for older format version specifically, but that >> > > is an invasive choice to make in a codebase. Right now there is a tiny >> > > bit of logic to know which fields are 1.0 or 1.1-specific, but this >> > > would be much more ambitious. >> > > >> > > >> > > >> > > There are various warnings implemented in opam-lib that could be emitted >> > > during the processing of files by opam-fmt -- they may be already >> > > available depending on the OPAMDEBUG variable or something, but an >> > > explicit support in the interface could be nice. When I tested >> > > reformatting opam-repository, I observed that a large part of its opam >> > > files raise such warnings (so indeed there seems to be a metadata >> > > problem in the repository today). >> > > >> > >> - Having a bot regularly go through and reformat the entire repository >> > >> >> > >> also seems quite legit. The alternative would be to reformat at the >> > >> merge point, but this would require a staging branch. >> > > >> > > I'm not sure what you call "merge point"; my idea was to put the burden >> > > of reformatting onto users submitting PRs against the >> > > repository. (Regular reformatting are a sensible idea, but they run in >> > > the problem of loss of information, whether distributed manual >> > > reformatting keeps humans closer in the loop) >> > > >> > >> - I would really like opam-fmt to be lossless, so preserving comments >> > >> >> > >> seems like an extension that we should implement. Ideally everyone >> > >> can just call it on their packages without thinking about it. >> > > >> > > I have mixed feelings about trying to be lossless. At the very least, >> > > one should recognize that setting this as a design goal would impose >> > > a significant burden on the developers of the parsing/printing functions >> > > in opam-lib. >> > > >> > > >> > > >> > > Some human choices (alignment of string fields for example) are rather >> > > difficult and fragile to recognize -- and they could complexify the >> > > codebase. Even for comments, right now you cannot tell to which >> > > configuration item an element is attached. There are several ways around >> > > this, which are interesting to consider but also involve a fair amount >> > > of work: >> > > >> > > >> > > >> > > - You could use ocamldoc-like placement rules: "always after the >> > > >> > > relevant field" (a first comment would be a file-global comment), or >> > > "either before or after, but an empty line between the comment and >> > > a non-relevant field"; this seems painful and not-that-easy to >> > > implement. >> > > >> > > - You could move to a docstring-like (or attribute-like) syntax where >> > > >> > > comments are explicitly attached to an AST node; from a language >> > > design point of view this would be my preference, but it may require >> > > a change in concrete syntax. >> > > >> > > - Finally, the choices you can make in this design space depend a lot on >> > > >> > > whether reformatting will be performed by humans or by bots. If your >> > > comments-attachment rules are obscure, humans have the opportunity to >> > > reformat, see that they got them wrong, and reiterate. Bots will just >> > > put stuff at the wrong place. >> > > >> > > I think that the people that maintain this corner of opam today are >> > > those that will pay the greater cost if "lossless" becomes a design >> > > goal, so it should be their choice to make. >> > > >> > > >> > > >> > > >> > > In the meantime, it would be interesting to have a look at how opam >> > > files in the repository actually use comments. With >> > > >> > > find packages -name 'opam' | xargs grep --color=always " #" >> > > >> > > I see 75 occurences of comments, 38 of which are just "TODO fixme". The >> > > 37 others seem rather interesting, below are a few representative >> > > examples: >> > > >> > > packages/arakoon/arakoon.1.8.6/opam: >> > >> "lwt" { = "2.4.8" } # 2.4.9 had an incompatible API change >> > > >> > > Having textual exaplanations for choice of bound is a reasonable >> > > use-case for attributes. >> > > >> > > packages/camlp4/camlp4.4.01/opam: >> > >> build: [] # dummy package >> > > >> > > This could be replaced by a dedicated note/comment field. >> > > >> > > packages/frama-c/frama-c.20150201/opam: >> > >> "lablgtk" { >= "2.18.2" } #for ocaml >= 4.02.1 >> > > >> > > I don't understand the semantics of this one. >> > > >> > > packages/git/git.1.6.0/opam: >> > >> depopts: [ >> > >> >> > >> # --enable-mirage >> > >> "mirage-types-lwt" >> > >> "mirage-http" >> > >> "mirage-flow" >> > >> "channel" >> > >> # --enable-unix >> > >> "cohttp" >> > >> "conduit" >> > >> "base-unix" >> > >> >> > >> ] >> > > >> > > This usage is very interesting, it seems to call for a hierarchy inside >> > > the "depopts" list (and "dependencies" as well, I suppose), with >> > > annotations on sub-groups of dependencies. >> > > >> > > packages/gsl/gsl.1.18.2/opam: >> > >> depends: [ >> > >> >> > >> "base-bigarray" >> > >> "camlp4" >> > >> "ocamlfind" {>= "1.3.1"} >> > >> # Included from _opam file >> > >> "conf-gsl" >> > >> >> > >> ] >> > > >> > > I don't know what this comment means. >> > > >> > > packages/lz4/lz4.1.0.0/opam: >> > >> depexts: [ >> > >> >> > >> [["debian"] ["liblz4-dev"]] >> > >> # [["ubuntu"] ["liblz4-dev"]] reenable when CI updates its Ubuntu >> > >> [["source"] ["https://.../install.sh"]] >> > >> >> > >> ] >> > > >> > > Again, this would require annotations. >> > > >> > > packages/ppx_deriving/ppx_deriving.0.3/opam: >> > >> build: [ >> > >> >> > >> # If there is no native dynlink, we can't use native builds >> > >> "ocaml" "pkg/build.ml<http://build.ml>" "native=true" >> > >> >> > >> "native-dynlink=true" >> > >> >> > >> ] >> > > >> > > packages/frama-c-e-acsl/frama-c-e-acsl.0.5/opam: >> > >> build: [ >> > >> >> > >> ["ocaml" >> > >> "run_autoconf_if_needed.ml<http://run_autoconf_if_needed.ml>"] >> > >> #when used in pinned mode the configure *can* not yet be generated >> > >> ["./configure" "--prefix" prefix] >> > >> [make] >> > >> >> > >> ] >> > > >> > > packages/clangml/clangml.0.5.2/opam: >> > >> depexts: [ >> > >> >> > >> [["debian"] ["libboost-dev" "llvm-3.4-dev" "clang-3.4" >> > >> "libclang-3.4-dev" "binutils-dev"]] >> >> [["ubuntu"] ["libboost-dev" >> >> > >> "llvm-3.4-dev" "clang-3.4" "libclang-3.4-dev" "binutils-dev"]] >> > >> [["gentoo"] ["dev-libs/boost" "sys-devel/llvm-3.4.1-r2" >> > >> "sys-devel/clang-3.4.0-r100" "sys-devel/binutils"]] # archlinux has no >> > >> package providing llvm and clang 3.4.1 >> > >> [["archlinux"] ["boost" "binutils"]] >> > >> >> > >> ] >> > > >> > > packages/mtime/mtime.0.8.1/opam: >> > >> depends: [ "ocamlfind" >> > >> >> > >> "js_of_ocaml" # FIXME should become a deptopt >> > >> >> > >> ] >> > > >> > > On Sun, Nov 22, 2015 at 8:09 PM, Anil Madhavapeddy >> > > >> > > <[email protected]<mailto:[email protected]>> wrote: >> > >> Thanks for this Gabriel! Quick notes: >> > >> >> > >> >> > >> >> > >> - I would really like opam-fmt to be lossless, so preserving comments >> > >> >> > >> seems like an extension that we should implement. Ideally everyone >> > >> can just call it on their packages without thinking about it. >> > >> >> > >> - Having a bot regularly go through and reformat the entire repository >> > >> >> > >> also seems quite legit. The alternative would be to reformat at the >> > >> merge point, but this would require a staging branch. >> > >> >> > >> - Any chance you could use cmdliner instead of Arg? Fairly minor, but >> > >> >> > >> all the other plugins use it and it's nice to have the same behaviours >> > >> for OPAM plugins where possible. >> > >> >> > >> - How does this behave on pre-1.2 files? I think it's about time that >> > >> >> > >> we deprecate pre-1.2 so that we can have clean metadata standards >> > >> about the new fields such as dev-repo. >> > >> >> > >> regards, >> > >> Anil >> > >> >> > >>> On 21 Nov 2015, at 21:53, Gabriel Scherer >> > >>> <[email protected]<mailto:[email protected]>> wrote: >> > >>> >> > >>> >> > >>> >> > >>> Hi opam-devel, >> > >>> >> > >>> >> > >>> >> > >>> As part of the discussion in >> > >>> >> > >>> >> > >>> >> > >>> bulk addition of 'ocamlbuild {build}' dependencies >> > >>> https://github.com/ocaml/opam-repository/pull/5140 >> > >>> >> > >>> >> > >>> >> > >>> it became apparent that performing bulk updates on opam-repository is >> > >>> made harder by the fact that the parse-print roundtrip does not >> > >>> preserve human-formatted opam files. For my proposed patch I tried to >> > >>> separate the reformatting of opam file (to follow the opam-lib printer >> > >>> convention) from the actual changes in two separate commits, but that >> > >>> means more work for script authors, and also creates patches that are >> > >>> harder to review. (If (re)formatting was controlled by the maintainer >> > >>> of the OPAM packages instead of authors of bulk updates, we would be >> > >>> more confident in its correctness.) >> > >>> >> > >>> >> > >>> >> > >>> In order to move that discussion forward (how to maintain opam >> > >>> metadata in a way that is easy for both human and scripts to work >> > >>> with?), I propose the opam-fmt script that can be found at >> > >>> https://github.com/gasche/opam-fmt/ >> > >>> >> > >>> >> > >>> >> > >>> I wrote it in the last few days and there are probably some rough >> > >>> edges. Once I feel that it should work, I may try to package it on the >> > >>> opam-repo (in the meantime, clone then pin). >> > >>> >> > >>> >> > >>> >> > >>> This suggests one possible way forward: publicize opam-fmt, encourage >> > >>> users to reformat their opam files using it, adapt the opam-repository >> > >>> QA to call `opam fmt --check` on opam files proposed in PR to enforce >> > >>> it, and after some time (once we trust it works as expected thanks to >> > >>> the human guinea pigs) reformat all older opam files to get a perfect >> > >>> round-trip on all files of the repository. >> > >>> It is not clear to me that this is the best way forward. (For example >> > >>> it means that, in the current state of the opam file parsing/printing >> > >>> code, comments in opam files would always be erased by reformatting. >> > >>> Should we remove comments from the valid syntax of opam files, or >> > >>> attach them to configuration lines to re-print them correctly later, >> > >>> or maybe refuse to work on files with comments?) Opam developers and >> > >>> repository maintainers may decide that the cost of caring about >> > >>> reformatting outweigh the benefits in terms of scriptability. >> > >>> >> > >>> >> > >>> >> > >>> What do you think? >> > >>> _______________________________________________ >> > >>> opam-devel mailing list >> > >>> [email protected]<mailto:[email protected]> >> > >>> http://lists.ocaml.org/listinfo/opam-devel >> > > >> > > _______________________________________________ >> > > opam-devel mailing list >> > > [email protected]<mailto:[email protected]> >> > > http://lists.ocaml.org/listinfo/opam-devel >> >> _______________________________________________ >> opam-devel mailing list >> [email protected] >> http://lists.ocaml.org/listinfo/opam-devel > _______________________________________________ > opam-devel mailing list > [email protected] > http://lists.ocaml.org/listinfo/opam-devel _______________________________________________ opam-devel mailing list [email protected] http://lists.ocaml.org/listinfo/opam-devel
