Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
Joel Crisp wrote:
> I'm not against programming, just against making everyone do it. If you can provide a framework which allows a registry of common file types against the way of handling them and a library of shipped code fragments which can be incorporated without the end user having to do any coding, then that would be fine.
>
> Maybe something like:
>
> monotone types filetype --match=\*.xml --type=text/xml
> --- Setup initial default mappings

I don't much like having a specialized monotone command just for that. Besides, this way you can't control the order of matching. IMHO, a configuration file is a better solution.

> or
>
> monotone types filetype --file=foo.xml --type=x-rational-xmi
> --- Change the type of the file

As I see it, there are three distinct issues to handle:

- mapping file extensions (and/or content) to mime-types
- mapping mime-types to merge/diff tools
- assigning mime-types to files, and handling them in monotone.

The first two tasks can be accomplished by using a bit of Lua glue to read mappings from configuration files. These files could be in pure tabular form or, better, use the syntax proposed by graydon, i.e. something like

    file_mapping(".xml", "text/xml")

and/or

    content_mapping(offset, bytestring, mime-type)

The same goes for merging:

    merge_tool(mime-type, difftool, mergetool, automerge_allowed)

While the user sees only a collection of mapping directives, these lines effectively translate to Lua function calls, making customization both powerful and easy.

Storing mime-types in monotone should be done with file attributes, but currently this is a bit tricky, because you need a way to resolve conflicting mime-types *before* merging. This could be accomplished by merging .mt-attrs before other files, but introducing ordering into merges could be dangerous. Anyway, there are ongoing developments that should make these things easier.

In the meantime, we could _partially_ resolve the issue by using only the mapping tables at merge/diff time, without explicitly assigning a mime-type to files. Per-file mapping is still possible by using the full filename instead of the extension:

    file_mapping("model.xml", "application/xmi")

Cheers,
Riccardo

___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel
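The mapping directives proposed in this message could be backed by a few lines of Lua. What follows is only a sketch under stated assumptions: `file_mapping` and `merge_tool` are the names used in the message, but the registry tables, `lookup_mime`, `lookup_merge_tool`, and the tool names are hypothetical and not part of monotone. Rules are kept in an ordered list so that the first match wins, which is what gives the user control over matching order.

```lua
-- Hypothetical registry; none of this is shipped with monotone.
local file_rules = {}   -- ordered list of rules: the first match wins
local merge_tools = {}  -- mime-type -> tool description

-- Directive: map a filename suffix (or a full filename) to a mime-type.
function file_mapping(suffix, mime)
  table.insert(file_rules, { suffix = string.lower(suffix), mime = mime })
end

-- Directive: associate diff/merge tools with a mime-type.
function merge_tool(mime, difftool, mergetool, automerge_allowed)
  merge_tools[mime] = {
    diff = difftool,
    merge = mergetool,
    automerge = automerge_allowed,
  }
end

-- Return the mime-type of the first rule whose suffix ends the filename.
-- This is a plain suffix comparison, so "mymodel.xml" also matches the
-- "model.xml" rule; a real implementation would need something stricter.
function lookup_mime(filename)
  local lowname = string.lower(filename)
  for _, rule in ipairs(file_rules) do
    if string.sub(lowname, -string.len(rule.suffix)) == rule.suffix then
      return rule.mime
    end
  end
  return nil -- unknown type
end

-- Return the tool description registered for a mime-type, or nil.
function lookup_merge_tool(mime)
  return merge_tools[mime]
end

-- What a user's configuration file might then contain:
file_mapping("model.xml", "application/xmi") -- per-file rule, listed first
file_mapping(".xml", "text/xml")
merge_tool("text/xml", "xmldiff", "xmlmerge", true) -- placeholder tool names
```

With these rules, `lookup_mime("model.xml")` gives `application/xmi` while any other `.xml` file falls through to `text/xml`. Swapping the two `file_mapping` lines would make the generic rule shadow the specific one.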
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
Hi Riccardo

This sounds much better. The criteria which I'm concerned about are:

1) ease of use - end users should not have to (knowingly) use Lua to configure 'pre-defined' file types
2) flexibility - the type of each file should be able to be set independently and new file types defined (may use Lua for this)
3) power - all file operations should be customizable
4) reliability - it should work reliably and consistently

Your proposal sounds like it would address all of these.

BTW, you didn't copy monotone-devel - feel free to forward this mail if that is what you intended

Joel
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
Glen Ditchfield wrote:
> Why can't there be one function that examines the files and decides to run the internal merge algorithm on some kinds of files, and to exec external tools on other kinds of files?

Sorry if I'm stating the obvious, but perhaps not everyone is aware that monotone embeds a complete Lua interpreter, and you aren't limited to reimplementing the predefined hooks in your monotonerc files. You can also add other functions, tables, etc. For example, you could create a single function to categorize your files, and use it in both the add-time and merge hooks. Something like this:

    function choose_merge(filename)
       local filedata = read_contents_of_file(filename)
       if filedata ~= nil then
          if is_word(filedata) then
             return "msword"
          else
             -- * other categorizing code *
          end
       end
       return nil -- filetype unknown
    end

    attr_init_functions["manual_merge"] = function(filename)
       if choose_merge(filename) ~= nil then
          return true -- files with an associated tool merge manually
       else
          return nil
       end
    end

    function merge3(anc_path, left_path, right_path, merged_path, ancestor, left, right)
       -- * common code to set up files (see std_hooks.lua) *
       -- filename: path of one of the files set up above
       local ftype = choose_merge(filename)
       if ftype == "msword" then
          -- * call word *
       else
          -- * other tools *
       end
       ...
    end
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
I'm not against programming, just against making everyone do it. If you can provide a framework which allows a registry of common file types against the way of handling them and a library of shipped code fragments which can be incorporated without the end user having to do any coding, then that would be fine.

Maybe something like:

monotone types filetype --match=\*.xml --type=text/xml
--- Setup initial default mappings

or

monotone types filetype --file=foo.xml --type=x-rational-xmi
--- Change the type of the file

Then have an object interface like (pseudocode!):

    type_handlers {
        string[] getSupportedTypePatterns()
        void merge3(...args...)
        boolean isBinary()
        void copyIn(...stream..., ...database..., ..other args)
        void copyOut(..destination.., ..database..., ..other args)
        etc.
    }

A library of type handlers which implemented this type of interface could then be selected at run-time simply by looking up the file type associated with the file, then looking up the handler for that type. Note that there is no reason why these should not be Lua if they are shipped as a standard library.

Whilst I take your point about user preference in merge tools, for many of the 'exotic' types there will be a much more limited set of merge tools, and supplying type variants which are specific to each tool should be feasible.

Thoughts?

Joel

rghetta wrote:
> On Wed, 2005-06-01 at 20:07 +0100, Joel Crisp wrote:
>> I just don't think that it is fair to expect everyone to program what should be standard functionality in hooks. Hooks should be there for functionality which is non-standard, e.g. integration with my software process rather than yours...mailing when checkins are done, or enforcing lifecycle constraints. Choosing how to handle common file types hardly fits into that category, and I think the average user would prefer that to be supported via a less obscure mechanism.
>>
>> To give you some comparison: in a recent government job I worked in we weren't allowed to use triggers _at all_ (in clearcase, which uses perl) on the grounds that no-one else would be able to maintain them, let alone in a language with the limited uptake of Lua (note, I personally think it is ok as a language, but the perception in the industry as a whole is that it is a game programmer's language, not a 'commercial' one)
>
> Could you provide some example of an acceptable syntax? How would you like to specify merging behaviour, how to identify a filetype, etc. I really don't see how to implement what you want without resorting to some hook programming, unless we add a built-in filetype identifier. And even with something like that you need to handle the uncommon filetype ...
>
> Riccardo
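Since the message notes that such handlers could themselves be written in Lua, here is a minimal sketch of how the pseudocode interface might translate. Everything in it is an assumption made for illustration: `register_handler`, `handler_for`, and the field names are hypothetical, and the `merge3` body is a placeholder, not a working merge.

```lua
-- Hypothetical handler registry, loosely following the pseudocode interface.
local handlers = {} -- file type name -> handler table

-- Register a handler table under a file type name.
function register_handler(type_name, handler)
  handlers[type_name] = handler
end

-- Look up the handler for a file type; returns nil when none is registered.
function handler_for(type_name)
  return handlers[type_name]
end

-- A shipped library could register entries like this one, so that end
-- users only pick a type and never write hook code themselves.
register_handler("text/xml", {
  patterns = { "%.xml$" },                 -- Lua patterns for matching names
  is_binary = function() return false end,
  merge3 = function(ancestor, left, right)
    -- Placeholder: a real handler would run an XML-aware merge tool here
    -- and return the merged content.
    return nil
  end,
})
```

At run time the caller would resolve a file's type first and then call `handler_for(type)`, falling back to the default merge behaviour when it returns nil.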
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
On Sun, 2005-05-29 at 12:20 +0100, Joel Crisp wrote:
> Hi
>
> My concern about this approach is that if you have lots of different types of files to handle, XML, Word, Rational XMI (which is XML but has a specific merge tool), etc then you would end up having to do lots of jiggering in the merge hooks. Also, the order in which you tried to identify the files at the point of merge would become significant, for example the case of the XMI file actually being an XML file means that you would have to ensure that you checked for XMI before XML. This could get very complex.

But you still need to *automatically* categorize files in the first place, and to do that you still need to look for XMI before XML (but if your XMI files consistently have a .xmi extension, looking at that is order independent ;-) ).

Riccardo
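The ordering problem discussed here can be made concrete with a short sketch. It assumes, purely for illustration, that an XMI file can be recognised by an `<XMI` element and a generic XML file by a leading `<?xml` declaration. The detector list and `detect_type` are hypothetical names, not monotone functions.

```lua
-- Ordered list of content detectors: the most specific type comes first.
-- The recognition rules are illustrative assumptions, not a real
-- XMI or XML detector.
local detectors = {
  { type = "application/xmi",
    test = function(data)
      return string.find(data, "<XMI", 1, true) ~= nil
    end },
  { type = "text/xml",
    test = function(data)
      return string.find(data, "<?xml", 1, true) == 1
    end },
}

-- Return the type of the first detector that accepts the content, or nil.
function detect_type(data)
  for _, detector in ipairs(detectors) do
    if detector.test(data) then
      return detector.type
    end
  end
  return nil
end
```

Because an XMI file also passes the XML test, swapping the two entries would classify every XMI file as plain XML. Keeping the order in a single table at least puts that dependency in one visible place.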
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
On Fri, 2005-05-27 at 20:13 -0700, Nathaniel Smith wrote:
> On Fri, May 27, 2005 at 09:44:23PM +0200, rghetta wrote:
>> Ok, I'll try to summarize the requests (and possible answers) so far:
>>
>> Both Nathaniel Smith and Emile Snyder advocated the use of .mt-attrs, perhaps coupled with the attr_init hook to automagically mark the files at add time. However, the attr_init hooks receive only the filename, while the hook also needs the file content to guess the file type if the name doesn't match.
>
> But, the file is sitting right there on the filesystem, and the hook can run arbitrary code. For instance, it could peek at the file to see whether it looks like it's binary.

I'm a bit worried about efficiency here. Does add already read the file? If yes, then monotone will read the file twice, and this could have a noticeable impact on add performance.

>> Attributes also seem to be just not available at merge time. Both of these issues need to be resolved before using attributes to decide on merging. Is a rewrite of the attribute system needed?
>
> What would this rewrite do? (It's entirely possible we do need one, the .mt-attr concept doesn't seem fully developed yet to me, but I don't see what you're getting at here.) At merge time we know the file names, and we know what revision they come from; in principle there is no reason we can't grab the .mt-attrs file from that revision and see what it says.

I was wrong. Attributes *are* available at merge time. Looking more closely at the merging code, I found we already do that to get the file encoding. It parses the attribute file(s) every time, and this could have an impact on merging performance for large trees, but handling the binary attribute is trivial.

Riccardo
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
From the feedback to this patch, it appears that in naming the hook binary_file() I made a big mistake. Since the hook's only effect is to disable monotone's internal merging algorithm, perhaps a better name would be manual_merge, and that could also be used for the .mt-attrs property.

Riccardo
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
On Sat, 2005-05-28 at 09:44 -0500, Glen Ditchfield wrote:
> I worry that, when monotone checks for control characters, it is not always good enough, and too late for a hook to fix things. I would like to have a hook that sees that the first six bytes of the file are \320\317\021\340\241\261 and concludes this is an MS Word file, instead of a hook that checks the file suffix for eight different case-sensitive variations of .doc and still guesses wrong sometimes.
>
> This is related to Joel Crisp's point in an earlier posting. What is the root problem? Does monotone just have to spot the binary files, or does it have to get a more exact idea of each file's type so that it can invoke a type-specific merge function? (This is an MS Word file, so merge the revisions with Word.)

The binary file flag just disables monotone's internal merging, so the Lua hooks merge2()/merge3() are invoked every time. Like the binary_file() hook, you can override them in a monotonerc file. These hooks get both the name and the full file content of all to-be-merged files. If you want to choose the merge program based on file content, you do it there.

In short, the steps to use MS Word to handle .doc files are:

1. redefine the binary_file() hook to mark .doc files as binary (btw, current hook comparisons *aren't* case sensitive. The hook takes the filename, converts it to lowercase, and matches on the converted name). If you're worried that this can still miss some ill-named Word files, make binary the default and match only on known text files.
2. redefine merge2()/merge3() to invoke Word when the first bytes of the content match.

Note: if we implement the add-time hook, you will also have access to the file content at step 1.

Riccardo
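The content check behind step 2 is small enough to sketch. The six bytes are the ones quoted in the thread, rewritten with Lua's decimal string escapes (octal 320 317 021 340 241 261 is decimal 208 207 17 224 161 177). `is_word` matches the helper name used elsewhere in the thread, but this body is an assumption, not code from the patch.

```lua
-- The six leading bytes quoted in the thread, using Lua's decimal escapes.
local WORD_MAGIC = "\208\207\017\224\161\177"

-- Return true when the content starts with the six-byte signature.
-- A nil or short argument simply yields false.
function is_word(filedata)
  return filedata ~= nil and string.sub(filedata, 1, 6) == WORD_MAGIC
end
```

A redefined merge2()/merge3() could call `is_word` on the content it receives and launch Word only when it returns true, keeping the default behaviour for everything else.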
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
On Sun, May 29, 2005 at 09:29:45AM +0200, rghetta wrote:
> On Fri, 2005-05-27 at 20:13 -0700, Nathaniel Smith wrote:
>> But, the file is sitting right there on the filesystem, and the hook can run arbitrary code. For instance, it could peek at the file to see whether it looks like it's binary.
>
> I'm a bit worried about efficiency here. Does add already read the file? If yes, then monotone will read the file twice, and this could have a noticeable impact on add performance.

No, add doesn't read the file, so there's no duplicated work.

-- Nathaniel

-- 
...these, like all words, have single, decontextualized meanings: everyone knows what each of these words means, everyone knows what constitutes an instance of each of their referents. Language is fixed. Meaning is certain. Santa Claus comes down the chimney at midnight on December 24.
  -- The Language War, Robin Lakoff
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
Hi

My concern about this approach is that if you have lots of different types of files to handle, XML, Word, Rational XMI (which is XML but has a specific merge tool), etc then you would end up having to do lots of jiggering in the merge hooks. Also, the order in which you tried to identify the files at the point of merge would become significant, for example the case of the XMI file actually being an XML file means that you would have to ensure that you checked for XMI before XML. This could get very complex.

I don't see that as particularly clean for a wide uptake system. Would it be possible to provide a default Lua hook for merge which used a lookup table to map the file type to the correct merge facility and an easy way of setting up that merge? Or replace the Lua hook with one which takes an object implementing merge2, merge3 and any other relevant functions for the particular file type?

Joel

rghetta wrote:
> On Sat, 2005-05-28 at 09:44 -0500, Glen Ditchfield wrote:
>> I worry that, when monotone checks for control characters, it is not always good enough, and too late for a hook to fix things. I would like to have a hook that sees that the first six bytes of the file are \320\317\021\340\241\261 and concludes this is an MS Word file, instead of a hook that checks the file suffix for eight different case-sensitive variations of .doc and still guesses wrong sometimes.
>>
>> This is related to Joel Crisp's point in an earlier posting. What is the root problem? Does monotone just have to spot the binary files, or does it have to get a more exact idea of each file's type so that it can invoke a type-specific merge function? (This is an MS Word file, so merge the revisions with Word.)
>
> The binary file flag just disables monotone's internal merging, so the Lua hooks merge2()/merge3() are invoked every time. Like the binary_file() hook, you can override them in a monotonerc file. These hooks get both the name and the full file content of all to-be-merged files. If you want to choose the merge program based on file content, you do it there.
>
> In short, the steps to use MS Word to handle .doc files are:
>
> 1. redefine the binary_file() hook to mark .doc files as binary (btw, current hook comparisons *aren't* case sensitive. The hook takes the filename, converts it to lowercase, and matches on the converted name). If you're worried that this can still miss some ill-named Word files, make binary the default and match only on known text files.
> 2. redefine merge2()/merge3() to invoke Word when the first bytes of the content match.
>
> Note: if we implement the add-time hook, you will also have access to the file content at step 1.
>
> Riccardo
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
Glen Ditchfield wrote:
>> You base the text/binary decision on the name of the file. How hard would it be to base it on the contents of the file instead, the way the Unix 'file' command does?

On Friday 27 May 2005 14:44, rghetta replied:
> The hook uses only the filespec, true, but if it returns nil, monotone will check the file content for ASCII NULs and some other control chars.

I worry that, when monotone checks for control characters, it is not always good enough, and too late for a hook to fix things. I would like to have a hook that sees that the first six bytes of the file are \320\317\021\340\241\261 and concludes this is an MS Word file, instead of a hook that checks the file suffix for eight different case-sensitive variations of .doc and still guesses wrong sometimes.

This is related to Joel Crisp's point in an earlier posting. What is the root problem? Does monotone just have to spot the binary files, or does it have to get a more exact idea of each file's type so that it can invoke a type-specific merge function? (This is an MS Word file, so merge the revisions with Word.)

On Friday 27 May 2005 14:44, rghetta wrote:
> Unless adding .mt-attrs support is more or less trivial, my proposal is to merge the current patch to resolve the merging bug.

This may be one of those "good enough" solutions -- the kind where nobody ever gets around to coding up the right thing (in a way that would be backwards-compatible with the established "good enough" thing), and future generations just live with a small, nagging annoyance ...
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
Ok, I'll try to summarize the requests (and possible answers) so far:

Both Nathaniel Smith and Emile Snyder advocated the use of .mt-attrs, perhaps coupled with the attr_init hook to automagically mark the files at add time. However, the attr_init hooks receive only the filename, while the hook also needs the file content to guess the file type if the name doesn't match. IMHO, the guessing part is a necessity: requiring the user to manually specify every binary file seems too harsh to me, especially because every project has its share of non-mergeable files, even monotone ;-) The hook is here just to handle the corner cases (like file-specific merging tools), but I think monotone should make the right choice automatically.

Attributes also seem to be just not available at merge time. Both of these issues need to be resolved before using attributes to decide on merging. Is a rewrite of the attribute system needed?

Joel Crisp wrote:
> I'd prefer a 'file type' attribute rather than a binary file attribute - there are many types of files which may require specialist merging (e.g. XML) or storage (e.g. super big video files which are stored external to the scm).

The binary_file hook is used only to inhibit algorithmic merging (perhaps a better name for the hook would be disable_auto_merge). Essentially, a binary file is treated as a text file with a conflict, i.e. monotone will invoke the merge2() or merge3() Lua hooks. The merge2/merge3 hooks receive both filenames and full file content, thus by redefining them you can choose a specialized merge tool based on the file type (I made the example of gimp for merging images). AFAIK, monotone doesn't directly support storing files externally, but you can simulate it by storing only a pointer file and redefining the mentioned hooks.

Glen Ditchfield wrote:
> You base the text/binary decision on the name of the file. How hard would it be to base it on the contents of the file instead, the way the Unix 'file' command does?

The hook uses only the filespec, true, but if it returns nil, monotone will check the file content for ASCII NULs and some other control chars.

Unless adding .mt-attrs support is more or less trivial, my proposal is to merge the current patch to resolve the merging bug. Then perhaps we could rethink the attributes a bit to have them available everywhere, not just when dealing with the working copy or the manifest. After that, adding a binary or disable_auto_merge attribute should be easy.

What do you (collectively :-) think?

Riccardo
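The content fallback described in this message (look for NULs and some other control characters) can be illustrated in Lua. This is a sketch only. monotone's own guess_binary() is internal code, and the thread does not say exactly which control characters it checks beyond NUL, STX and ETX, so the set below is an assumption, as is the name `looks_binary`.

```lua
-- Heuristic sketch: content counts as binary if it holds a NUL or another
-- control character that rarely shows up in text.
-- Assumed set: bytes 0-8 and 14-31 except ESC (27). Tab, LF, VT, FF and
-- CR (9-13) are treated as ordinary text.
function looks_binary(data)
  for i = 1, string.len(data) do
    local byte = string.byte(data, i)
    if byte < 9 or (byte > 13 and byte < 32 and byte ~= 27) then
      return true
    end
  end
  return false
end
```

A check like this separates most binaries from text but says nothing about which binary format was found, which is the gap the file-type discussion in this thread is about.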
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
I like the idea of an .mt-attrs approach because the binary'ness of a file is a property of the file, not something that different people should have different ideas about (a'la hooks). I don't have particularly strong feelings about the right way to help monotone automatically figure it out for you, but I do feel strongly that there should be some way to explicitly tell monotone to treat a file as binary and have it do the right thing from then on.

thanks,
-emile

On Wed, 2005-05-25 at 21:09 -0700, Nathaniel Smith wrote:
> On Wed, May 25, 2005 at 12:33:04AM +0200, rghetta wrote:
>> If the hook returns nil, the file will be treated as binary if the monotone function guess_binary() returns true, i.e. if the file contains NUL bytes or a selection of other ASCII control chars (for example, STX and ETX).
>
> Another possible way to do binary support, for discussion:
> -- have the merger peek at .mt-attrs, and if a binary attribute is set on a file, consider it binary. (Currently nothing in .mt-attrs has hard-coded behavior, so this would be a change.)
> -- use the cool new attr_init hooks to automatically guess at add time whether each file is binary.
> -- never again automatically touch this attribute; let people set it to what they want, if they want
>
> Another possible way to do binary support, for discussion:
> -- just use guess_binary() on the data at merge time
>
> I don't tend to store binary files under VCS, so I don't have as much of an intuition about what the nicest way to do so would be; it'd be good to hear opinions from those actually affected by this :-)
>
> -- Nathaniel
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
On Tuesday 24 May 2005 17:33, rghetta wrote:
> function binary_file(name)
>    lowname=string.lower(name)
>    -- some known binaries, return true
>    if (string.find(lowname, "%.gif$")) then return true end

You base the text/binary decision on the name of the file. How hard would it be to base it on the contents of the file instead, the way the Unix 'file' command does?
Re: [Monotone-devel] [PATCH] and RFC: binary files merging and hook
On Wed, May 25, 2005 at 12:33:04AM +0200, rghetta wrote:
> If the hook returns nil, the file will be treated as binary if the monotone function guess_binary() returns true, i.e. if the file contains NUL bytes or a selection of other ASCII control chars (for example, STX and ETX).

Another possible way to do binary support, for discussion:
-- have the merger peek at .mt-attrs, and if a binary attribute is set on a file, consider it binary. (Currently nothing in .mt-attrs has hard-coded behavior, so this would be a change.)
-- use the cool new attr_init hooks to automatically guess at add time whether each file is binary.
-- never again automatically touch this attribute; let people set it to what they want, if they want

Another possible way to do binary support, for discussion:
-- just use guess_binary() on the data at merge time

I don't tend to store binary files under VCS, so I don't have as much of an intuition about what the nicest way to do so would be; it'd be good to hear opinions from those actually affected by this :-)

-- Nathaniel

-- 
/* Tell the world that we're going to be the grim
 * reaper of innocent orphaned children. */
  -- Linux kernel 2.4.5, main.c
This email may be read aloud.
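The first alternative (guess once at add time, then leave the attribute alone) could look roughly like the sketch below. It is built on several assumptions: the attribute name "binary" is only the one floated in this thread, the hook is assumed to return the string "true" to set the attribute and nil to leave it unset, and the check is deliberately reduced to looking for a NUL byte in the first 4096 bytes.

```lua
-- Reuse monotone's hook table when present; create it so the sketch also
-- runs standalone.
attr_init_functions = attr_init_functions or {}

-- Hypothetical add-time guess for a "binary" attribute.
-- Assumed convention: return "true" to set the attribute, nil to leave it
-- unset. Only the first 4096 bytes are inspected.
attr_init_functions["binary"] = function(filename)
  local file = io.open(filename, "rb")
  if file == nil then
    return nil -- unreadable file: make no guess
  end
  local head = file:read(4096) or ""
  file:close()
  -- Plain (non-pattern) search for a NUL byte.
  if string.find(head, "\0", 1, true) then
    return "true"
  end
  return nil
end
```

Because the guess runs only when a file is added, a user who disagrees with it can edit the attribute afterwards and it will not be overwritten, which is the point of the "never again automatically touch this attribute" item.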