On 11/19/13 11:39 PM, Stefan Sperling wrote:
> Hmm... I don't see a need to change anything.
>
> libmagic is used as a last resort. Subversion already offers several
> ways of configuring the svn:mime-type property, all of which supersede
> libmagic (svn propset/propedit, auto-props, svn:auto-props).
> So unless there is a really good reason why any of these cannot be used,
> I think we should tell our users to configure their clients appropriately.

There is a huge difference between telling users not to set
svn:mime-type (directly or via auto-props) to application/xml on XML
files they want treated as text, and telling them to remove an
svn:mime-type that libmagic added automatically.

libmagic is causing us to set application/xml on any XML file,
regardless of file extension. For a user to configure around this with
auto-props, they would have to anticipate every file extension that
happens to be detected as application/xml. That's rather burdensome.
The alternative is to build without libmagic, which is burdensome for
users who rely on binary packages. Another alternative is to let users
disable libmagic even when Subversion was built with it (which might
not be a bad idea even with the changes I propose). But I think that's
really just hiding the problem.

> Additionally, users can use their own MAGIC files to control what
> mime-types libmagic gives to Subversion. We don't need to replicate
> this functionality in Subversion itself.

We're not duplicating functionality. libmagic doesn't tell you whether
a file is text; it tells you the format. In the case of XML, whether
the file is text or binary is ambiguous. However, I suspect most XML
files that people put into SVN are text (though I'm sure people will
find all manner of exceptions).

Really, though, the terms text and binary are misleading here. The
issue isn't so much whether the content is text or binary but whether
it is diffable/mergeable, which is what we use the output of
svn_mime_type_is_binary() to decide. Even if libmagic scanned the
entire file (which it doesn't) and looked at the character set, it
couldn't tell you whether a file is diffable/mergeable. The file could
be base64-encoded data, which is no more mergeable than the unencoded
data.

> libmagic uses a heuristic, so it can get things wrong. But if we start
> filtering libmagic output for the purpose of treating XML documents
> as text, we could eventually end up with a huge list of hard-coded
> exceptions for all sorts of things. Where do we stop? How can we make
> sure that everyone will be happy with a list we make up?
> The whole point of relying on libmagic is to avoid such a list.

We don't need to make everyone happy with the list we provide. If we
make it configurable, and make it available as a server-directed
configuration, then everyone can make the choices that are best for
their project.

That said, the bug that Johan linked to (which I'd long since forgotten
about) links to an email I wrote in 2004 (which I'd also forgotten
about) that makes a compelling argument for why application/xml
shouldn't default to being treated as text. So I retract that portion
of my proposal for now (if and when we grow checkpoints and can undo
updates/merges that lose local changes, we can adjust the default).

For reference, that email is here:
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=374218
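
To make the configurable-list idea concrete, here is a rough Python
sketch of a text/binary decision keyed on the MIME type. It is not
Subversion's C code. The function and parameter names (base_type,
mime_type_is_binary, extra_text_types) are made up for illustration,
and the built-in XBM/XPM exceptions are an assumption about which
non-text/* types are treated as text. The extra_text_types argument
stands in for the hypothetical per-project or server-directed list.

```python
# Sketch only: a Python rendering of a MIME-type based text/binary check
# with a hypothetical configurable list of extra "text-like" types.

# Assumption: a small built-in set of non-text/* types treated as text.
BUILTIN_TEXT_TYPES = {"image/x-xbitmap", "image/x-xpixmap"}


def base_type(mime_type):
    """Drop parameters such as '; charset=utf-8' and normalise case."""
    return mime_type.split(";", 1)[0].strip().lower()


def mime_type_is_binary(mime_type, extra_text_types=()):
    """Return True if content of this type should not be diffed/merged.

    extra_text_types is the hypothetical configurable list: types that a
    project has decided are safe to treat as text.
    """
    base = base_type(mime_type)
    if base.startswith("text/"):
        return False
    if base in BUILTIN_TEXT_TYPES:
        return False
    # Everything else is binary unless the project opted it in.
    return base not in {base_type(t) for t in extra_text_types}
```

With an empty list, mime_type_is_binary("application/xml") stays True,
which matches the default argued for in the 2004 email. A project that
knows its XML merges cleanly would pass ["application/xml"] to opt in.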