Re: [RFC] - Proper encoding for patch file?
On 08.09.2011 20:07, Mark Phippard wrote:
> This is a JavaHL issue. See the attached patch, which resolves the
> problem I face.
>
> If I use the JavaHL diff API to produce a patch, it fails if there are
> paths in the patch with UTF-8 characters in the name. Here is an
> example of the exception:
>
> Invalid argument
> svn: Can't convert string from 'UTF-8' to native encoding:
> svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
> ===
> RA layer request failed
> svn: Error reading spooled REPORT request response
>
> The problem seems to be that JavaHL creates the output file for the
> patch with the encoding of SVN_APR_LOCALE_CHARSET. If I change this to
> utf-8 as shown in the patch, then the method works. The command-line
> client on the same system works fine.
>
> How do people feel about this? Does it make sense that JavaHL should
> create the patch file with UTF-8 encoding? I tend to think it does,
> but thought I would raise the question here.

Unfortunately, on Linux (and other *ix), the filename encoding is just a convention. So there's no guarantee that the filename is in fact UTF-8, even if the locale says it should be. Therefore, just writing the file names to the patch file unchanged (in UTF-8) will not in fact do the right thing in exactly the kind of corner case that's triggering this error.

The only marginally sane solution is to include complete Unicode normalization and transliteration libraries in Subversion ... and use them correctly. I expect that'd mean storing the actual transliterated filename in the WC database alongside the original UTF-8 value that came from the repository, because transliteration is in general not reversible.

-- Brane

P.S.: As an added bonus, that would allow us to transliterate characters that are invalid on some particular filesystem, if they happen to appear in names in the repository.
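To make the "just a convention" point concrete: whether a given filename really is UTF-8 can only be decided by looking at its bytes, not at the locale. Below is a minimal C sketch of such a structural check. The function `looks_like_utf8` is a hypothetical helper written for this illustration; it is not part of Subversion or APR, and it says nothing about normalization or transliteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a Subversion/APR function): returns true if the
 * LEN bytes at S form well-formed UTF-8. Rejects stray continuation bytes,
 * truncated sequences, overlong forms, surrogates and values > U+10FFFF. */
bool looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp;      /* code point being assembled */

        if (c < 0x80) {        /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07;
        } else {
            return false;      /* continuation byte or invalid lead byte */
        }

        if (len - i - 1 < extra)
            return false;      /* sequence runs past the end of the name */

        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;  /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Overlong encodings, UTF-16 surrogates, out-of-range values. */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return false;
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;

        i += extra + 1;
    }
    return true;
}
```

A name stored as Latin-1, such as the bytes 0xE9 0x2E 0x74 0x78 0x74 ("é.txt"), fails this check even under a UTF-8 locale, which is the corner case described above.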
[RFC] - Proper encoding for patch file?
This is a JavaHL issue. See the attached patch, which resolves the problem I face.

If I use the JavaHL diff API to produce a patch, it fails if there are paths in the patch with UTF-8 characters in the name. Here is an example of the exception:

Invalid argument
svn: Can't convert string from 'UTF-8' to native encoding:
svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
===
RA layer request failed
svn: Error reading spooled REPORT request response

The problem seems to be that JavaHL creates the output file for the patch with the encoding of SVN_APR_LOCALE_CHARSET. If I change this to utf-8 as shown in the patch, then the method works. The command-line client on the same system works fine.

How do people feel about this? Does it make sense that JavaHL should create the patch file with UTF-8 encoding? I tend to think it does, but thought I would raise the question here.

--
Thanks

Mark Phippard
http://markphip.blogspot.com/

Index: subversion/bindings/javahl/native/SVNClient.cpp
===================================================================
--- subversion/bindings/javahl/native/SVNClient.cpp	(revision 1166827)
+++ subversion/bindings/javahl/native/SVNClient.cpp	(working copy)
@@ -987,7 +987,7 @@
                                 showCopiesAsAdds,
                                 force,
                                 FALSE,
-                                SVN_APR_LOCALE_CHARSET,
+                                "utf-8",
                                 outfile,
                                 NULL /* error file */,
                                 changelists.array(subPool),
@@ -1019,7 +1019,7 @@
                                 showCopiesAsAdds,
                                 force,
                                 FALSE,
-                                SVN_APR_LOCALE_CHARSET,
+                                "utf-8",
                                 outfile,
                                 NULL /* error file */,
                                 changelists.array(subPool),
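For readers trying to interpret the error text: the `?\230?\181?\139...` run looks like the path's UTF-8 bytes printed as decimal escapes. The following is a minimal C sketch, under that assumption, that turns the escapes back into bytes and decodes the three-byte UTF-8 sequences into code points. Both `unescape_bytes` and `decode_utf8_3` are hypothetical helpers written for this note, not Subversion functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert "?\ddd" escapes (three decimal digits, as in
 * the error message above) back into raw bytes; other characters are copied
 * unchanged. Writes at most OUT_CAP bytes and returns the number written. */
size_t unescape_bytes(const char *s, unsigned char *out, size_t out_cap)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);

    while (*s != '\0' && n < out_cap) {
        if (s[0] == '?' && s[1] == '\\'
            && isdigit((unsigned char)s[2])
            && isdigit((unsigned char)s[3])
            && isdigit((unsigned char)s[4])) {
            int value = (s[2] - '0') * 100 + (s[3] - '0') * 10 + (s[4] - '0');
            if (value <= 255) {
                out[n++] = (unsigned char)value;
                s += 5;
                continue;
            }
        }
        out[n++] = (unsigned char)*s++;
    }
    return n;
}

/* Hypothetical helper: decode one 3-byte UTF-8 sequence at P (the only form
 * that occurs in this example). Returns the code point, or 0 if the three
 * bytes do not have the 1110xxxx 10xxxxxx 10xxxxxx shape. */
uint32_t decode_utf8_3(const unsigned char *p)
{
    if ((p[0] & 0xF0) != 0xE0 || (p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80)
        return 0;
    return ((uint32_t)(p[0] & 0x0F) << 12)
         | ((uint32_t)(p[1] & 0x3F) << 6)
         |  (uint32_t)(p[2] & 0x3F);
}
```

Applied to the path in the error message, the twelve escaped bytes decode to U+6D4B, U+8BD5, U+6587 and U+4EF6, four CJK characters followed by `.txt`. None of these can be represented in a single-byte native encoding, which is consistent with the conversion failure reported above.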
Re: [RFC] - Proper encoding for patch file?
I should point out this is on OSX. The results on Windows are more interesting:

1. Unlike OSX, on Windows the API completes without error.
2. However, the paths in the Index lines show ??? in place of the UTF-8 characters.
3. But the content within the patch shows up fine.

So this seems like another data point in favor of just telling SVN to output as UTF-8, since the problem seems to apply only to the pathnames.

Comments?

On Thu, Sep 8, 2011 at 2:07 PM, Mark Phippard markp...@gmail.com wrote:
> This is a JavaHL issue. See the attached patch, which resolves the
> problem I face.
>
> If I use the JavaHL diff API to produce a patch, it fails if there are
> paths in the patch with UTF-8 characters in the name. Here is an
> example of the exception:
>
> Invalid argument
> svn: Can't convert string from 'UTF-8' to native encoding:
> svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
> ===
> RA layer request failed
> svn: Error reading spooled REPORT request response
>
> The problem seems to be that JavaHL creates the output file for the
> patch with the encoding of SVN_APR_LOCALE_CHARSET. If I change this to
> utf-8 as shown in the patch, then the method works. The command-line
> client on the same system works fine.
>
> How do people feel about this? Does it make sense that JavaHL should
> create the patch file with UTF-8 encoding? I tend to think it does,
> but thought I would raise the question here.
>
> --
> Thanks
>
> Mark Phippard
> http://markphip.blogspot.com/

--
Thanks

Mark Phippard
http://markphip.blogspot.com/
Re: [RFC] - Proper encoding for patch file?
On 09/08/2011 02:07 PM, Mark Phippard wrote:
> This is a JavaHL issue. See the attached patch, which resolves the
> problem I face.
>
> If I use the JavaHL diff API to produce a patch, it fails if there are
> paths in the patch with UTF-8 characters in the name. Here is an
> example of the exception:
>
> Invalid argument
> svn: Can't convert string from 'UTF-8' to native encoding:
> svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
> ===
> RA layer request failed
> svn: Error reading spooled REPORT request response
>
> The problem seems to be that JavaHL creates the output file for the
> patch with the encoding of SVN_APR_LOCALE_CHARSET. If I change this to
> utf-8 as shown in the patch, then the method works. The command-line
> client on the same system works fine.
>
> How do people feel about this? Does it make sense that JavaHL should
> create the patch file with UTF-8 encoding? I tend to think it does,
> but thought I would raise the question here.

Why does the command-line client work? Does it not also use the locale encoding for its diff headers?

At any rate, consistency between the behaviors of the relevant Java and C APIs seems like a reasonable goal.

--
C. Michael Pilato cmpil...@collab.net
CollabNet www.collab.net Distributed Development On Demand
Re: [RFC] - Proper encoding for patch file?
On Thu, Sep 8, 2011 at 2:27 PM, C. Michael Pilato cmpil...@collab.net wrote:
> Why does the command-line client work? Does it not also use the locale
> encoding for its diff headers?
>
> At any rate, consistency between the behaviors of the relevant Java
> and C APIs seems like a reasonable goal.

I have not tested exhaustively, but my OSX Terminal says UTF-8 is the default encoding. Maybe that is why I do not see it from the command line?

--
Thanks

Mark Phippard
http://markphip.blogspot.com/
Re: [RFC] - Proper encoding for patch file?
On Thu, Sep 8, 2011 at 2:30 PM, Mark Phippard markp...@gmail.com wrote:
> On Thu, Sep 8, 2011 at 2:27 PM, C. Michael Pilato cmpil...@collab.net wrote:
>> Why does the command-line client work? Does it not also use the
>> locale encoding for its diff headers?
>>
>> At any rate, consistency between the behaviors of the relevant Java
>> and C APIs seems like a reasonable goal.
>
> I have not tested exhaustively, but my OSX Terminal says UTF-8 is the
> default encoding. Maybe that is why I do not see it from the command
> line?

I changed Terminal to use MacOS Roman as the default encoding. Now I get this:

$ svn diff
subversion/svn/diff-cmd.c:373: (apr_err=22)
subversion/libsvn_client/diff.c:1989: (apr_err=22)
subversion/libsvn_client/diff.c:1667: (apr_err=22)
subversion/libsvn_wc/diff_local.c:560: (apr_err=22)
subversion/libsvn_wc/status.c:2364: (apr_err=22)
subversion/libsvn_wc/status.c:1171: (apr_err=22)
subversion/libsvn_wc/status.c:1157: (apr_err=22)
subversion/libsvn_wc/diff_local.c:474: (apr_err=22)
subversion/libsvn_wc/diff_local.c:474: (apr_err=22)
subversion/libsvn_wc/diff_local.c:419: (apr_err=22)
subversion/libsvn_client/diff.c:1098: (apr_err=22)
subversion/libsvn_client/diff.c:1012: (apr_err=22)
subversion/libsvn_subr/stream.c:248: (apr_err=22)
subversion/libsvn_subr/utf.c:775: (apr_err=22)
subversion/libsvn_subr/utf.c:580: (apr_err=22)
svn: E22: Can't convert string from 'UTF-8' to native encoding:
subversion/libsvn_subr/utf.c:578: (apr_err=22)
svn: E22: Index: Design Documents/?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt

--
Thanks

Mark Phippard
http://markphip.blogspot.com/
Re: [RFC] - Proper encoding for patch file?
On Thu, Sep 8, 2011 at 2:30 PM, Mark Phippard markp...@gmail.com wrote:
> On Thu, Sep 8, 2011 at 2:27 PM, C. Michael Pilato cmpil...@collab.net wrote:
>> Why does the command-line client work? Does it not also use the
>> locale encoding for its diff headers?
>>
>> At any rate, consistency between the behaviors of the relevant Java
>> and C APIs seems like a reasonable goal.
>
> I have not tested exhaustively, but my OSX Terminal says UTF-8 is the
> default encoding. Maybe that is why I do not see it from the command
> line?

FWIW, even if I explicitly set LANG=en_US.UTF-8 before launching Java, and even if I change all of the JVM properties to make UTF-8 the default file encoding for the JVM, I still get this error. So JavaHL does not seem to pick up the environment in the same way as the command line.

--
Thanks

Mark Phippard
http://markphip.blogspot.com/
Re: [RFC] - Proper encoding for patch file?
On Thu, Sep 8, 2011 at 1:49 PM, Mark Phippard markp...@gmail.com wrote:
> On Thu, Sep 8, 2011 at 2:30 PM, Mark Phippard markp...@gmail.com wrote:
>> On Thu, Sep 8, 2011 at 2:27 PM, C. Michael Pilato cmpil...@collab.net wrote:
>>> Why does the command-line client work? Does it not also use the
>>> locale encoding for its diff headers?
>>>
>>> At any rate, consistency between the behaviors of the relevant Java
>>> and C APIs seems like a reasonable goal.
>>
>> I have not tested exhaustively, but my OSX Terminal says UTF-8 is
>> the default encoding. Maybe that is why I do not see it from the
>> command line?
>
> FWIW, even if I explicitly set LANG=en_US.UTF-8 before launching Java,
> and even if I change all of the JVM properties to make UTF-8 the
> default file encoding for the JVM, I still get this error. So JavaHL
> does not seem to pick up the environment in the same way as the
> command line.

FWIW, JavaHL is just using SVN_APR_LOCALE_CHARSET, which is a magic number inside of APR. I've no idea what it actually does.

-Hyrum

--
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/
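One way to investigate what "locale charset" resolves to in a given process is to ask the C library directly. The sketch below assumes, without confirmation from this thread, that on Unix-like systems APR derives the locale charset from `nl_langinfo(CODESET)`. It is POSIX-only, and `codeset_for` is a hypothetical helper, not an APR or Subversion function.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch the process to LOCALE_NAME and return the
 * character set name the C library reports for it, or NULL if the locale
 * cannot be selected. Passing "" selects the locale named by the
 * environment (LANG, LC_ALL, ...); passing "C" selects the default locale
 * a program runs in before it calls setlocale() at all. */
const char *codeset_for(const char *locale_name)
{
    assert(locale_name != NULL);

    if (setlocale(LC_ALL, locale_name) == NULL)
        return NULL;   /* the requested locale is not available */

    return nl_langinfo(CODESET);
}
```

Comparing `codeset_for("C")` with `codeset_for("")` inside the failing process would show whether the value of LANG is actually reaching the native code. Whether JavaHL initializes the locale the same way as the command-line client is not established here.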
Re: [RFC] - Proper encoding for patch file?
On Thu, Sep 08, 2011 at 02:07:03PM -0400, Mark Phippard wrote:
> This is a JavaHL issue. See the attached patch, which resolves the
> problem I face.
>
> If I use the JavaHL diff API to produce a patch, it fails if there are
> paths in the patch with UTF-8 characters in the name. Here is an
> example of the exception:
>
> Invalid argument
> svn: Can't convert string from 'UTF-8' to native encoding:
> svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
> ===

This might be related to the following TODO comment in libsvn_client/patch.c. In other words, this is a known limitation of the current implementation.

[[[
static svn_error_t *
grab_filename(const char **file_name, const char *line,
              apr_pool_t *result_pool, apr_pool_t *scratch_pool)
{
  const char *utf8_path;
  const char *canon_path;

  /* Grab the filename and encode it in UTF-8. */
  /* TODO: Allow specifying the patch file's encoding.
   *       For now, we assume its encoding is native. */
  /* ### This can fail if the filename cannot be represented in the current
   * ### locale's encoding. */
  SVN_ERR(svn_utf_cstring_to_utf8(&utf8_path, line, scratch_pool));
]]]