Re: [RFC] - Proper encoding for patch file?

2011-10-02 Thread Branko Čibej
On 08.09.2011 20:07, Mark Phippard wrote:
 This is a JavaHL issue.  See the attached patch which resolves the
 problem I face.

 If I use the JavaHL diff API to produce a patch it fails if there are
 paths in the patch with UTF8 characters in the name.  Here is an
 example of the Exception:

 Invalid argument
 svn: Can't convert string from 'UTF-8' to native encoding:
 svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
 ===

 RA layer request failed
 svn: Error reading spooled REPORT request response


 The problem seems to be that JavaHL creates the output file for the
 patch with the encoding of SVN_APR_LOCALE_CHARSET.  If I change this
 to utf-8 as shown in the patch then the method works.

 The command line client from the same system works fine.

 How do people feel about this?  Does it make sense that JavaHL should
 create the patch file with UTF-8 encoding?  I tend to think it does,
 but thought I would raise the question here.


Unfortunately, on Linux (and other *ix), the filename encoding is just a
convention. So there's no guarantee that the filename is in fact UTF-8,
even if the locale says it should be. Therefore, just writing the file
names to the patch file unchanged (in UTF-8) will not in fact do the
right thing in exactly the kind of corner case that's triggering this error.
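
To make the "just a convention" point concrete: a POSIX filename is only a
byte string, so about the only thing a program can verify is whether those
bytes happen to be well-formed UTF-8. Below is a minimal sketch of such a
check in plain C. The name is_valid_utf8 is a hypothetical helper written
for this illustration; it is not a Subversion or APR function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of Subversion or APR): returns true if the
 * len bytes at s form well-formed UTF-8. Rejects overlong forms, UTF-16
 * surrogates, and code points above U+10FFFF. */
static bool
is_valid_utf8(const unsigned char *s, size_t len)
{
  size_t i = 0;

  assert(s != NULL || len == 0);
  while (i < len)
    {
      unsigned char c = s[i];
      size_t extra;          /* number of continuation bytes expected */
      unsigned long cp;      /* code point decoded so far */
      unsigned long min;     /* smallest code point legal at this length */

      if (c < 0x80)                { extra = 0; cp = c;        min = 0; }
      else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;        /* stray continuation byte or invalid lead */

      if (extra > len - i - 1)
        return false;        /* sequence is cut off by the end of input */

      for (size_t k = 1; k <= extra; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;    /* expected a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += extra + 1;
    }
  return true;
}
```

A name stored as Latin-1, such as the bytes 63 61 66 E9 for "café", fails
this check even when the locale claims UTF-8. Passing the check still does
not prove the author of the name intended UTF-8; it only rules out the
obviously wrong cases.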

The only marginally sane solution is to include complete Unicode
normalization and transliteration libraries in Subversion ... and use
them correctly. I expect that'd mean storing the actual transliterated
filename in the WC database alongside the original UTF-8 value that came
from the repository, because transliteration is in general not reversible.

-- Brane

P.S.: As an added bonus, that would allow us to transliterate
characters that are invalid on some particular filesystem, if they
happen to appear in names in the repository.


[RFC] - Proper encoding for patch file?

2011-09-08 Thread Mark Phippard
This is a JavaHL issue.  See the attached patch which resolves the
problem I face.

If I use the JavaHL diff API to produce a patch it fails if there are
paths in the patch with UTF8 characters in the name.  Here is an
example of the Exception:

Invalid argument
svn: Can't convert string from 'UTF-8' to native encoding:
svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
===

RA layer request failed
svn: Error reading spooled REPORT request response
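
A note on reading the error above: the path is printed with each non-ASCII
byte replaced by an escape of the form ?\NNN. Assuming NNN is the byte value
in decimal, the twelve escaped bytes are E6 B5 8B, E8 AF 95, E6 96 87 and
E4 BB B6, which is valid UTF-8 for the four-character Chinese name
测试文件 ("test file"). The following plain C sketch reverses the escaping;
unescape_fuzzy is a hypothetical name chosen for this illustration and is
not a Subversion function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies src to dst, turning each "?\NNN" escape
 * (three decimal digits, value <= 255) back into the single byte it stands
 * for. Everything else is copied unchanged. Returns the number of bytes
 * written, not counting the terminating NUL. dst must have room for
 * strlen(src) + 1 bytes. */
static size_t
unescape_fuzzy(const char *src, char *dst)
{
  size_t n = 0;

  assert(src != NULL && dst != NULL);
  while (*src != '\0')
    {
      /* Short-circuit evaluation keeps these reads inside the string. */
      if (src[0] == '?' && src[1] == '\\'
          && isdigit((unsigned char)src[2])
          && isdigit((unsigned char)src[3])
          && isdigit((unsigned char)src[4]))
        {
          int value = (src[2] - '0') * 100 + (src[3] - '0') * 10
                      + (src[4] - '0');
          if (value <= 255)
            {
              dst[n++] = (char)value;
              src += 5;
              continue;
            }
        }
      dst[n++] = *src++;   /* ordinary character, or a malformed escape */
    }
  dst[n] = '\0';
  return n;
}
```

So the file name itself is well-formed UTF-8; what fails is the conversion
of that name into the encoding chosen for the output.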


The problem seems to be that JavaHL creates the output file for the
patch with the encoding of SVN_APR_LOCALE_CHARSET.  If I change this
to utf-8 as shown in the patch then the method works.

The command line client from the same system works fine.

How do people feel about this?  Does it make sense that JavaHL should
create the patch file with UTF-8 encoding?  I tend to think it does,
but thought I would raise the question here.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/
Index: subversion/bindings/javahl/native/SVNClient.cpp
===
--- subversion/bindings/javahl/native/SVNClient.cpp (revision 1166827)
+++ subversion/bindings/javahl/native/SVNClient.cpp (working copy)
@@ -987,7 +987,7 @@
showCopiesAsAdds,
force,
FALSE,
-   SVN_APR_LOCALE_CHARSET,
+   "utf-8",
outfile,
NULL /* error file */,
changelists.array(subPool),
@@ -1019,7 +1019,7 @@
showCopiesAsAdds,
force,
FALSE,
-   SVN_APR_LOCALE_CHARSET,
+   "utf-8",
outfile,
NULL /* error file */,
changelists.array(subPool),


Re: [RFC] - Proper encoding for patch file?

2011-09-08 Thread Mark Phippard
I should point out this is on OSX.  The results on Windows are more interesting:

1. Unlike OSX, on Windows the API completes without error.

2. However, the paths in the index show ??? in place of the UTF-8 characters.

3.  But the content within the patch, shows up fine.

So this seems like another data point in favor of just telling SVN to
output as UTF-8 since it seems to only apply to the pathnames.
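
The two behaviours reported so far (a hard error on OSX, question marks on
Windows) look like the two usual policies for converting text into an
encoding that cannot represent it: fail, or substitute. The sketch below
shows both in plain C. It rests on several assumptions made only for the
illustration: the target is ISO-8859-1 (the thread does not say which native
encodings were in effect), the lossy mode writes one '?' per character, and
utf8_to_latin1 is a hypothetical helper, not anything from Subversion or APR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: converts the UTF-8 string src to ISO-8859-1 in dst
 * (room for strlen(src) + 1 bytes). Code points above U+00FF have no
 * Latin-1 byte: strict mode returns false, lossy mode writes '?' instead.
 * On failure the contents of dst are unspecified. */
static bool
utf8_to_latin1(const char *src, char *dst, bool lossy)
{
  const unsigned char *p = (const unsigned char *)src;
  size_t n = 0;

  assert(src != NULL && dst != NULL);
  while (*p != '\0')
    {
      unsigned long cp;
      size_t extra;

      if (*p < 0x80)                { cp = *p;        extra = 0; }
      else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
      else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
      else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
      else
        return false;               /* not a valid lead byte */
      p++;

      for (; extra > 0; extra--, p++)
        {
          if ((*p & 0xC0) != 0x80)
            return false;           /* truncated or malformed sequence */
          cp = (cp << 6) | (*p & 0x3F);
        }

      if (cp <= 0xFF)
        dst[n++] = (char)cp;        /* Latin-1 byte equals the code point */
      else if (lossy)
        dst[n++] = '?';             /* substitute, as in the Windows output */
      else
        return false;               /* refuse, as in the OSX error */
    }
  dst[n] = '\0';
  return true;
}
```

Writing the patch as UTF-8 sidesteps both outcomes for the path names,
because no conversion away from UTF-8 is needed.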

Comments?



On Thu, Sep 8, 2011 at 2:07 PM, Mark Phippard markp...@gmail.com wrote:
 This is a JavaHL issue.  See the attached patch which resolves the
 problem I face.

 If I use the JavaHL diff API to produce a patch it fails if there are
 paths in the patch with UTF8 characters in the name.  Here is an
 example of the Exception:

    Invalid argument
 svn: Can't convert string from 'UTF-8' to native encoding:
 svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
 ===

 RA layer request failed
 svn: Error reading spooled REPORT request response


 The problem seems to be that JavaHL creates the output file for the
 patch with the encoding of SVN_APR_LOCALE_CHARSET.  If I change this
 to utf-8 as shown in the patch then the method works.

 The command line client from the same system works fine.

 How do people feel about this?  Does it make sense that JavaHL should
 create the patch file with UTF-8 encoding?  I tend to think it does,
 but thought I would raise the question here.





-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: [RFC] - Proper encoding for patch file?

2011-09-08 Thread C. Michael Pilato
On 09/08/2011 02:07 PM, Mark Phippard wrote:
 This is a JavaHL issue.  See the attached patch which resolves the
 problem I face.
 
 If I use the JavaHL diff API to produce a patch it fails if there are
 paths in the patch with UTF8 characters in the name.  Here is an
 example of the Exception:
 
 Invalid argument
 svn: Can't convert string from 'UTF-8' to native encoding:
 svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
 ===
 
 RA layer request failed
 svn: Error reading spooled REPORT request response
 
 
 The problem seems to be that JavaHL creates the output file for the
 patch with the encoding of SVN_APR_LOCALE_CHARSET.  If I change this
 to utf-8 as shown in the patch then the method works.
 
 The command line client from the same system works fine.
 
 How do people feel about this?  Does it make sense that JavaHL should
 create the patch file with UTF-8 encoding?  I tend to think it does,
 but thought I would raise the question here.

Why does the command-line client work?  Does it not also use the locale
encoding for its diff headers?  At any rate, consistency between the
behaviors of the relevant Java and C APIs seems like a reasonable goal.

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand





Re: [RFC] - Proper encoding for patch file?

2011-09-08 Thread Mark Phippard
On Thu, Sep 8, 2011 at 2:27 PM, C. Michael Pilato cmpil...@collab.net wrote:
 Why does the command-line client work?  Does it not also use the locale
 encoding for its diff headers?  At any rate, consistency between the
 behaviors of the relevant Java and C APIs seems like a reasonable goal.

I have not tested exhaustively, but my OSX Terminal says UTF-8 is the
default encoding.  Maybe that is why I do not see it from command
line?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: [RFC] - Proper encoding for patch file?

2011-09-08 Thread Mark Phippard
On Thu, Sep 8, 2011 at 2:30 PM, Mark Phippard markp...@gmail.com wrote:
 On Thu, Sep 8, 2011 at 2:27 PM, C. Michael Pilato cmpil...@collab.net wrote:
 Why does the command-line client work?  Does it not also use the locale
 encoding for its diff headers?  At any rate, consistency between the
 behaviors of the relevant Java and C APIs seems like a reasonable goal.

 I have not tested exhaustively, but my OSX Terminal says UTF-8 is the
 default encoding.  Maybe that is why I do not see it from command
 line?

Changed Terminal to use MacOS Roman as default encoding.  Now I get this:

$ svn diff
subversion/svn/diff-cmd.c:373: (apr_err=22)
subversion/libsvn_client/diff.c:1989: (apr_err=22)
subversion/libsvn_client/diff.c:1667: (apr_err=22)
subversion/libsvn_wc/diff_local.c:560: (apr_err=22)
subversion/libsvn_wc/status.c:2364: (apr_err=22)
subversion/libsvn_wc/status.c:1171: (apr_err=22)
subversion/libsvn_wc/status.c:1157: (apr_err=22)
subversion/libsvn_wc/diff_local.c:474: (apr_err=22)
subversion/libsvn_wc/diff_local.c:474: (apr_err=22)
subversion/libsvn_wc/diff_local.c:419: (apr_err=22)
subversion/libsvn_client/diff.c:1098: (apr_err=22)
subversion/libsvn_client/diff.c:1012: (apr_err=22)
subversion/libsvn_subr/stream.c:248: (apr_err=22)
subversion/libsvn_subr/utf.c:775: (apr_err=22)
subversion/libsvn_subr/utf.c:580: (apr_err=22)
svn: E22: Can't convert string from 'UTF-8' to native encoding:
subversion/libsvn_subr/utf.c:578: (apr_err=22)
svn: E22: Index: Design Documents/?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt



-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: [RFC] - Proper encoding for patch file?

2011-09-08 Thread Mark Phippard
On Thu, Sep 8, 2011 at 2:30 PM, Mark Phippard markp...@gmail.com wrote:
 On Thu, Sep 8, 2011 at 2:27 PM, C. Michael Pilato cmpil...@collab.net wrote:
 Why does the command-line client work?  Does it not also use the locale
 encoding for its diff headers?  At any rate, consistency between the
 behaviors of the relevant Java and C APIs seems like a reasonable goal.

 I have not tested exhaustively, but my OSX Terminal says UTF-8 is the
 default encoding.  Maybe that is why I do not see it from command
 line?

FWIW, even if I explicitly set LANG=en_US.UTF-8 before launching Java,
and even if I change all of the JVM properties to make UTF-8 the
default encoding for files for the JVM, I still get this error.  So
JavaHL does not seem to pick up the environment in the same way as the
command line.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: [RFC] - Proper encoding for patch file?

2011-09-08 Thread Hyrum K Wright
On Thu, Sep 8, 2011 at 1:49 PM, Mark Phippard markp...@gmail.com wrote:
 On Thu, Sep 8, 2011 at 2:30 PM, Mark Phippard markp...@gmail.com wrote:
 On Thu, Sep 8, 2011 at 2:27 PM, C. Michael Pilato cmpil...@collab.net wrote:
 Why does the command-line client work?  Does it not also use the locale
 encoding for its diff headers?  At any rate, consistency between the
 behaviors of the relevant Java and C APIs seems like a reasonable goal.

 I have not tested exhaustively, but my OSX Terminal says UTF-8 is the
 default encoding.  Maybe that is why I do not see it from command
 line?

 FWIW, even if I explicitly set LANG=en_US.UTF-8 before launching Java,
 and even if I change all of the JVM properties to make UTF-8 the
 default encoding for files for the JVM, I still get this error.  So
 JavaHL does not seem to pick up the environment in the same way as the
 command line.

FWIW, JavaHL is just using SVN_APR_LOCALE_CHARSET, which is a magic
number inside of APR.  I've no idea what it actually does.
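
For anyone wondering what a "magic number" charset could look like, here is
a rough sketch of the general pattern in plain C. It is not APR's code. The
assumption is that the constant is a sentinel pointer, compared by address
and never dereferenced, which the library replaces with the codeset of the
process's current locale. MY_LOCALE_CHARSET, resolve_charset and
adopt_environment_locale are hypothetical names; nl_langinfo and setlocale
are the standard POSIX and C calls.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical sentinel: a pointer value that is only ever compared by
 * address, never read through. */
#define MY_LOCALE_CHARSET ((const char *)1)

/* Resolves a charset argument to a real name. An ordinary string such as
 * "utf-8" is returned unchanged; the sentinel is replaced by whatever
 * codeset the current locale of the process reports. */
static const char *
resolve_charset(const char *charset)
{
  assert(charset != NULL);
  if (charset == MY_LOCALE_CHARSET)
    return nl_langinfo(CODESET);
  return charset;
}

/* Adopts the locale named by the environment (LANG, LC_ALL, ...). A C
 * program starts in the "C" locale and stays there until it makes a call
 * like this one, whatever LANG says. Returns 0 on success, -1 if the
 * requested locale is not available. */
static int
adopt_environment_locale(void)
{
  return setlocale(LC_ALL, "") != NULL ? 0 : -1;
}
```

If something along these lines is what happens, the result depends on
whether the hosting process has called setlocale, not only on the value of
LANG. That would be one possible reason for a library loaded into a JVM to
behave differently from the command-line client, but nothing in this thread
confirms it.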

-Hyrum


-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/


Re: [RFC] - Proper encoding for patch file?

2011-09-08 Thread Stefan Sperling
On Thu, Sep 08, 2011 at 02:07:03PM -0400, Mark Phippard wrote:
 This is a JavaHL issue.  See the attached patch which resolves the
 problem I face.
 
 If I use the JavaHL diff API to produce a patch it fails if there are
 paths in the patch with UTF8 characters in the name.  Here is an
 example of the Exception:
 
 Invalid argument
 svn: Can't convert string from 'UTF-8' to native encoding:
 svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
 ===

This might be related to the following TODO comment in libsvn_client/patch.c.
In other words, this is a known limitation of the current implementation.

[[[
static svn_error_t *
grab_filename(const char **file_name, const char *line, apr_pool_t *result_pool,
  apr_pool_t *scratch_pool)
{
  const char *utf8_path;
  const char *canon_path;

  /* Grab the filename and encode it in UTF-8. */
  /* TODO: Allow specifying the patch file's encoding.
   *   For now, we assume its encoding is native. */
  /* ### This can fail if the filename cannot be represented in the current
   * ### locale's encoding. */
  SVN_ERR(svn_utf_cstring_to_utf8(&utf8_path,
  line,
  scratch_pool));

]]]