Re: Possible Bug (clarification) in Cygwin 1.7.5 -- findfirstfile (and findnextfile) yeild bad cfilename when file names have special characters. Works in cygwin 1.5, fails in 1.7

2011-11-10 Thread Corinna Vinschen
On Nov  9 22:18, Leon Vanderploeg wrote:
 Many thanks to Charles and Corinna for the help.  I have modified the
 code to use the POSIX functions.  I still have one problem I cannot
 seem to conquer.  
 
 I need to be able to read and write the (yes, I know it's evil)
 archive bit.  Unless there is a POSIX function (which I seriously
 doubt) for these items, I am locked into the windows APIs.
 
 I have read and re-read the Cygwin documentation on
 internationalization at least 6 times and I cannot figure out what I
 need to do to get this to work.  I have tried numerous combinations of
 environment variables and locale settings in the code, but none of
 them work.  The windows API fails to find the file specified.  I just
 want US English that can handle the extended character set to the
 windows APIs.  In this case, let's use the example of the copyright
 symbol (the small c with a circle around it).  What needs to be set in
 the environment, and what needs to be set in the C code to handle
 these characters correctly?

Nothing.  Just always use the UNICODE API, rather than the ANSI API:

  #include <sys/cygwin.h>

  DWORD
  my_GetFileAttributes (const char *cygwin_multibyte_filename)
  {
    DWORD attr = INVALID_FILE_ATTRIBUTES;
    PWCHAR w32_filename = cygwin_create_path (CCP_POSIX_TO_WIN_W,
                                              cygwin_multibyte_filename);
    if (w32_filename)
      {
        attr = GetFileAttributes (w32_filename);
        free (w32_filename);
      }
    return attr;
  }


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat

--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



Re: Possible Bug (clarification) in Cygwin 1.7.5 -- findfirstfile (and findnextfile) yeild bad cfilename when file names have special characters. Works in cygwin 1.5, fails in 1.7

2011-11-10 Thread Corinna Vinschen
On Nov 10 10:58, Corinna Vinschen wrote:
 On Nov  9 22:18, Leon Vanderploeg wrote:
  Many thanks to Charles and Corinna for the help.  I have modified the
  code to use the POSIX functions.  I still have one problem I cannot
  seem to conquer.  
  
  I need to be able to read and write the (yes, I know it's evil)
  archive bit.  Unless there is a POSIX function (which I seriously
  doubt) for these items, I am locked into the windows APIs.
  
  I have read and re-read the Cygwin documentation on
  internationalization at least 6 times and I cannot figure out what I
  need to do to get this to work.  I have tried numerous combinations of
  environment variables and locale settings in the code, but none of
  them work.  The windows API fails to find the file specified.  I just
  want US English that can handle the extended character set to the
  windows APIs.  In this case, let's use the example of the copyright
  symbol (the small c with a circle around it).  What needs to be set in
  the environment, and what needs to be set in the C code to handle
  these characters correctly?
 
 Nothing.  Just use always the UNICODE API, rather than the ANSI API:
 
   #include <sys/cygwin.h>
 
   DWORD
   my_GetFileAttributes (const char *cygwin_multibyte_filename)
   {
     DWORD attr = INVALID_FILE_ATTRIBUTES;
     PWCHAR w32_filename = cygwin_create_path (CCP_POSIX_TO_WIN_W,
                                               cygwin_multibyte_filename);
     if (w32_filename)
       {
         attr = GetFileAttributes (w32_filename);

Sigh.  Please make that

attr = GetFileAttributesW (w32_filename);

Note the trailing W.


         free (w32_filename);
       }
     return attr;
   }
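
For the archive bit itself, the same path conversion can presumably feed
both GetFileAttributesW and SetFileAttributesW.  What follows is only a
minimal sketch under that assumption: with_archive_bit and
my_set_archive_bit are hypothetical names, ARCHIVE_BIT mirrors the value of
FILE_ATTRIBUTE_ARCHIVE (0x20), and the Win32 part is compiled only under
Cygwin.

```c
#include <stdint.h>
#ifdef __CYGWIN__
#include <stdlib.h>
#include <windows.h>
#include <sys/cygwin.h>
#endif

/* Same value as FILE_ATTRIBUTE_ARCHIVE in the Win32 headers. */
#define ARCHIVE_BIT 0x20u

/* Return attr with the archive bit set (on != 0) or cleared (on == 0).
   All other attribute bits are left untouched. */
uint32_t
with_archive_bit (uint32_t attr, int on)
{
  return on ? (attr | ARCHIVE_BIT) : (attr & ~ARCHIVE_BIT);
}

#ifdef __CYGWIN__
/* Set or clear the archive bit of a file named by a Cygwin multibyte
   path.  Returns 0 on success, -1 on failure. */
int
my_set_archive_bit (const char *cygwin_multibyte_filename, int on)
{
  int ret = -1;
  PWCHAR w32_filename = cygwin_create_path (CCP_POSIX_TO_WIN_W,
                                            cygwin_multibyte_filename);
  if (w32_filename)
    {
      /* Note the trailing W on both calls. */
      DWORD attr = GetFileAttributesW (w32_filename);
      if (attr != INVALID_FILE_ATTRIBUTES
          && SetFileAttributesW (w32_filename,
                                 with_archive_bit (attr, on)))
        ret = 0;
      free (w32_filename);
    }
  return ret;
}
#endif
```

Reading the bit is then just a test of (attr & ARCHIVE_BIT) on the value
returned by my_GetFileAttributes.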


Corinna




RE: Possible Bug (clarification) in Cygwin 1.7.5 -- findfirstfile (and findnextfile) yeild bad cfilename when file names have special characters. Works in cygwin 1.5, fails in 1.7

2011-11-09 Thread Leon Vanderploeg
Many thanks to Charles and Corinna for the help.  I have modified the code to 
use the POSIX functions.  I still have one problem I cannot seem to conquer.  

I need to be able to read and write the (yes, I know it's evil) archive bit.  
Unless there is a POSIX function (which I seriously doubt) for these items, I 
am locked into the windows APIs.

I have read and re-read the Cygwin documentation on internationalization at 
least 6 times and I cannot figure out what I need to do to get this to work.  I 
have tried numerous combinations of environment variables and locale settings 
in the code, but none of them work.  The windows API fails to find the file 
specified.  I just want US English that can handle the extended character set 
to the windows APIs.  In this case, let's use the example of the copyright 
symbol (the small c with a circle around it).  What needs to be set in the 
environment, and what needs to be set in the C code to handle these characters 
correctly?

Your help and assistance is GREATLY appreciated.

Leon

Leon Vanderploeg


On Nov  3 17:56, Charles Wilson wrote:
 On 11/3/2011 4:48 PM, Leon Vanderploeg wrote:
  With cygwin 1.7.5, a cFileName with special characters such as ñ (n
  with a tilde above it) fails to be properly extracted from a
  WIN32_FIND_DATA structure with FindFirstFile (or FindNextFile).
  
  To set up a simple test scenario, I created a file in C:\Testing 
  named  Mañana.docx.  I compiled the code at the end of this message 
  on Cygwin 1.7.9 with GCC version 3.4.4 on Server 2008 32 bit system.
  On this system (and on a Windows 7 32 bit machine), it returns:
 
 a) Why are you using native Win32 APIs in a cygwin program? You should 
 be using the POSIX interfaces instead -- see /usr/include/dirent.h.
 
 DIR *opendir (const char *);
 DIR *fdopendir (int);
 struct dirent *readdir (DIR *);
 int readdir_r (DIR *, struct dirent *, struct dirent **);
 void rewinddir (DIR *);
 int closedir (DIR *);

ACK++

 b) What you observe is an artifact of cygwin-1.7's new *support* for 
 i18n.  In cygwin-1.5, it just didn't care and passed all the bytes 
 back exactly as found without transliteration.  In 1.7, it (correctly) 
 transcodes strings into the current locale -- and your current locale 
 does not appear to support ñ -- or, at least, you haven't told cygwin 
 to use the correct one.
 
 (I'm probably thoroughly botching this explanation, but the point is,

Just a bit.  What you have to keep in mind is that Windows stores all object 
names, including filenames, as UTF-16 strings, UNICODE in Windows terminology.  
When you use the ANSI Win32 API as in this example, then the UTF-16 names are 
converted to the currently defined ANSI charset on output, for instance 
codepage 1252 for Western European languages.

Cygwin 1.5 either used the ANSI API, or it converted strings from UTF-16 to the 
current Windows ANSI charset or vice versa.

Cygwin 1.7 no longer uses the ANSI API.  It talks to Windows through the
UNICODE API only, and the multibyte charset is defined through the
environment(*) as defined in POSIX.  UTF-8 is the default now.

 you need to check your LC_* and LANG env vars, and maybe call 
 setlocale(LC_ALL, "") in your application.)

And even then the code won't work.  If you don't define UNICODE,
FindFirstFile/FindNextFile will use the ANSI versions of this API,
FindFirstFileA/FindNextFileA.  Unless you set your LANG/LC_CTYPE/LC_ALL
variables to use your current Windows ANSI charset *and* call setlocale,
Cygwin will use UTF-8 by default.  Therefore, the character ñ will have a
different multibyte encoding, 0xc3 0xb1, rather than, say, 0xf1 in Windows
codepage 1252.  To avoid this problem, you can use the UNICODE API
FindFirstFileW/FindNextFileW and convert the filename to the current
multibyte charset via wcstombs and friends.

However, as Chuck has pointed out, the obviously right thing to do is to use 
the POSIX API opendir/readdir/closedir instead.


Corinna






Re: Possible Bug (clarification) in Cygwin 1.7.5 -- findfirstfile (and findnextfile) yeild bad cfilename when file names have special characters. Works in cygwin 1.5, fails in 1.7

2011-11-04 Thread Corinna Vinschen
On Nov  3 17:56, Charles Wilson wrote:
 On 11/3/2011 4:48 PM, Leon Vanderploeg wrote:
  With cygwin 1.7.5, a cFileName with special characters such as ñ (n
  with a tilde above it) fails to be properly extracted from a
  WIN32_FIND_DATA structure with FindFirstFile (or FindNextFile).
  
  To set up a simple test scenario, I created a file in C:\Testing
  named  Mañana.docx.  I compiled the code at the end of this message
  on Cygwin 1.7.9 with GCC version 3.4.4 on Server 2008 32 bit system.
  On this system (and on a Windows 7 32 bit machine), it returns:
 
 a) Why are you using native Win32 APIs in a cygwin program? You should
 be using the POSIX interfaces instead -- see /usr/include/dirent.h.
 
 DIR *opendir (const char *);
 DIR *fdopendir (int);
 struct dirent *readdir (DIR *);
 int readdir_r (DIR *, struct dirent *, struct dirent **);
 void rewinddir (DIR *);
 int closedir (DIR *);

ACK++

 b) What you observe is an artifact of cygwin-1.7's new *support* for
 i18n.  In cygwin-1.5, it just didn't care and passed all the bytes back
 exactly as found without transliteration.  In 1.7, it (correctly)
 transcodes strings into the current locale -- and your current locale
 does not appear to support ñ -- or, at least, you haven't told cygwin to
 use the correct one.
 
 (I'm probably thoroughly botching this explanation, but the point is,

Just a bit.  What you have to keep in mind is that Windows stores all
object names, including filenames, as UTF-16 strings, UNICODE in Windows
terminology.  When you use the ANSI Win32 API as in this example, then
the UTF-16 names are converted to the currently defined ANSI charset on
output, for instance codepage 1252 for Western European languages.

Cygwin 1.5 either used the ANSI API, or it converted strings from UTF-16
to the current Windows ANSI charset or vice versa.

Cygwin 1.7 no longer uses the ANSI API.  It talks to Windows through the
UNICODE API only, and the multibyte charset is defined through the
environment(*) as defined in POSIX.  UTF-8 is the default now.

 you need to check your LC_* and LANG env vars, and maybe call
 setlocale(LC_ALL, "") in your application.)

And even then the code won't work.  If you don't define UNICODE,
FindFirstFile/FindNextFile will use the ANSI versions of this API,
FindFirstFileA/FindNextFileA.  Unless you set your LANG/LC_CTYPE/LC_ALL
variables to use your current Windows ANSI charset *and* call
setlocale, Cygwin will use UTF-8 by default.  Therefore, the character ñ
will have a different multibyte encoding, 0xc3 0xb1, rather than, say,
0xf1 in Windows codepage 1252.  To avoid this problem, you can use the
UNICODE API FindFirstFileW/FindNextFileW and convert the filename to the
current multibyte charset via wcstombs and friends.
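
To make the byte difference concrete, here is a small sketch of how a code
point below U+0800 is laid out in UTF-8.  The function name
utf8_encode_low is made up for illustration and is not part of any
library.  In codepage 1252 both ñ (0xf1) and the copyright sign (0xa9) are
single bytes equal to their Unicode code point, while in UTF-8 they become
two bytes each.

```c
#include <stddef.h>

/* Encode a code point below U+0800 as UTF-8.
   Returns the number of bytes written (1 or 2), or 0 if cp is
   outside the range this sketch handles. */
size_t
utf8_encode_low (unsigned int cp, unsigned char out[2])
{
  if (cp < 0x80)
    {
      /* ASCII: one byte, unchanged. */
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      /* 110xxxxx 10xxxxxx: top 5 bits, then low 6 bits. */
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  return 0;
}
```

For ñ (U+00F1) this yields 0xc3 0xb1, and for the copyright sign (U+00A9)
it yields 0xc2 0xa9, which is why a filename taken from the ANSI API does
not match what a UTF-8 locale expects.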

However, as Chuck has pointed out, the obviously right thing to do is to
use the POSIX API opendir/readdir/closedir instead.
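
A minimal sketch of that approach, for illustration only: list_directory
is a hypothetical helper name, and under Cygwin the C:\Testing directory
from the report would normally be reachable as /cygdrive/c/Testing,
assuming the default cygdrive prefix.

```c
#include <dirent.h>
#include <stdio.h>

/* Print every entry of the directory at path, one per line.
   Returns the number of entries seen, or -1 if the directory
   cannot be opened. */
int
list_directory (const char *path)
{
  DIR *dir = opendir (path);
  struct dirent *entry;
  int count = 0;

  if (!dir)
    return -1;
  while ((entry = readdir (dir)) != NULL)
    {
      /* d_name is already in the current multibyte charset,
         so no manual conversion is needed here. */
      puts (entry->d_name);
      count++;
    }
  closedir (dir);
  return count;
}
```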


Corinna

(*) http://cygwin.com/cygwin-ug-net/setup-locale.html
http://cygwin.com/cygwin-ug-net/using-utils.html#locale




Re: Possible Bug (clarification) in Cygwin 1.7.5 -- findfirstfile (and findnextfile) yeild bad cfilename when file names have special characters. Works in cygwin 1.5, fails in 1.7

2011-11-03 Thread Charles Wilson
On 11/3/2011 4:48 PM, Leon Vanderploeg wrote:
 With cygwin 1.7.5, a cFileName with special characters such as ñ (n
 with a tilde above it) fails to be properly extracted from a
 WIN32_FIND_DATA structure with FindFirstFile (or FindNextFile).
 
 To set up a simple test scenario, I created a file in C:\Testing
 named  Mañana.docx.  I compiled the code at the end of this message
 on Cygwin 1.7.9 with GCC version 3.4.4 on Server 2008 32 bit system.
 On this system (and on a Windows 7 32 bit machine), it returns:

a) Why are you using native Win32 APIs in a cygwin program? You should
be using the POSIX interfaces instead -- see /usr/include/dirent.h.

DIR *opendir (const char *);
DIR *fdopendir (int);
struct dirent *readdir (DIR *);
int readdir_r (DIR *, struct dirent *, struct dirent **);
void rewinddir (DIR *);
int closedir (DIR *);

b) What you observe is an artifact of cygwin-1.7's new *support* for
i18n.  In cygwin-1.5, it just didn't care and passed all the bytes back
exactly as found without transliteration.  In 1.7, it (correctly)
transcodes strings into the current locale -- and your current locale
does not appear to support ñ -- or, at least, you haven't told cygwin to
use the correct one.

(I'm probably thoroughly botching this explanation, but the point is,
you need to check your LC_* and LANG env vars, and maybe call
setlocale(LC_ALL, "") in your application.)
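
As a rough illustration of that check, the sketch below adopts whatever
locale the LC_* and LANG variables select and reports the resulting
charset.  The helper name charset_from_environment is made up for this
example; setlocale and nl_langinfo are the standard POSIX calls.

```c
#include <langinfo.h>
#include <locale.h>

/* Adopt the locale selected by LC_ALL/LC_CTYPE/LANG and return the
   name of its character set, or NULL if the locale is not usable. */
const char *
charset_from_environment (void)
{
  if (setlocale (LC_ALL, "") == NULL)
    return NULL;
  return nl_langinfo (CODESET);
}
```

Printing the returned string at program start is a quick way to see which
charset the application will actually use for filenames.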

--
Chuck
