Updated: tzcode, tzdata 2023b

2023-03-25 Thread Cygwin tzcode/tzdata Maintainer via Cygwin-announce
The following packages have been upgraded in the Cygwin distribution:

* tzcode 2023b
* tzdata 2023b

The Time Zone Database (often called tz, tzdb, or zoneinfo) contains
data that represents the history of local time for many locations around
the world, and supports conversion of UTC time to local time at those
locations to allow display of those local times. It is updated
periodically to reflect changes made by political bodies to summer
daylight saving time rules, UTC offsets, and time zone boundaries.
The tzcode package provides the tzselect, zdump, and zic utilities.

For more information, see the project home page:

https://www.iana.org/time-zones

For more details on changes, see the announcement or below:

https://mm.icann.org/pipermail/tz-announce/2023-March/78.html


Release 2023b   2023-03-24

Briefly:

* Lebanon delays the start of DST this year.

Changes to future timestamps

* This year Lebanon springs forward April 20/21 not March 25/26.
  The 2023b release of the tz code and data is available. It follows so
  closely on the 2023a release because Lebanon's government announced
  that Lebanon's spring-forward transition previously scheduled for the
  end of this week has been delayed until April 20.



[ANNOUNCEMENT] Upgraded: grep 3.10

2023-03-25 Thread Cygwin grep Co-Maintainer via Cygwin-announce via Cygwin
The following package has been upgraded in the Cygwin distribution:

* grep  3.10

GNU grep searches one or more input files for lines containing a match
to a specified pattern. By default, grep outputs the matching lines. The
GNU implementation includes several useful extensions over POSIX.

The previous release announced that egrep and fgrep are obsolescent
commands that will be dropped in a future release. Until then, every use
prints a warning message on stderr, reminding you how to change your
commands and scripts:

$ egrep ...
egrep: warning: egrep is obsolescent; using grep -E
...
$ fgrep ...
fgrep: warning: fgrep is obsolescent; using grep -F
...

Cygwin releases will suppress the egrep and fgrep warning messages, but
developers and maintainers should rigorously remove all such usages from
their practices and scripts, as those commands could be dropped, or any
warning messages could be treated as fatal errors, in future.
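
As a starting point for that cleanup, here is a minimal Python sketch that
rewrites standalone egrep and fgrep invocations to the grep -E and grep -F
forms named in the warnings. The function name modernize and the matching
rule are assumptions made for illustration, not part of grep or Cygwin. The
pattern is deliberately naive: it leaves path-qualified uses such as
/usr/bin/egrep alone, and it does not understand shell quoting, so it will
also rewrite matches inside strings or comments. Review its output by hand.

```python
import re

# Replacement for each obsolescent command, as given in the warning text.
REPLACEMENTS = {"egrep": "grep -E", "fgrep": "grep -F"}

# Match egrep/fgrep only as a standalone word: not inside a longer name
# (zegrep, egrep.sh) and not as the tail of a path (/usr/bin/egrep).
_PATTERN = re.compile(r"(?<![\w./-])(egrep|fgrep)(?![\w.-])")


def modernize(line: str) -> str:
    """Return the line with standalone egrep/fgrep calls rewritten (hypothetical helper)."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], line)


print(modernize("egrep -i 'foo|bar' log.txt | fgrep -v baz"))
# prints: grep -E -i 'foo|bar' log.txt | grep -F -v baz
```

Applying the function line by line to a script and diffing the result
against the original keeps the change reviewable.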

Other invalid usages documented previously now also generate warning or
error messages on stderr, e.g.:

grep: warning: * at start of expression
grep: warning: ? at start of expression
grep: warning: + at start of expression
grep: warning: {...} at start of expression
grep: warning: stray \ before 
grep: warning: stray \ before unprintable character
grep: warning: stray \ before white space

For more information see the project home pages:

https://www.gnu.org/software/grep/
https://sv.gnu.org/projects/grep/

For changes since the previous Cygwin release please see below or read
/usr/share/doc/grep/NEWS after installation; for complete details see:

/usr/share/doc/grep/ChangeLog
https://git.sv.gnu.org/gitweb/?p=grep.git;a=log;h=refs/tags/v3.9


Noteworthy changes in release 3.10  2023-03-22

* Bug fixes

  With -P, \d now matches only ASCII digits, regardless of PCRE
  options/modes. The changes in grep-3.9 to make \b and \w work
  properly had the undesirable side effect of making \d also match
  e.g., the Arabic digits: ٠١٢٣٤٥٦٧٨٩.  With grep-3.9, -P '\d+'
  would match that ten-digit (20-byte) string. Now, to match such
  a digit, you would use \p{Nd}. Similarly, \D is now mapped to [^0-9].
  [bug introduced in grep 3.9]
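
The distinction between ASCII-only and Unicode-aware digit matching can be
illustrated outside grep. The following is a sketch in Python, whose re
module is not PCRE and does not support \p{Nd}; it only mirrors the idea.
By default Python's \d matches any Unicode decimal digit (comparable to the
grep 3.9 behaviour), while the re.ASCII flag restricts it to 0-9 (comparable
to grep 3.10 with -P). The unicodedata check stands in for \p{Nd}. All
variable names are illustrative.

```python
import re
import unicodedata

# The ten Arabic-Indic digits U+0660..U+0669 quoted in the release note.
arabic_digits = "".join(chr(0x0660 + i) for i in range(10))

# Ten characters, each encoded as two bytes in UTF-8.
char_count = len(arabic_digits)
byte_count = len(arabic_digits.encode("utf-8"))

# Default str patterns: \d means "any Unicode decimal digit".
unicode_match = re.fullmatch(r"\d+", arabic_digits) is not None

# With re.ASCII, \d is restricted to [0-9].
ascii_match = re.fullmatch(r"\d+", arabic_digits, flags=re.ASCII) is not None

# Every character is in general category Nd, the class that \p{Nd} selects.
all_nd = all(unicodedata.category(ch) == "Nd" for ch in arabic_digits)

print(char_count, byte_count, unicode_match, ascii_match, all_nd)
# prints: 10 20 True False True
```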


-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple


[ANNOUNCEMENT] Updated: tzcode, tzdata 2023a

2023-03-25 Thread Cygwin tzcode/tzdata Maintainer via Cygwin-announce via Cygwin
The following packages have been upgraded in the Cygwin distribution:

* tzcode 2023a
* tzdata 2023a

The Time Zone Database (often called tz, tzdb, or zoneinfo) contains
data that represents the history of local time for many locations around
the world, and supports conversion of UTC time to local time at those
locations to allow display of those local times. It is updated
periodically to reflect changes made by political bodies to summer
daylight saving time rules, UTC offsets, and time zone boundaries.
The tzcode package provides the tzselect, zdump, and zic utilities.

For more information, see the project home page:

https://www.iana.org/time-zones

For more details on changes, see the announcement or below:

https://mm.icann.org/pipermail/tz-announce/2023-March/77.html


Release 2023a   2023-03-22

Briefly:

* Egypt now uses DST again, from April through October.
* This year Morocco springs forward April 23, not April 30.
* Palestine delays the start of DST this year.
* Much of Greenland still uses DST from 2024 on.
* America/Yellowknife now links to America/Edmonton.
* tzselect can now use current time to help infer timezone.
* The code now defaults to C99 or later.
* Fix use of C23 attributes.

Changes to future timestamps

* Starting in 2023, Egypt will observe DST from April's last Friday
  through October's last Thursday. Assume the transition times are 00:00
  and 24:00, respectively.

* In 2023 Morocco's spring-forward transition after Ramadan
  will occur April 23, not April 30. Adjust predictions for future years
  accordingly. This affects predictions for 2023, 2031, 2038, and later
  years.

* This year Palestine will delay its spring forward from
  March 25 to April 29 due to Ramadan. Make guesses for future Ramadans
  too.

* Much of Greenland, represented by America/Nuuk, will continue to
  observe DST using European Union rules. When combined with Greenland's
  decision not to change the clocks in fall 2023, America/Nuuk therefore
  changes from -03/-02 to -02/-01 effective 2023-10-29 at 01:00 UTC.
  This change from 2022g doesn't affect timestamps until 2024-03-30, and
  doesn't affect tm_isdst until 2023-03-25.
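
The Egypt rule above ("April's last Friday through October's last Thursday")
can be worked out with plain date arithmetic. This is a minimal sketch in
Python; the helper name last_weekday is an assumption for illustration and
has nothing to do with how zic evaluates its rule lines. It computes only
the calendar dates, not the 00:00 and 24:00 transition times.

```python
import calendar
from datetime import date, timedelta


def last_weekday(year: int, month: int, weekday: int) -> date:
    """Return the last occurrence of weekday (Monday=0) in the given month."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # Step back from the month's final day to the requested weekday.
    offset = (last_day.weekday() - weekday) % 7
    return last_day - timedelta(days=offset)


dst_start = last_weekday(2023, 4, calendar.FRIDAY)
dst_end = last_weekday(2023, 10, calendar.THURSDAY)

print(dst_start, dst_end)
# prints: 2023-04-28 2023-10-26
```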

Changes to past timestamps

* America/Yellowknife has changed from a Zone to a backward
  compatibility Link, as it no longer differs from America/Edmonton
  since 1970. This affects some pre-1948 timestamps. The old data are
  now in 'backzone'.

Changes to past time zone abbreviations

* When observing Moscow time, Europe/Kirov and Europe/Volgograd now
  use the abbreviations MSK/MSD instead of numeric abbreviations,
  for consistency with other timezones observing Moscow time.

Changes to code

* You can now tell tzselect local time, to simplify later choices.
  Select the 'time' option in its first prompt.

* You can now compile with -DTZNAME_MAXIMUM=N to limit time zone
  abbreviations to N bytes (default 255). The reference runtime
  library now rejects POSIX-style TZ strings that contain longer
  abbreviations, treating them as UTC. Previously the limit was
  platform dependent and abbreviations were silently truncated to
  16 bytes even when the limit was greater than 16.

* The code by default is now designed for C99 or later. To build in
  a C89 environment, compile with -DPORT_TO_C89. To support C89
  callers of the tzcode library, compile with -DSUPPORT_C89. The
  two new macros are transitional aids planned to be removed in a
  future version, when C99 or later will be required.

* The code now builds again on pre-C99 platforms, if you compile
  with -DPORT_TO_C89. This fixes a bug introduced in 2022f.

* On C23-compatible platforms tzcode no longer uses syntax like
  'static [[noreturn]] void usage(void);'. Instead, it uses
  '[[noreturn]] static void usage(void);' as strict C23 requires.

* The code's functions now constrain their arguments with the C
  'restrict' keyword consistently with their documentation.
  This may allow future optimizations.

* zdump again builds standalone with ckdadd and without setenv,
  fixing a bug introduced in 2022g.

* leapseconds.awk can now process a leap seconds file that never
  expires; this might be useful if leap seconds are discontinued.

Changes to commentary

* tz-link.html has a new section "Coordinating with governments and
  distributors".

* To improve tzselect diagnostics, zone1970.tab's comments column is
  now limited to countries that have multiple timezones.

* Note that leap seconds are planned to be discontinued by 2035.


Re: newlocale: Linux incompatibility

2023-03-25 Thread Corinna Vinschen via Cygwin
On Mar 25 13:03, Brian Inglis via Cygwin wrote:
> On 2023-03-25 05:49, Corinna Vinschen via Cygwin wrote:
> It looks like /proc/locales contains the same content as produced by
> locale -a?

Yes, locale -a actually opens /proc/locales to read the locales from the
Cygwin core, just as it opens /proc/codesets to implement locale -m.
The idea was to have these definitions collected inside the DLL instead
of having to duplicate code in an external tool.


Corinna



Re: newlocale: Linux incompatibility

2023-03-25 Thread Corinna Vinschen via Cygwin
On Mar 25 13:03, Brian Inglis via Cygwin wrote:
> On 2023-03-25 05:49, Corinna Vinschen via Cygwin wrote:
> > On Mar 24 16:49, Brian Inglis via Cygwin wrote:
> > I never heard about an environment variable called LANGUAGE.  This is
> > about LANG/LC_ALL/LC_whatever, so POSIX syntax is required...
> 
> Used by gettext:
> 
> https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html

Ok, I'm not using that because I didn't even know that.  But I'm not
sure why you even mention it, it has nothing to do with Cygwin's
locale implementation which is based on the POSIX definitions.
Exception here is where the data comes from since we don't maintain
locale definition files and thus we don't follow
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
to the letter.

> > > Aha - a nice new 3.5.0 feature - as well as /proc/codesets - is that
> > > charsets e.g. ISO-10646, etc. rather than encodings e.g. UTF-8, etc.!
> 
> > It's a list of what you can use as codeset in $LANG and friends as in
> >   LC_CTYPE=lang_TERRITORY.codeset@modifier
> 
> You are using codeset to mean encoding, whereas in Unicode and W3 it usually
> means coded character set/charset; it can also mean charmap; see iconv(1):

I'm using the POSIX definition here.  Codeset is codeset, as in
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html

Quote:

  If the locale value has the form:

  language[_territory][.codeset]

  it refers to an implementation-provided locale, where settings of
  language, territory, and codeset are implementation-defined.

So I'm using the name "codesets" to follow POSIX documentation for
setting the matching locale environment variables, exactly to avoid
confusion.
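
To make the naming concrete, here is a minimal Python sketch that splits a
locale name of the form language[_territory][.codeset][@modifier] into its
parts. The PosixLocale type, the parse_locale function and the regular
expression are assumptions made for illustration; this is not how Cygwin
parses locale names, and it does no validation of the individual fields.

```python
import re
from typing import NamedTuple, Optional


class PosixLocale(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# language[_territory][.codeset][@modifier]; only the language is mandatory.
_LOCALE_RE = re.compile(
    r"^(?P<language>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_locale(name: str) -> PosixLocale:
    """Split a POSIX-style locale name into its components (hypothetical helper)."""
    m = _LOCALE_RE.match(name)
    if m is None:
        raise ValueError(f"not a POSIX-style locale name: {name!r}")
    # Optional groups that did not participate come back as None.
    return PosixLocale(**m.groupdict())


print(parse_locale("en_DE.utf8"))
# prints: PosixLocale(language='en', territory='DE', codeset='utf8', modifier=None)
```

The codeset field is whatever follows the dot; which values are accepted
there is implementation-defined, as the quoted POSIX text says.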


Corinna



[ANNOUNCEMENT] Updated: {,{mingw64-{x86_64,i686}}xz 5.4.2-1

2023-03-25 Thread Achim Gratz via Cygwin


The following packages have been uploaded to the Cygwin distribution:

 xz-5.4.2-1
 liblzma5-5.4.2-1
 liblzma-devel-5.4.2-1

 mingw64-i686-xz-5.4.2-1
 mingw64-x86_64-xz-5.4.2-1

XZ Utils is free general-purpose data compression software with high 
compression ratio. XZ Utils are the successor to LZMA Utils.

This is an update to the latest upstream release.

-- 
  *** CYGWIN-ANNOUNCE UNSUBSCRIBE INFO ***

If you want to unsubscribe from the cygwin-announce mailing list, look
at the "List-Unsubscribe: " tag in the email header of this message.
Send email to the address specified there. It will be in the format:

cygwin-announce-unsubscribe-you=yourdomain@cygwin.com

If you need more information on unsubscribing, start reading here:

http://sourceware.org/lists.html#unsubscribe-simple

Please read *all* of the information on unsubscribing that is available
starting at this URL.



[ANNOUNCEMENT] Updated: Perl distributions

2023-03-25 Thread Achim Gratz via Cygwin


The following Perl distributions have been updated to their latest
release version available on CPAN:

noarch
--
 perl-Business-ISBN-3.008-1
 perl-Business-ISBN-Data-20230322.001-1
 perl-DateTime-TimeZone-2.59-1
 perl-Exporter-Tiny-1.006001-1
 perl-Test2-Suite-0.000150-1
 perl-YAML-Tiny-1.74-1


Re: newlocale: Linux incompatibility

2023-03-25 Thread Brian Inglis via Cygwin

On 2023-03-25 05:49, Corinna Vinschen via Cygwin wrote:
> On Mar 24 16:49, Brian Inglis via Cygwin wrote:
> > On 2023-03-24 06:18, Corinna Vinschen via Cygwin wrote:
> > > > First, it's a bug in the Emacs testsuite.  The test simply assumes that
> > > > there's no en_DE locale on any system, but that's just not true.
> > > > Windows supports the RFC 5646 locale "en-DE", which is called "English
> > > > (Germany)" in the "Region" settings.
> > > >
> > > > You can also check with `locale -av | less' and search for en_DE.
> > > >
> > > > For the remainder of this mail, I assume you're talking about Cygwin 3.5.
> > > > I won't fix this for 3.4 anymore, given how much locale handling has
> > > > changed for 3.5.
> > > >
> > > > The second bug is that Cygwin blindly trusts the Windows function
> > > > ResolveLocaleName().  That function blatantly converts even vaguely
> > > > similar locales into something it supports.  E.g., it converts "en-XY"
> > > > to "en-US".  I.e., even if you use "en_XY.utf8" as locale, the above
> > > > testcase will wrongly succeed.  So I have to rethink how I resolve POSIX
> > > > locales to Windows locales.
> >
> > Does Windows even consider https://www.rfc-editor.org/rfc/rfc4647 "Matching
> > of Language Tags", part of https://www.rfc-editor.org/info/bcp47 "Language
> > Tags", and if POSIX only matches exactly, will LANGUAGE be able to be used
> > for fallback?
>
> I never heard about an environment variable called LANGUAGE.  This is
> about LANG/LC_ALL/LC_whatever, so POSIX syntax is required...


Used by gettext:

https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html

also LINGUAS FYI controlling, documenting, or limiting translations:

https://www.gnu.org/software/gettext/manual/html_node/po_002fLINGUAS.html
https://www.gnu.org/software/gettext/manual/html_node/Installers.html

as POSIX punts a lot of locale handling into the (hand waving) implementation 
defined bucket, where this is the primary implementation.



> > I currently define LANGUAGE=en_CA:en_GB:en in case en-CA is unsupported by
> > anything.
> > [I use my own en-CA locale not the glibc default created by https://rap.dk/.]
> >
> > Will "-" be supported like "_" as a separator in values?
>
> In Cygwin?  No.  The POSIX syntax is required, it's converted into
> a matching Windows RFC 5646 locale internally.



> > > > And the third bug is that Cygwin fails to set errno if it doesn't
> > > > support a locale, but that's a minor inconvenience in comparison.
> > > >
> > > > Thanks for the report, I totally missed the above problem with
> > > > ResolveLocaleName.
> > >
> > > I pushed a couple of patches which hopefully clean up the code.  It's
> > > really frustrating how these Windows locale functions work.  Or, rather,
> > > not work.  I mean, come on...
> > >
> > > - ResolveLocaleName() resolves "ff-BF" to "ff-Latn-SN", not to
> > >   "ff-Adlm-BF" or "ff-Latn-BF", even though both exist.
> > >
> > > - There's a locale called "sd-Arab-PK" and a locale "sd-Deva-IN".  If
> > >   you ask for the script used in "sd-IN", the result is "Arab", not
> > >   "Deva".
> > >
> > > I had to create a replacement function for ResolveLocaleName which
> > > doesn't return totally screwy and unexpected results, and special case
> > > two more locales in /proc/locales output so the output makes sense.



> > Aha - a nice new 3.5.0 feature - as well as /proc/codesets - is that
> > charsets e.g. ISO-10646, etc. rather than encodings e.g. UTF-8, etc.!
>
> It's a list of what you can use as codeset in $LANG and friends as in
>
>   LC_CTYPE=lang_TERRITORY.codeset@modifier


You are using codeset to mean encoding, whereas in Unicode and W3 it usually 
means coded character set/charset; it can also mean charmap; see iconv(1):


https://pubs.opengroup.org/onlinepubs/9699919799/utilities/iconv.html

Further confused by codeset definition:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_99

linking to:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_02

which says POSIX "provides no means of defining a wide-character codeset" 
implying encodings such as UCS-2/UTF-16 and UCS-4/UTF-32 can not be specified, 
requiring a non-POSIX approach to conversion.


Also IBM uses codeset to distinguish between EBCDIC and ASCII encodings.

Adding to the confusion ISO uses codeset to refer generically to each set of 
codes supported by each part of ISO-639-1/2/3/5, ISO-3166-1/2/3, and ISO-15924, 
as well as ISO-8859-1...16.


I get no hits from RFCs.

To avoid ambiguity and reduce possible confusion, it may be better to name this 
charmaps as used in locale(1), and produced by locale -m with the same apparent 
content?

It looks like /proc/locales contains the same content as produced by locale -a?

JM2c ;^>

--
Take care. Thanks, Brian Inglis  Calgary, Alberta, Canada

La perfection est atteinte   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer but when there is no more to cut
-- Antoine de Saint-Exupéry


RE: Strange link /bin/rungs in 32-bit Cygwin

2023-03-25 Thread Fergus Daly via Cygwin
>> Maybe of diminishing interest (32-bit Cygwin) - but:
>> Out of nowhere (but see below (*)) a link has occurred
>> /bin/rungs -> /usr/share/texmf-dist/scripts/texlive/rungs.tlu
>> which is, I think, a typo for /usr/share/texmf-dist/scripts/texlive/rungs.lua
>> Can anybody confirm?

> The symlink and script come from texlive-collection-basic.  In current 
> TeX Live on 64-bit Cygwin, the link does indeed point to rungs.lua.  But 
> I think the .tlu extension is used for texlua scripts, so what you're 
> seeing might not be a typo.  I'd have to look back at last year's 
> texlive-collection-basic to be sure, but you can do that more easily 
> than I can, since you already have a system with last year's 
> texlive-collection-basic.

You are right. The provision is right though different:
Cygwin32: texlive-collection-basic-20220321-1
Cygwin64: texlive-collection-basic-20230313-2
Both setup.ini files are right and so is their enactment.
Actually both my platforms were correctly configured with
correct links too.
My housekeeping, or rather my interpretation of housekeeping 
reports, was faulty. 
Sorry for wasted time.



Re: newlocale: Linux incompatibility

2023-03-25 Thread Corinna Vinschen via Cygwin
On Mar 24 16:49, Brian Inglis via Cygwin wrote:
> On 2023-03-24 06:18, Corinna Vinschen via Cygwin wrote:
> > > First, it's a bug in the Emacs testsuite.  The test simply assumes that
> > > there's no en_DE locale on any system, but that's just not true.
> > > Windows supports the RFC 5646 locale "en-DE", which is called "English
> > > (Germany)" in the "Region" settings.
> > > 
> > > You can also check with `locale -av | less' and search for en_DE.
> > > 
> > > For the remainder of this mail, I assume you're talking about Cygwin 3.5.
> > > I won't fix this for 3.4 anymore, given how much locale handling has
> > > changed for 3.5.
> > > 
> > > The second bug is that Cygwin blindly trusts the Windows function
> > > ResolveLocaleName().  That function blatantly converts even vaguely
> > > similar locales into something it supports.  E.g., it converts "en-XY"
> > > to "en-US".  I.e., even if you use "en_XY.utf8" as locale, the above
> > > testcase will wrongly succeed.  So I have to rethink how I resolve POSIX
> > > locales to Windows locales.
> 
> Does Windows even consider https://www.rfc-editor.org/rfc/rfc4647 "Matching
> of Language Tags", part of https://www.rfc-editor.org/info/bcp47 "Language
> Tags", and if POSIX only matches exactly, will LANGUAGE be able to be used
> for fallback?

I never heard about an environment variable called LANGUAGE.  This is
about LANG/LC_ALL/LC_whatever, so POSIX syntax is required...

> I currently define LANGUAGE=en_CA:en_GB:en in case en-CA is unsupported by
> anything.
> [I use my own en-CA locale not the glibc default created by https://rap.dk/.]
> 
> Will "-" be supported like "_" as a separator in values?

In Cygwin?  No.  The POSIX syntax is required, it's converted into
a matching Windows RFC 5646 locale internally.

> > > And the third bug is that Cygwin fails to set errno if it doesn't
> > > support a locale, but that's a minor inconvenience in comparison.
> > > 
> > > Thanks for the report, I totally missed the above problem with
> > > ResolveLocaleName.
> > 
> > I pushed a couple of patches which hopefully clean up the code.  It's
> > really frustrating how these Windows locale functions work.  Or, rather,
> > not work.  I mean, come on...
> > 
> > - ResolveLocaleName() resolves "ff-BF" to "ff-Latn-SN", not to
> >   "ff-Adlm-BF" or "ff-Latn-BF", even though both exist.
> > 
> > - There's a locale called "sd-Arab-PK" and a locale "sd-Deva-IN".  If
> >   you ask for the script used in "sd-IN", the result is "Arab", not
> >   "Deva".
> >
> > I had to create a replacement function for ResolveLocaleName which
> > doesn't return totally screwy and unexpected results, and special case
> > two more locales in /proc/locales output so the output makes sense.
> 
> Aha - a nice new 3.5.0 feature - as well as /proc/codesets - is that
> charsets e.g. ISO-10646, etc. rather than encodings e.g. UTF-8, etc.!

It's a list of what you can use as codeset in $LANG and friends as in

  LC_CTYPE=lang_TERRITORY.codeset@modifier


Corinna
