Re: how am I doing that ... via bash, the same way you suggested I run 
"locale" the second time:
LC_CTYPE=C.UTF-8 grep -P needle haystack.txt
Setting just LC_CTYPE seems to be enough; there is no need for LC_ALL.

Re: output difference for locale:
First way: everything is C.UTF-8 except LANG and LC_ALL, which are blank.
Second way: same thing, except LC_ALL is no longer blank.

Re: is_using_utf8 ... It relies on mbrtowc, which in turn relies on the current 
locale.
It seems that this function should NEVER return false in a UTF-8 locale.
But how does grep decide what the locale even is?
Presumably it must call setlocale at some point, or else it would be using the 
C locale, which is surely a unibyte locale.
Does it just do the obvious thing and call it with the empty string?
If so, is there any good way to find out what locale it actually got "resolved" 
to?
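
For what it's worth, here is a minimal sketch of how one might check this from 
a small C program. It assumes that grep does the usual setlocale (LC_ALL, "") 
at startup, which I have not confirmed. The probe is only modeled on my reading 
of is_using_utf8 and is not copied from grep, and the function names are made 
up for illustration. Passing NULL as the second argument to setlocale queries 
the current locale name without changing it, and nl_langinfo (CODESET) reports 
the encoding that locale uses.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for grep's is_using_utf8: try to decode the
   two-byte UTF-8 encoding of U+0100 in the current locale.  */
static bool
probe_utf8 (void)
{
  wchar_t wc = 0;
  mbstate_t mbs;
  memset (&mbs, 0, sizeof mbs);
  return mbrtowc (&wc, "\xc4\x80", 2, &mbs) == 2 && wc == 0x100;
}

/* Print what the C library currently reports for LC_CTYPE.  */
static void
report_locale (const char *label)
{
  const char *name = setlocale (LC_CTYPE, NULL);  /* query only */
  printf ("%s: LC_CTYPE=%s codeset=%s MB_CUR_MAX=%d utf8-probe=%s\n",
          label,
          name ? name : "(null)",
          nl_langinfo (CODESET),
          (int) MB_CUR_MAX,
          probe_utf8 () ? "yes" : "no");
}

/* Report the startup locale, then the one resolved from the environment.  */
static void
report_before_and_after (void)
{
  report_locale ("at startup");
  if (!setlocale (LC_ALL, ""))
    puts ("setlocale (LC_ALL, \"\") failed: the environment names a locale "
          "the C library does not support");
  report_locale ("after setlocale (LC_ALL, \"\")");
}
```

Calling report_before_and_after () from main, and running the program once 
with no LC variables set and once with LC_CTYPE=C.UTF-8, should show whether 
the two cases resolve to different locales.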

-----Original Message-----
From: Paul Eggert <egg...@cs.ucla.edu> 
Sent: Thursday, August 8, 2024 12:10 PM
To: Yagnatinsky, Mark : IT (NYK) <mark.yagnatin...@barclays.com>
Cc: 72...@debbugs.gnu.org
Subject: Re: bug#72524: how does grep determine locale if no LC environment 
variables are set

On 2024-08-08 05:53, mark.yagnatinsky--- via Bug reports for GNU grep wrote:
> I ran into an odd issue... the workaround is easy enough but the issue is 
> weird.
> In case this is relevant, my grep comes from git bash (which I think is 
> mostly Cygwin, or maybe msys2). Anyway, grep -P doesn't work if no LC vars 
> are set, and complains that it only works in unibyte locales or UTF-8.
> Normally, the git bash mintty launcher sets LC_CTYPE to en_US.UTF-8, but not 
> if I bypass the launcher and run grep directly.
> Here's the weird part: if I ask /usr/bin/locale what LC_CTYPE "should" be, it 
> says C.UTF-8.
> If I run grep with C.UTF-8 then it also works.  So it must be deriving a 
> default locale a different way.

My guess is that your default environment says that it supports UTF-8, but it 
doesn't support it well enough to pass grep's test; see grep/lib/localeinfo.c's 
is_using_utf8. If my guess is right, you may be encountering subtle bugs in 
programs other than grep.

When you say "I run grep with C.UTF-8" how exactly do you do that?

Is there any difference in output between these two shell commands?

locale
LC_ALL=C.UTF-8 locale

If you have a debugger, you might look into why is_using_utf8 returns false in 
your default locale.
