Re: [go-nuts] Re: Unicode variable name error

2022-11-08 Thread 'Dan Kortschak' via golang-nuts
On Tue, 2022-11-08 at 09:17 -0800, TheDiveO wrote:
> I've always wondered how to deal with exported versus unexported
> identifiers in scripts like Chinese?

There is an issue for this, https://go.dev/issue/22188, which discusses
the approaches currently in use, with a view to making this easier.
It also links to earlier issues on the topic.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/665360ff854b4abc823a5882f6f1584fdfb6ba5f.camel%40kortschak.io.


Re: [go-nuts] Re: Unicode variable name error

2022-11-08 Thread TheDiveO
I've always wondered how to deal with exported versus unexported 
identifiers in scripts like Chinese?

On Sunday, November 6, 2022 at 3:08:59 PM UTC+1 ba...@iitbombay.org wrote:

> In Indic scripts in certain contexts you have to use a vowel sign for the 
> typography to make sense; you can’t use a vowel letter in its place. So for 
> example the middle “ku” in my name has to be written as ક+ુ — which will be 
> rendered as કુ — even though it is equivalent to ક+્+ઉ. Also, “halant” (્), 
> is not a letter! 
>
> I would strongly urge Nikhilesh and other people wanting to use any Indic 
> script to *avoid* it (even if Go implements TR31 as in Swift) and
> instead use the lossless transliteration scheme of IAST if the program 
> calls for an Indian word as a Go object name.   
> https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration



Re: [go-nuts] Re: Unicode variable name error

2022-11-06 Thread Bakul Shah
In Indic scripts in certain contexts you have to use a vowel sign for the
typography to make sense; you can’t use a vowel letter in its place. So for
example the middle “ku” in my name has to be written as ક+ુ, which will be
rendered as કુ, even though it is equivalent to ક+્+ઉ. Also, “halant” (્)
is not a letter!

I would strongly urge Nikhilesh and other people wanting to use any Indic
script to *avoid* it (even if Go implements TR31 as in Swift) and instead use
the lossless transliteration scheme of IAST if the program calls for an
Indian word as a Go object name.
https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration

On Nov 6, 2022, at 4:02 AM, Rob Pike wrote:

% unicode -d పే
U+0C2A 'ప' telugu letter pa
U+0C47 'ే' telugu vowel sign ee
% unicode -U C2A C47
U+0C2A 'ప' TELUGU LETTER PA
	category: Lo
	canonical combining classes: 0
	bidirectional category: L
	mirrored: N
U+0C47 'ే' TELUGU VOWEL SIGN EE
	category: Mn
	canonical combining classes: 0
	bidirectional category: NSM
	mirrored: N
%

The problem is the second code point, U+0C47, Telugu vowel sign EE. It is not
in the letter class. If I change your program to use just the first code
point, it works: https://play.golang.com/p/eNvuZH33s65

The rules for identifiers in Go were chosen because they are easy to
implement, but they do have the problem that they do not treat all languages
equally. They may expand one day, but at the moment this is the situation.

There are a number of open issues around this. Start with
https://github.com/golang/go/issues/20706 if you want to read more.

-rob

On Sun, Nov 6, 2022 at 9:52 PM Konstantin Khomoutov wrote:
On Sun, Nov 06, 2022 at 01:45:53PM +0530, Nikhilesh Susarla wrote:

>> Per the Go spec[1], an identifier consists of a Unicode letter followed by
>> zero or more Unicode letters or digits. The character పే is in the Unicode
>> category nonspacing mark rather than the category letter.
[...]
> So, if the unicode letters are there in the nonspacing mark as you
> mentioned they can't be used right ?

I sense the source of your misunderstanding might be rooted in your lack of
certain basics about Unicode. You seem to call "a letter" anything which may
appear in a text document (a Go source code file is a text document) but this
is not true. Maybe that's just a terminological problem, but still the fact
is, the Unicode standard calls "letters" a very particular group of things
among those the Unicode standard describes. To give a very simplified example,
in the text string "foo bar" there are six letters (five distinct) and one
space character which is not a letter. The character being discussed is not a
letter in Unicode, either.
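
The point at the top of this message, that vowel signs and the halant are not
letters while an IAST transliteration consists only of letters, can be
checked with a rough sketch like the one below. The allLetters helper is
hypothetical, and the IAST sample assumes the precomposed (NFC) forms of the
dotted letters; a decomposed form using U+0323 COMBINING DOT BELOW would
contain a nonspacing mark and would not pass.

```go
package main

import (
	"fmt"
	"unicode"
)

// allLetters is a hypothetical helper: it reports whether s is non-empty
// and every rune in it is a Unicode letter (category L).
func allLetters(s string) bool {
	for _, r := range s {
		if !unicode.IsLetter(r) {
			return false
		}
	}
	return s != ""
}

func main() {
	samples := []struct{ name, text string }{
		// GUJARATI LETTER KA + GUJARATI VOWEL SIGN U
		{"ka + vowel sign u", "\u0A95\u0AC1"},
		// GUJARATI LETTER KA + GUJARATI SIGN VIRAMA (halant) + GUJARATI LETTER U
		{"ka + halant + letter u", "\u0A95\u0ACD\u0A89"},
		// An IAST spelling built from precomposed Latin letters (assumes NFC).
		{"IAST, precomposed", "k\u1E5B\u1E63\u1E47a"},
	}
	for _, s := range samples {
		fmt.Printf("%s: allLetters=%v\n", s.name, allLetters(s.text))
	}
}
```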



Re: [go-nuts] Re: Unicode variable name error

2022-11-06 Thread Rob Pike
% unicode -d పే
U+0C2A 'ప' telugu letter pa
U+0C47 'ే' telugu vowel sign ee
% unicode -U C2A C47
U+0C2A 'ప' TELUGU LETTER PA
	category: Lo
	canonical combining classes: 0
	bidirectional category: L
	mirrored: N
U+0C47 'ే' TELUGU VOWEL SIGN EE
	category: Mn
	canonical combining classes: 0
	bidirectional category: NSM
	mirrored: N
%

The problem is the second code point, U+0C47, Telugu vowel sign EE. It is
not in the letter class. If I change your program to use just the first
code point, it works: https://play.golang.com/p/eNvuZH33s65


The rules for identifiers in Go were chosen because they are easy to
implement, but they do have the problem that they do not treat all
languages equally. They may expand one day, but at the moment this is the
situation.


There are a number of open issues around this. Start with
https://github.com/golang/go/issues/20706 if you want to read more.


-rob




On Sun, Nov 6, 2022 at 9:52 PM Konstantin Khomoutov  wrote:

> On Sun, Nov 06, 2022 at 01:45:53PM +0530, Nikhilesh Susarla wrote:
>
> >> Per the Go spec[1], an identifier consists of a Unicode letter followed by
> >> zero or more Unicode letters or digits. The character పే is in the Unicode
> >> category nonspacing mark rather than the category letter.
> [...]
> > So, if the unicode letters are there in the nonspacing mark as you
> > mentioned they can't be used right ?
>
> I sense the source of your misunderstanding might be rooted in your lack of
> certain basics about Unicode. You seem to call "a letter" anything which may
> appear in a text document (a Go source code file is a text document) but this
> is not true. Maybe that's just a terminological problem, but still the fact
> is, the Unicode standard calls "letters" a very particular group of things
> among those the Unicode standard describes. To give a very simplified example,
> in the text string "foo bar" there are six letters (five distinct) and one
> space character which is not a letter. The character being discussed is not a
> letter in Unicode, either.


Re: [go-nuts] Re: Unicode variable name error

2022-11-06 Thread Konstantin Khomoutov
On Sun, Nov 06, 2022 at 01:45:53PM +0530, Nikhilesh Susarla wrote:

>> Per the Go spec[1], an identifier consists of a Unicode letter followed by
>> zero or more Unicode letters or digits. The character పే is in the Unicode
>> category nonspacing mark rather than the category letter.
[...]
> So, if the unicode letters are there in the nonspacing mark as you
> mentioned they can't be used right ?

I sense the source of your misunderstanding might be rooted in your lack of
certain basics about Unicode. You seem to call "a letter" anything which may
appear in a text document (a Go source code file is a text document) but this
is not true. Maybe that's just a terminological problem, but still the fact
is, the Unicode standard calls "letters" a very particular group of things
among those the Unicode standard describes. To give a very simplified example,
in the text string "foo bar" there are six letters (five distinct) and one
space character which is not a letter. The character being discussed is not a
letter in Unicode, either.
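
The "foo bar" example can be reproduced with a small sketch. countLetters is
a hypothetical helper written for this illustration; it relies on
unicode.IsLetter, which tests for Unicode category L.

```go
package main

import (
	"fmt"
	"unicode"
)

// countLetters returns how many runes in s are Unicode letters and how
// many distinct letters occur among them.
func countLetters(s string) (total, distinct int) {
	seen := make(map[rune]bool)
	for _, r := range s {
		if !unicode.IsLetter(r) {
			continue // spaces, marks, digits and so on are not letters
		}
		total++
		if !seen[r] {
			seen[r] = true
			distinct++
		}
	}
	return total, distinct
}

func main() {
	total, distinct := countLetters("foo bar")
	fmt.Println(total, distinct) // six letters, five distinct

	// The Telugu text from this thread: one letter plus one nonspacing mark.
	total, distinct = countLetters("\u0C2A\u0C47")
	fmt.Println(total, distinct)
}
```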
