Re: Is º an unicode alphabetic character?

2014-09-12 Thread AsmMan via Digitalmars-d-learn

On Friday, 12 September 2014 at 04:04:22 UTC, Ali Çehreli wrote:

> On 09/11/2014 08:04 PM, AsmMan wrote:
>
>> What's a Unicode alphabetic character?
>
> Alphabetic is defined as Lu + Ll + Lt + Lm + Lo + Nl +
> Other_Alphabetic, all of which are explained here:
>
> http://www.unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values
>
>> I misunderstood isAlpha(); I used to think it validates
>> letters like a, b, è, é .. z etc., but isAlpha('º') from
>> the std.uni module returns true.
>
> º happens to be in the Letter, Lowercase category so yes, it
> is isAlpha().
>
>> How can I validate only the letters of a Unicode alphabet
>> in D, or should I write one?
>
> There are so many alphabets in the world. It is likely that a
> Unicode character will be a part of one.
>
>> I know I can do:
>>
>> bool is_id(dchar c)
>> {
>>     return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= 0xc0;
>> }
>
> There is a misunderstanding. There are so many Unicode
> characters that are >= 0xc0 but not a part of the Alphabetic
> category. For example: ← (U+2190 LEFTWARDS ARROW).
>
> Ali


If I want ASCII and latin only alphabet which range should I use?
ie, how should I rewrite is_id() function?


Re: Is º an unicode alphabetic character?

2014-09-12 Thread Ali Çehreli via Digitalmars-d-learn

On 09/11/2014 11:38 PM, AsmMan wrote:

> If I want ASCII and latin only alphabet which range should I use?
> ie, how should I rewrite is_id() function?

This seems to be it:

import std.stdio;
import std.uni;

void main()
{
    alias latin = unicode.script.latin;
    assert('ç' in latin);
    assert('7' !in latin);

    writeln(latin);
}

Ali



Re: Is º an unicode alphabetic character?

2014-09-12 Thread AsmMan via Digitalmars-d-learn

On Friday, 12 September 2014 at 07:57:43 UTC, Ali Çehreli wrote:

> On 09/11/2014 11:38 PM, AsmMan wrote:
>
>> If I want ASCII and latin only alphabet which range should I use?
>> ie, how should I rewrite is_id() function?
>
> This seems to be it:
>
> import std.stdio;
> import std.uni;
>
> void main()
> {
>     alias latin = unicode.script.latin;
>     assert('ç' in latin);
>     assert('7' !in latin);
>
>     writeln(latin);
> }
>
> Ali


Sorry, I shouldn't have asked for Latin but for an alphabet like
French instead:
http://www.importanceoflanguages.com/Images/French/FrenchAlphabet.jpg
(including the diacritics, of course)


As you mentioned, º happens to be a letter, so it still passes:
assert('º' in latin);


so it isn't different from isAlpha(). Is the UTF-8 table organized
so that I can use a range (like we do for ASCII: ch >= 'a' && ch
<= 'z' || ch >= 'A' && ch <= 'Z'), or should I put these alpha
characters in a table myself and then do a lookup?
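
A minimal sketch of the "put the characters in a table and look them up" option. The names isFrenchLetter and frenchAccented are made up for illustration, and the list of accented letters is an assumption about what counts as the French alphabet. Because the function takes a dchar, the comparison is on Unicode code points, not on UTF-8 bytes.

```d
import std.algorithm.searching : canFind;

// Hypothetical table: accented letters and ligatures assumed to make up
// the non-ASCII part of the French alphabet.
immutable dstring frenchAccented = "àâæçéèêëîïôœùûüÿÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸ"d;

// Hypothetical helper: ASCII letters by range, everything else by lookup.
bool isFrenchLetter(dchar c)
{
    return (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || frenchAccented.canFind(c);
}

void main()
{
    assert(isFrenchLetter('e'));
    assert(isFrenchLetter('é'));
    assert(isFrenchLetter('Œ'));  // U+0152, outside the Latin-1 range
    assert(!isFrenchLetter('º')); // ordinal indicator is rejected
    assert(!isFrenchLetter('ñ')); // Latin letter, but not in the table
    assert(!isFrenchLetter('7'));
}
```

A linear search over about thirty characters is probably fine for identifier checks; a sorted array or a set would be the next step if it ever shows up in a profile.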


Re: Is º an unicode alphabetic character?

2014-09-12 Thread AsmMan via Digitalmars-d-learn

Thanks Ali, I think I'm getting close:

bool is_id(dchar c)
{
    // 0xd7 (×) and 0xf7 (÷) fall in the gaps between the ranges.
    return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' ||
           c >= 0xc0 && c <= 0xd6 || c >= 0xd8 && c <= 0xf6 ||
           c >= 0xf8 && c <= 0xff;
}

This doesn't include some math symbols, like c >= 0xc0 did.
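
A quick way to check a range-based test like this is to assert on a few boundary characters. The sketch below assumes the intended ranges are the Latin-1 letter blocks U+00C0..U+00D6, U+00D8..U+00F6 and U+00F8..U+00FF; the name isLatin1Letter is hypothetical.

```d
// Hypothetical helper: ASCII letters plus the letter ranges of Latin-1.
bool isLatin1Letter(dchar c) pure nothrow @nogc @safe
{
    return (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || (c >= 0xc0 && c <= 0xd6)   // À .. Ö
        || (c >= 0xd8 && c <= 0xf6)   // Ø .. ö
        || (c >= 0xf8 && c <= 0xff);  // ø .. ÿ
}

void main()
{
    assert(isLatin1Letter('é'));
    assert(isLatin1Letter('Ç'));
    assert(!isLatin1Letter('×')); // U+00D7 MULTIPLICATION SIGN
    assert(!isLatin1Letter('÷')); // U+00F7 DIVISION SIGN
    assert(!isLatin1Letter('º')); // U+00BA, below 0xc0
    assert(!isLatin1Letter('←')); // U+2190, above 0xff
    assert(!isLatin1Letter('œ')); // U+0153, a French letter outside Latin-1
}
```

The last assertion shows the limit of this approach: letters such as œ, Œ and Ÿ live above U+00FF, so a Latin-1 range check alone does not cover every French letter.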


Is º an unicode alphabetic character?

2014-09-11 Thread AsmMan via Digitalmars-d-learn
What's a Unicode alphabetic character? I misunderstood
isAlpha(); I used to think it validates letters like a, b, è,
é .. z etc., but isAlpha('º') from the std.uni module returns true.
How can I validate only the letters of a Unicode alphabet in D, or
should I write one?


I know I can do:

bool is_id(dchar c)
{
    return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= 0xc0;
}

but I'm looking for a native function, if there is one.


Re: Is º an unicode alphabetic character?

2014-09-11 Thread Ali Çehreli via Digitalmars-d-learn

On 09/11/2014 08:04 PM, AsmMan wrote:

> What's a Unicode alphabetic character?

Alphabetic is defined as Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic,
all of which are explained here:

  http://www.unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values

> I misunderstood isAlpha(); I used to think it validates
> letters like a, b, è, é .. z etc., but isAlpha('º') from
> the std.uni module returns true.

º happens to be in the Letter, Lowercase category so yes, it is isAlpha().

> How can I validate only the letters of a Unicode alphabet
> in D, or should I write one?

There are so many alphabets in the world. It is likely that a Unicode
character will be a part of one.

> I know I can do:
>
> bool is_id(dchar c)
> {
>     return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= 0xc0;
> }

There is a misunderstanding. There are so many Unicode characters that
are >= 0xc0 but not a part of the Alphabetic category. For example: ←
(U+2190 LEFTWARDS ARROW).


Ali
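
The two points above can be checked directly with std.uni.isAlpha. This is only an illustrative sketch; the specific characters are picked as examples.

```d
import std.uni : isAlpha;

void main()
{
    // U+00BA MASCULINE ORDINAL INDICATOR counts as a letter in Unicode.
    assert(isAlpha('º'));

    // U+2190 LEFTWARDS ARROW is >= 0xc0 but is a symbol, not a letter,
    // so a bare "c >= 0xc0" test would wrongly accept it.
    assert('←' >= 0xc0);
    assert(!isAlpha('←'));

    // The same goes for the multiplication sign at U+00D7.
    assert('×' >= 0xc0);
    assert(!isAlpha('×'));
}
```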