Re: Rename std.ctype to std.ascii?

2011-06-16 Thread Regan Heath
On Tue, 14 Jun 2011 10:20:48 +0100, Jonathan M Davis jmdavisp...@gmx.com  
wrote:


So, given the arguably poor name of ctype and the fact that std.ctype  
does not actually match ctype.h's behavior, unless someone comes up with  
a really good reason not to fairly soon, I'm going to schedule std.ctype  
for deprecation and put the properly camelcased functions in std.ascii.


I reckon this is the best option.



Re: Rename std.ctype to std.ascii?

2011-06-16 Thread Jouko Koski

Jonathan M Davis jmdavisp...@gmx.com wrote:

 On 2011-06-14 11:53, Jouko Koski wrote:
  I would not consider it being good idea to include this kind of
  ascii-only utilities in the standard-ish library.

 For some classes of operations, it makes perfect sense to be checking for
 ASCII characters only. For others, it's just people not worrying about
 internationalization like they should be. For instance, format strings
 don't care about unicode as far as their escape sequences go. %a, %d, etc.
 are all pure ASCII.


Do we really need a common library utility for such a bounded domain? I 
would vote dropping ascii-only std.ctype altogether. Those who know and 
ensure that they are dealing with ascii-only, ebcdic-only or whatever-only 
representations can easily write their own utilities to their particular 
domains - maybe even better optimized than std.ctype because the domain may 
be even more restricted. A common use ascii-only utility will be used 
inevitably in places where it shouldn't.


 std.ctype/std.ascii deals with ASCII for those situations where you really
 do only care about ASCII. It deals with unicode characters, but it returns
 false for everything with them which returns a bool, and it never tries to
 change their case. std.uni actually deals with unicode and worries about
 things like whether a unicode character is uppercase or not.


That is what ctype.h (or wctype.h) utilities do when the default locale 
setting is in effect. Some other posters seem to suggest that a more 
generalized library module does this, too, without losing performance.


--
Jouko 



Re: Rename std.ctype to std.ascii?

2011-06-16 Thread Jonathan M Davis
On 2011-06-16 12:51, Jouko Koski wrote:
 Jonathan M Davis jmdavisp...@gmx.com wrote:
  On 2011-06-14 11:53, Jouko Koski wrote:
  I would not consider it being good idea to include this kind of
  ascii-only utilities in the standard-ish library.
  
  For some classes of operations, it makes perfect sense to be checking for
  ASCII characters only. For others, it's just people not worrying about
  internationalization like they should be. For instance, format strings
  don't care about unicode as far as their escape sequences go. %a, %d, etc.
  are all pure ASCII.
 
 Do we really need a common library utility for such a bounded domain? I
 would vote dropping ascii-only std.ctype altogether. Those who know and
 ensure that they are dealing with ascii-only, ebcdic-only or whatever-only
 representations can easily write their own utilities to their particular
 domains - maybe even better optimized than std.ctype because the domain may
 be even more restricted. A common use ascii-only utility will be used
 inevitably in places where it shouldn't.
 
  std.ctype/std.ascii deals with ASCII for those situations where you
  really do only care about ASCII. It deals with unicode characters, but it
  returns false for everything with them which returns a bool, and it never
  tries to change their case. std.uni actually deals with unicode and
  worries about things like whether a unicode character is uppercase or not.
 
 That is what ctype.h (or wctype.h) utilities do when the default locale
 setting is in effect. Some other posters seem to suggest that a more
 generalized library module does this, too, without losing performance. 

You actually do get a performance loss for a number of functions. They do tend 
to shortcut on ASCII in many cases, but they tend to become too large to be 
inlined, and if all you care about is ASCII, even if there are unicode 
characters in the string (which is common enough in domains that have nothing 
to do with English - e.g. regular expressions), you take a performance hit for 
all characters which aren't ASCII. There are also a number of functions which 
arguably don't make much sense to try and turn into unicode functions (e.g. 
isDigit) but are heavily used. Another fun one is isWhite vs isUniWhite. In 
most cases, you _don't_ care about unicode whitespace, and it is definitely 
more expensive to call isUniWhite than isWhite, because there are a _lot_ of 
extraneous whitespace characters in unicode.
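
A minimal sketch of the isWhite distinction, assuming a current Phobos in which the Unicode-aware check is spelled std.uni.isWhite (the thread calls it isUniWhite). The helper names countAsciiWhite and countUnicodeWhite are hypothetical and exist only for this illustration:

```d
import std.stdio : writeln;
static import std.ascii;
static import std.uni;

// Counts only ASCII whitespace (space, \t, \n, \v, \f, \r).
size_t countAsciiWhite(string s)
{
    size_t n;
    foreach (dchar c; s) // decode the UTF-8 string into code points
        if (std.ascii.isWhite(c))
            ++n;
    return n;
}

// Counts every code point that Unicode classifies as whitespace.
size_t countUnicodeWhite(string s)
{
    size_t n;
    foreach (dchar c; s)
        if (std.uni.isWhite(c))
            ++n;
    return n;
}

void main()
{
    // One ASCII space followed later by one U+2003 EM SPACE.
    string s = "a b\u2003c";
    writeln(countAsciiWhite(s));   // 1
    writeln(countUnicodeWhite(s)); // 2
}
```

Which count is the right one depends on the input domain; a parser for an ASCII-only syntax usually wants the first.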

std.ctype/std.ascii is _not_ going away. Too many people find those functions 
to be useful. I grant you that too many programmers don't worry about unicode 
when they should, but there are so many issues surrounding the proper handling 
of unicode that programmers aren't going to get it right unless they're 
actually trying to get it right. D provides a lot of the tools to make unicode 
mostly work correctly out of the box, but it's still complicated enough that 
you can't expect it to just work without programmers having some clue of 
what they're doing. And forcing people to come up with their own functions for 
basic ASCII operations (which pretty much every other programming language 
has) isn't going to help any.

- Jonathan M Davis


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread Jonathan M Davis
On 2011-06-13 22:48, Jouko Koski wrote:
 Jonathan M Davis jmdavisp...@gmx.com wrote:
  std.ctype is modeled after C's ctype.h. It has functions for operating on
  characters - particularly functions which indicate the type of a
  character (I
  believe that ctype stands for character type, so that makes sense). For
  instance, isdigit will tell you whether a particular character is a
  digit. It
  only works on ASCII characters (non-ASCII characters return false for
  functions like isdigit and functions like toupper do nothing to non-ASCII
  characters).
 
 What is your definition for ASCII character?
 
 Most of the ctype.h functions (or macros) are locale dependent, see
 setlocale() and locale.h. And there is the wctype.h, too.
 
 While the C standardized ways of doing things might not be most appropriate
 approach in D domain, we must not base our design decisions on deficient
 analysis. "I just want this text uppercase" is one of the hardest things in
 the _world_. The problem is not just the header or package naming.

??? std.ctype does _nothing_ with localization. And even if it did, that 
doesn't change what ASCII is. ASCII is made up of the values 0 through 127. 
And honestly, I have no clue how _those_ characters could be affected by 
locale. Extended-ASCII might be, but I wouldn't think that ASCII would be. 
Regardless, std.ctype does nothing with locale.

- Jonathan M Davis


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread KennyTM~

On Jun 14, 11 14:23, Jonathan M Davis wrote:

On 2011-06-13 22:48, Jouko Koski wrote:

Jonathan M Davis jmdavisp...@gmx.com wrote:

std.ctype is modeled after C's ctype.h. It has functions for operating on
characters - particularly functions which indicate the type of a
character (I
believe that ctype stands for character type, so that makes sense). For
instance, isdigit will tell you whether a particular character is a
digit. It
only works on ASCII characters (non-ASCII characters return false for
functions like isdigit and functions like toupper do nothing to non-ASCII
characters).


What is your definition for ASCII character?

Most of the ctype.h functions (or macros) are locale dependent, see
setlocale() and locale.h. And there is the wctype.h, too.

While the C standardized ways of doing things might not be most appropriate
approach in D domain, we must not base our design decisions on deficient
analysis. "I just want this text uppercase" is one of the hardest things in
the _world_. The problem is not just the header or package naming.


??? std.ctype does _nothing_ with localization. And even if it did, that
doesn't change what ASCII is. ASCII is made up of the values 0 through 127.
And honestly, I have no clue how _those_ characters could be affected by
locale. Extended-ASCII might be, but I wouldn't think that ASCII would be.
Regardless, std.ctype does nothing with locale.

- Jonathan M Davis


std.ctype does not, but ctype.h does. (which could be another reason 
it shouldn't be called std.ctype.)


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread David Nadlinger

On 6/14/11 8:23 AM, Jonathan M Davis wrote:

What is your definition for ASCII character?

Most of the ctype.h functions (or macros) are locale dependent, see
setlocale() and locale.h. And there is the wctype.h, too.

While the C standardized ways of doing things might not be most appropriate
approach in D domain, we must not base our design decisions on deficient
analysis. "I just want this text uppercase" is one of the hardest things in
the _world_. The problem is not just the header or package naming.


??? std.ctype does _nothing_ with localization. And even if it did, that
doesn't change what ASCII is. ASCII is made up of the values 0 through 127.
And honestly, I have no clue how _those_ characters could be affected by
locale. Extended-ASCII might be, but I wouldn't think that ASCII would be.
Regardless, std.ctype does nothing with locale.


But the functions in ctype.h do. And there can be some 
locale-dependent problems even if you use only ASCII, the most prominent 
being the different handling of »i« in the Turkish locale: 
http://www.i18nguy.com/unicode/turkish-i18n.html


This is probably another reason why it shouldn't be called std.ctype…

David


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread Jonathan M Davis
On 2011-06-14 01:51, David Nadlinger wrote:
 On 6/14/11 8:23 AM, Jonathan M Davis wrote:
  What is your definition for ASCII character?
  
  Most of the ctype.h functions (or macros) are locale dependent, see
  setlocale() and locale.h. And there is the wctype.h, too.
  
  While the C standardized ways of doing things might not be most
  appropriate approach in D domain, we must not base our design decisions
  on deficient analysis. "I just want this text uppercase" is one of the
  hardest things in the _world_. The problem is not just the header or
  package naming.
  
  ??? std.ctype does _nothing_ with localization. And even if it did, that
  doesn't change what ASCII is. ASCII is made up of the values 0 through
  127. And honestly, I have no clue how _those_ characters could be
  affected by locale. Extended-ASCII might be, but I wouldn't think that
  ASCII would be. Regardless, std.ctype does nothing with locale.
 
 But the functions in ctype.h do. And there can be some
 locale-dependent problems even if you use only ASCII, the most prominent
 being the different handling of »i« in the Turkish locale:
 http://www.i18nguy.com/unicode/turkish-i18n.html
 
 This is probably another reason why it shouldn't be called std.ctype…

From the looks of it, that affects extended ASCII but not ASCII (since the 
Turkish uppercase I isn't even in ASCII). It's definitely a great link though. 
Thanks!

It may be that we'll want to improve std.uni to deal with locales in some 
manner (either by providing new functions which handle them or altering the 
current ones to handle them), but std.ctype is pure ASCII. And while I don't 
see how locales can affect pure ASCII, ctype.h appears to actually deal with 
extended ASCII rather than just ASCII (where locales _do_ matter). So, all in 
all, std.ctype definitely has different behavior than ctype.h, which makes the 
name std.ctype that much worse.

So, given the arguably poor name of ctype and the fact that std.ctype does not 
actually match ctype.h's behavior, unless someone comes up with a really good 
reason not to fairly soon, I'm going to schedule std.ctype for deprecation and 
put the properly camelcased functions in std.ascii.

- Jonathan M Davis


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread David Nadlinger

On 6/14/11 11:20 AM, Jonathan M Davis wrote:

On 2011-06-14 01:51, David Nadlinger wrote:

But the functions in ctype.h do. And there can be some
locale-dependent problems even if you use only ASCII, the most prominent
being the different handling of »i« in the Turkish locale:
http://www.i18nguy.com/unicode/turkish-i18n.html

This is probably another reason why it shouldn't be called std.ctype…


 From the looks of it, that affects extended ASCII but not ASCII (since the
Turkish uppercase I isn't even in ASCII). It's definitely a great link though.
Thanks!


Oh, I was probably a bit unclear – what I meant is that it affects you 
also if you use only ASCII input, since toupper('i') == 221 when your 
locale is tr_TR.ISO-8859-9.
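
A minimal sketch of the behavior described above, calling the C library from D through the core.stdc bindings. Whether the tr_TR.ISO-8859-9 locale is installed, and what toupper returns under it, depends on the platform's C library, so the sketch prints the result rather than assuming 221:

```d
import core.stdc.ctype : toupper;
import core.stdc.locale : LC_CTYPE, setlocale;
import std.stdio : writeln;

void main()
{
    // A program starts in the "C" locale, where 'i' maps to 'I' (73).
    writeln(toupper('i'));

    // Switch only the character-classification category.
    // setlocale returns null if the requested locale is unavailable.
    if (setlocale(LC_CTYPE, "tr_TR.ISO-8859-9") is null)
    {
        writeln("tr_TR.ISO-8859-9 is not installed on this system");
        return;
    }

    // Same ASCII input, but the result is now locale dependent.
    writeln(toupper('i'));
}
```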


David


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread Jonathan M Davis
On 2011-06-14 02:51, David Nadlinger wrote:
 On 6/14/11 11:20 AM, Jonathan M Davis wrote:
  On 2011-06-14 01:51, David Nadlinger wrote:
  But the functions in ctype.h do. And there can be some
  locale-dependent problems even if you use only ASCII, the most prominent
  being the different handling of »i« in the Turkish locale:
  http://www.i18nguy.com/unicode/turkish-i18n.html
  
  This is probably another reason why it shouldn't be called std.ctype…
  
  From the looks of it, that affects extended ASCII but not ASCII (since
  the Turkish uppercase I isn't even in ASCII). It's definitely a great link
  though. Thanks!
 
 Oh, I was probably a bit unclear – what I meant is that it affects you
 also if you use only ASCII input, since toupper('i') == 221 when your
 locale is tr_TR.ISO-8859-9.

Yes, but the result is extended ASCII, so it doesn't affect anything which 
only deals with pure ASCII. ctype.h deals with extended ASCII, so locales 
actually affect what it's doing. std.ctype only deals in pure ASCII, so it 
wouldn't do anything which would result in a non-ASCII character, and so 
locales shouldn't matter at all. However, if you _do_ want to bring locales 
into it, then a locale like tr_TR.ISO-8859-9 is not going to be able to 
operate purely in ASCII, since the uppercase value of i is 221, which is 
extended ASCII.

So, yes I understood. It's just that as far as I can tell, locales don't 
matter if you're completely restricting yourself to ASCII like std.ctype does. 
And std.ctype is not going to try and deal with locales at this point (and 
likely not ever). I think that that is far better left to unicode. The Turkish 
locale is a great example of why you _want_ to be dealing with unicode when 
dealing with locales. std.ctype is for when you're specifically restricting 
yourself to ASCII (which sometimes can be very useful - e.g. with formatting 
strings or regex strings where all of the special characters are ASCII; using 
unicode functions would just make them slower at no benefit and would risk 
changing behavior based on locale if you brought locales into it). If you're 
not restricting yourself to ASCII, then std.uni is the way to go.

- Jonathan M Davis


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread Jouko Koski

Jonathan M Davis jmdavisp...@gmx.com wrote:


So, yes I understood. It's just that as far as I can tell, locales don't
matter if you're completely restricting yourself to ASCII like std.ctype 
does.


I would not consider it being good idea to include this kind of ascii-only 
utilities in the standard-ish library. It might be best to rename the module 
to std.ascii_for_insular_yankees_others_keep_away so that nobody would use 
it by accident. This way the name would also remind us about the historical 
terms which were used quarter of a century ago when ascii-only ctype.h 
utilities were first suggested to the international C standardization 
committee.


--
Jouko 



Re: Rename std.ctype to std.ascii?

2011-06-14 Thread Andrej Mitrovic
Why does std.ctype exist anyway? Can't you use std.uni for both ASCII
and UTF? Or is there some overhead in using the uni functions?


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread Daniel Gibson
Am 14.06.2011 20:58, schrieb Andrej Mitrovic:
 Why does std.ctype exist anyway? Can't you use std.uni for both ASCII
 and UTF? Or is there some overhead in using the uni functions?

I haven't looked at either implementation, but on ASCII everything is
really simple.. isalpha, isdigit, isupper and islower are just simple
checks if the value is between two values, tolower(dchar c) is just
return isupper(c) ? c+32 : c; etc.
For Unicode this is most probably *much* harder (= more expensive).
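
A minimal sketch of the range checks described above. The names isAsciiUpper, isAsciiDigit and toAsciiLower are hypothetical and this is not the std.ctype source; the explicit cast is needed because dchar + int yields a uint in D:

```d
import std.stdio : writeln;

// Hypothetical helpers for illustration only.
bool isAsciiUpper(dchar c) pure nothrow @safe
{
    return c >= 'A' && c <= 'Z';
}

bool isAsciiDigit(dchar c) pure nothrow @safe
{
    return c >= '0' && c <= '9';
}

// 'a' - 'A' == 32, so adding 32 maps an ASCII uppercase letter to its
// lowercase form. Everything else, including non-ASCII, is returned as-is.
dchar toAsciiLower(dchar c) pure nothrow @safe
{
    return isAsciiUpper(c) ? cast(dchar)(c + 32) : c;
}

void main()
{
    writeln(toAsciiLower('Q'));                  // q
    writeln(isAsciiDigit('7'));                  // true
    writeln(toAsciiLower('\u00C9') == '\u00C9'); // true: non-ASCII untouched
}
```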

Cheers,
- Daniel


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread Timon Gehr
Daniel Gibson wrote:
 Am 14.06.2011 20:58, schrieb Andrej Mitrovic:
 Why does std.ctype exist anyway? Can't you use std.uni for both ASCII
 and UTF? Or is there some overhead in using the uni functions?

 I haven't looked at either implementation, but on ASCII everything is
 really simple.. isalpha, isdigit, isupper and islower are just a simple
 checks if the value is between two values, tolower(dchar c) is just
 return isupper(c) ? c+32 : c; etc.
 For Unicode this is most probably *much* harder (= more expensive).

 Cheers,
 - Daniel

The implementation of toUniLower shortcuts on ASCII characters. I don't expect
it to be any slower if not for inlineability. And if somebody really needs the
speed, I feel manually writing if('A' <= c && c <= 'Z') c+=32; (or similar) is
just good enough.


Timon


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread Daniel Gibson
Am 14.06.2011 21:29, schrieb Timon Gehr:
 Daniel Gibson wrote:
 Am 14.06.2011 20:58, schrieb Andrej Mitrovic:
 Why does std.ctype exist anyway? Can't you use std.uni for both ASCII
 and UTF? Or is there some overhead in using the uni functions?

 I haven't looked at either implementation, but on ASCII everything is
 really simple.. isalpha, isdigit, isupper and islower are just a simple
 checks if the value is between two values, tolower(dchar c) is just
 return isupper(c) ? c+32 : c; etc.
 For Unicode this is most probably *much* harder (= more expensive).

 Cheers,
 - Daniel
 
 The implementation of toUniLower shortcuts on ASCII characters. I don't
 expect it to be any slower if not for inlineability. And if somebody really
 needs the speed, I feel manually writing if('A' <= c && c <= 'Z') c+=32;
 (or similar) is just good enough.
 
 
 Timon

OK. I just looked at the implementation and it seems like there are
ASCII-shortcuts in all those unicode functions.
So I agree with Andrej, std.ctype isn't really needed.

Cheers,
- Daniel


Re: Rename std.ctype to std.ascii?

2011-06-14 Thread Jonathan M Davis
On 2011-06-14 11:53, Jouko Koski wrote:
 Jonathan M Davis jmdavisp...@gmx.com wrote:
  So, yes I understood. It's just that as far as I can tell, locales don't
  matter if you're completely restricting yourself to ASCII like std.ctype
  does.
 
 I would not consider it being good idea to include this kind of ascii-only
 utilities in the standard-ish library. It might be best to rename the
 module to std.ascii_for_insular_yankees_others_keep_away so that nobody
 would use it by accident. This way the name would also remind us about the
 historical terms which were used quarter of a century ago when ascii-only
 ctype.h utilities were first suggested to the international C
 standardization committee.

For some classes of operations, it makes perfect sense to be checking for 
ASCII characters only. For others, it's just people not worrying about 
internationalization like they should be. For instance, format strings don't 
care about unicode as far as their escape sequences go. %a, %d, etc. are all 
pure ASCII. So, worrying about unicode with them just wouldn't make sense. In 
most cases, isDigit working on the arabic numerals 0 through 9 is _exactly_ 
what people want and need. But if you were to try and make it more unicode-
friendly, would Greek or Chinese numbers count as digits? Maybe, maybe not. It 
gets much more complicated. In some cases, all you care about with isUpper or 
toUpper is ASCII. In others, you want it to deal with unicode (and probably 
locales as well) properly.

std.ctype/std.ascii deals with ASCII for those situations where you really do 
only care about ASCII. It deals with unicode characters, but it returns false 
for everything with them which returns a bool, and it never tries to change 
their case. std.uni actually deals with unicode and worries about things like 
whether a unicode character is uppercase or not.
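
A minimal sketch of that difference, assuming a current Phobos in which both modules use the same camelCased names (the thread refers to the older std.uni spellings such as isUniUpper). Static imports keep the two modules' functions apart:

```d
import std.stdio : writeln;
static import std.ascii;
static import std.uni;

void main()
{
    dchar upperE = '\u00C9'; // LATIN CAPITAL LETTER E WITH ACUTE

    // The ASCII functions accept any dchar but only recognize 0 through 127.
    writeln(std.ascii.isUpper('E'));    // true
    writeln(std.ascii.isUpper(upperE)); // false

    // The Unicode functions classify the non-ASCII letter as well.
    writeln(std.uni.isUpper(upperE));   // true

    // Case conversion follows the same split.
    writeln(std.ascii.toLower(upperE) == upperE);   // true: left unchanged
    writeln(std.uni.toLower(upperE) == '\u00E9');   // true: lowercased
}
```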

They're for two different use cases. Most of Phobos should be dealing with 
unicode (e.g. pretty much everything in std.string should be using the std.uni 
functions rather than the std.ascii functions if there's a function which is 
in both), but there are cases where unicode doesn't matter, and you might as 
well have the efficiency available of just dealing with ASCII. Ultimately, 
it's up to the programmer to do the right thing.

- Jonathan M Davis


Rename std.ctype to std.ascii?

2011-06-13 Thread Jonathan M Davis
std.ctype is modeled after C's ctype.h. It has functions for operating on 
characters - particularly functions which indicate the type of a character (I 
believe that ctype stands for character type, so that makes sense). For 
instance, isdigit will tell you whether a particular character is a digit. It 
only works on ASCII characters (non-ASCII characters return false for 
functions like isdigit and functions like toupper do nothing to non-ASCII 
characters).

std.uni, on the other hand, operates on characters just like std.ctype does, 
but it extends its charter to unicode characters (e.g. it has isUniUpper which 
_does_ work on unicode characters, unlike std.ctype's isupper).

The thing is that aside from those familiar with C/C++, most programmers are 
likely to find the module name ctype to be rather uninformative. If they're 
looking for something like isdigit, they're not terribly likely to go looking 
at std.ctype first. And I'm not sure that std.ascii will be all that much more 
obvious to them, but it fits in much better with std.uni. std.ascii gets the 
character functions which operate only on ASCII characters, and std.uni gets 
the character functions which operate on unicode characters in addition to 
ASCII characters.

I don't think that the change of module name is enough of an improvement to 
merit changing the name just because ctype is arguably bad. However, as it 
turns out, _no_ function in std.ctype is properly camelcased, and many of them 
return int instead of bool (which the C functions they're modeled after do but 
which is not particularly D-like and can cause problems when you actually _need_ 
them to return bool). And it has been made very clear in past discussions in 
this newsgroup that the consensus is that we prefer that Phobos functions 
follow Phobos' naming conventions (which means camelcasing) rather than 
matching the casing of functions in other languages. So, all of the functions 
in std.ctype need to be renamed.
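
A minimal sketch of one way an int-returning predicate can cause trouble. cStyleIsDigit is a hypothetical stand-in that follows the C convention of returning any nonzero value for "true"; the value 4 is an arbitrary assumption, not what std.ctype returned:

```d
import std.stdio : writeln;
static import std.ascii;

// Hypothetical C-style predicate: nonzero means "true", zero means "false".
int cStyleIsDigit(dchar c)
{
    return (c >= '0' && c <= '9') ? 4 : 0;
}

void main()
{
    // In a condition the int result behaves as expected.
    if (cStyleIsDigit('7'))
        writeln("digit");

    // Compared against true, it is 4 == 1 after promotion, which is false.
    writeln(cStyleIsDigit('7') == true); // false

    // D also rejects an implicit int-to-bool conversion, so this line
    // would not compile:
    //     bool b = cStyleIsDigit('7');

    // A predicate that returns bool has neither problem.
    bool ok = std.ascii.isDigit('7');
    writeln(ok == true); // true
}
```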

I now have a pull request which creates properly camelcased versions of all of 
them ( https://github.com/D-Programming-Language/phobos/pull/101 ). The thing 
is though that because _every_ function in std.ctype is renamed, the cost of 
renaming the entire module (as far as people updating their code to use 
functions such as isDigit instead of isdigit goes) is essentially the same
as just renaming the functions in-place. In either case, the old functions
will go through the full deprecation process before they're actually gone, so
no one's code will suddenly break because of the changes, but any code that
uses the old functions will eventually have to be changed to use the properly
named ones. And since the cost to making those changes is essentially the same 
whether we replace the whole std.ctype module or whether we replace all of its 
functions, I'm wondering whether it would be worthwhile to take this 
opportunity to rename std.ctype?

I don't think that the name change is enough of an improvement to do it if 
it's going to break everyone's code, but given that fixing all of its 
functions gives us a perfect opportunity to rename it at no additional cost, I 
feel that the question should be posed.

Should we rename std.ctype to std.ascii? Or should we just keep the old name, 
which is familiar to C programmers?

- Jonathan M Davis


Re: Rename std.ctype to std.ascii?

2011-06-13 Thread Jose Armando Garcia
On Mon, Jun 13, 2011 at 10:28 PM, Jonathan M Davis jmdavisp...@gmx.com wrote:
 std.ctype is modeled after C's ctype.h. It has functions for operating on
 characters - particularly functions which indicate the type of a character (I
 believe that ctype stands for character type, so that makes sense). For
 instance, isdigit will tell you whether a particular character is a digit. It
 only works on ASCII characters (non-ASCII characters return false for
 functions like isdigit and functions like toupper do nothing to non-ASCII
 characters).

 std.uni, on the other hand, operates on characters just like std.ctype does,
 but it extends its charter to unicode characters (e.g. it has isUniUpper which
 _does_ work on unicode characters, unlike std.ctype's isupper).

 The thing is that aside from those familiar with C/C++, most programmers are
 likely to find the module name ctype to be rather uninformative. If they're
 looking for something like isdigit, they're not terribly likely to go looking
 at std.ctype first. And I'm not sure that std.ascii will be all that much more
 obvious to them, but it fits in much better with std.uni. std.ascii gets the
 character functions which operate only on ASCII characters, and std.uni gets
 the character functions which operate on unicode characters in addition to
 ASCII characters.

 I don't think that the change of module name is enough of an improvement to
 merit changing the name just because ctype is arguably bad. However, as it
 turns out, _no_ function in std.ctype is properly camelcased, and many of them
 return int instead of bool (which the C functions they're modeled after do but
 which is not particularly D-like and can cause problems when you actually _need_
 them to return bool). And it has been made very clear in past discussions in
 this newsgroup that the consensus is that we prefer that Phobos functions
 follow Phobos' naming conventions (which means camelcasing) rather than
 matching the casing of functions in other languages. So, all of the functions
 in std.ctype need to be renamed.

 I now have a pull request which creates properly camelcased versions of all of
 them ( https://github.com/D-Programming-Language/phobos/pull/101 ). The thing
 is though that because _every_ function in std.ctype is renamed, the cost of
 renaming the entire module (as far as people updating their code to use
 functions such as isDigit instead of isdigit goes) is essentially the same
 as just renaming the functions in-place. In either case, the old functions
 will go through the full deprecation process before they're actually gone, so
 no one's code will suddenly break because of the changes, but any code that
 uses the old functions will eventually have to be changed to use the properly
 named ones. And since the cost to making those changes is essentially the same
 whether we replace the whole std.ctype module or whether we replace all of its
 functions, I'm wondering whether it would be worthwhile to take this
 opportunity to rename std.ctype?

 I don't think that the name change is enough of an improvement to do it if
 it's going to break everyone's code, but given that fixing all of its
 functions gives us a perfect opportunity to rename it at no additional cost, I
 feel that the question should be posed.

 Should we rename std.ctype to std.ascii? Or should we just keep the old name,
 which is familiar to C programmers?

 - Jonathan M Davis


or deprecate std.ctype and create a new std.ascii.


Re: Rename std.ctype to std.ascii?

2011-06-13 Thread Jonathan M Davis
On 2011-06-13 18:43, Jose Armando Garcia wrote:
 On Mon, Jun 13, 2011 at 10:28 PM, Jonathan M Davis jmdavisp...@gmx.com
 wrote:
  std.ctype is modeled after C's ctype.h. It has functions for operating on
  characters - particularly functions which indicate the type of a
  character (I believe that ctype stands for character type, so that makes
  sense). For instance, isdigit will tell you whether a particular
  character is a digit. It only works on ASCII characters (non-ASCII
  characters return false for functions like isdigit and functions like
  toupper do nothing to non-ASCII characters).
  
  std.uni, on the other hand, operates on characters just like std.ctype
  does, but it extends its charter to unicode characters (e.g. it has
  isUniUpper which _does_ work on unicode characters, unlike std.ctype's
  isupper).
  
  The thing is that aside from those familiar with C/C++, most programmers
  are likely to find the module name ctype to be rather uninformative. If
  they're looking for something like isdigit, they're not terribly likely
  to go looking at std.ctype first. And I'm not sure that std.ascii will
  be all that much more obvious to them, but it fits in much better with
  std.uni. std.ascii gets the character functions which operate only on
  ASCII characters, and std.uni gets the character functions which operate
  on unicode characters in addition to ASCII characters.
  
  I don't think that the change of module name is enough of an improvement
  to merit changing the name just because ctype is arguably bad. However,
  as it turns out, _no_ function in std.ctype is properly camelcased, and
  many of them return int instead of bool (which the C functions they're
  modeled after do but which is not particularly D-like and can cause
  problems when you actually _need_ them to return bool). And it has been
  made very clear in past discussions in this newsgroup that the consensus
  is that we prefer that Phobos functions follow Phobos' naming
  conventions (which means camelcasing) rather than matching the casing of
  functions in other languages. So, all of the functions in std.ctype need
  to be renamed.
  
  I now have a pull request which creates properly camelcased versions of
  all of them ( https://github.com/D-Programming-Language/phobos/pull/101
  ). The thing is though that because _every_ function in std.ctype is
  renamed, the cost of renaming the entire module (as far as people
  updating their code to use functions such as isDigit instead of isdigit
  goes) is essentially the same as just renaming the functions
  in-place. In either case, the old functions will go through the full
  deprecation process before they're actually gone, so no one's code will
  suddenly break because of the changes, but any code that uses the old
  functions will eventually have to be changed to use the properly named
  ones. And since the cost to making those changes is essentially the same
  whether we replace the whole std.ctype module or whether we replace all
  of its functions, I'm wondering whether it would be worthwhile to take
  this opportunity to rename std.ctype?
  
  I don't think that the name change is enough of an improvement to do it
  if it's going to break everyone's code, but given that fixing all of its
  functions gives us a perfect opportunity to rename it at no additional
  cost, I feel that the question should be posed.
  
  Should we rename std.ctype to std.ascii? Or should we just keep the old
  name, which is familiar to C programmers?
  
  - Jonathan M Davis
 
 or deprecate std.ctype and create a new std.ascii.

Well, yes. That's what would be happening. All of the old functions would be 
in std.ctype and put on the deprecation path, while the new std.ascii would 
have the new, properly camelcased functions in it. But what that's effectively 
doing is renaming std.ctype to std.ascii. It's just that std.ctype will stick 
around with its old functions until it's gone through the full deprecation 
cycle.
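
A minimal sketch of keeping an old name alive while steering callers to the new one, using D's deprecated attribute on an alias. This shows the mechanism within a single module and is an assumption about how such a transition could look, not the way Phobos actually staged std.ctype and std.ascii:

```d
import std.stdio : writeln;

// New, camelCased name (hypothetical stand-in for a std.ascii function).
bool isDigit(dchar c) pure nothrow @safe
{
    return c >= '0' && c <= '9';
}

// The old spelling stays available for now, but every use is flagged
// by the compiler with the given message.
deprecated("Please use isDigit instead.")
alias isdigit = isDigit;

void main()
{
    writeln(isDigit('5')); // true
    writeln(isdigit('5')); // true, plus a deprecation message at compile time
}
```

Once the deprecation period ends, deleting the alias turns the remaining uses into compile errors.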

- Jonathan M Davis


Re: Rename std.ctype to std.ascii?

2011-06-13 Thread Andrej Mitrovic
I'm all for it. I've never liked ctype, and I got lost trying to find
ascii functions since I didn't know where to look. The first time I
saw ctype I thought it was a collection of C type aliases.. heh.


Re: Rename std.ctype to std.ascii?

2011-06-13 Thread Andrej Mitrovic
Come to think of it, I think I had a note in a todo somewhere that
said "post a feature request to change ctype to ascii". It's a good
standard name.


Re: Rename std.ctype to std.ascii?

2011-06-13 Thread Jouko Koski


Jonathan M Davis jmdavisp...@gmx.com wrote:

 std.ctype is modeled after C's ctype.h. It has functions for operating on
 characters - particularly functions which indicate the type of a character
 (I believe that ctype stands for character type, so that makes sense). For
 instance, isdigit will tell you whether a particular character is a digit.
 It only works on ASCII characters (non-ASCII characters return false for
 functions like isdigit and functions like toupper do nothing to non-ASCII
 characters).


What is your definition for ASCII character?

Most of the ctype.h functions (or macros) are locale dependent, see 
setlocale() and locale.h. And there is the wctype.h, too.


While the C standardized ways of doing things might not be most appropriate 
approach in D domain, we must not base our design decisions on deficient 
analysis. "I just want this text uppercase" is one of the hardest things in 
the _world_. The problem is not just the header or package naming.


--
Jouko